CN109710927A

CN109710927A - Named entity recognition method, device, readable storage medium, and electronic device

Info

Publication number: CN109710927A
Application number: CN201811519563.6A
Authority: CN
Inventors: 贾弼然; 崔朝辉; 赵立军; 张霞
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2019-05-03
Anticipated expiration: 2038-12-12
Also published as: CN109710927B

Abstract

The present disclosure relates to a named entity recognition method, device, readable storage medium, and electronic device. The method comprises: determining all possible true segments corresponding to the t-th target segment x_t in a text; for each true segment, determining the first conditional probability p(a^d|l_i) of each segment state corresponding to that true segment, where a^d denotes the d-th true segment and l_i denotes the i-th segment state; determining, from the second conditional probability p(x_t|a^d) of each true segment corresponding to the target segment x_t and from the first conditional probability p(a^d|l_i), the third conditional probability p(x_t|l_i) of each segment state corresponding to the target segment x_t; and performing named entity recognition on the target segment x_t according to the third conditional probability p(x_t|l_i). This improves the accuracy and recall of named entity recognition and effectively avoids problems caused by extra, missing, or wrongly written characters during text recognition.

Description

Named entity recognition method, device, readable storage medium, and electronic device

Technical field

The present disclosure relates to the field of natural language processing, and in particular to a named entity recognition method, device, readable storage medium, and electronic device.

Background

As artificial intelligence is applied more widely, natural language processing is receiving growing attention. Named entity recognition is an important early step in natural language processing: recognizing entities such as times, numbers, person names, place names, and organization names in text matters in many research fields. At present, named entity recognition mostly uses the hidden Markov model (HMM), but several problems arise during recognition. For example, a transliterated entity in open-set text may have many different translations, which causes considerable ambiguity and a high error rate. Likewise, text obtained by annotating and translating low-quality corpora may contain extra, missing, or wrongly written characters. The existing HMM therefore cannot accurately recognize named entities in text.

Summary of the invention

To overcome the problems in the prior art, embodiments of the present disclosure provide a named entity recognition method, device, readable storage medium, and electronic device.

To achieve the above object, a first aspect of the present disclosure provides a named entity recognition method, comprising:

determining all possible true segments corresponding to the t-th target segment x_t in a text;

for each true segment, determining the first conditional probability p(a^d|l_i) of each segment state corresponding to that true segment, where a^d denotes the d-th true segment and l_i denotes the i-th segment state;

determining, according to the second conditional probability p(x_t|a^d) of each true segment corresponding to the target segment x_t and the first conditional probability p(a^d|l_i), the third conditional probability p(x_t|l_i) of each segment state corresponding to the target segment x_t;

performing named entity recognition on the target segment x_t according to the third conditional probability p(x_t|l_i).

Optionally, determining, for each true segment, the first conditional probability p(a^d|l_i) of each segment state corresponding to that true segment comprises:

for each true segment, determining the fourth conditional probability p(a^d|x_t) of the target segment x_t corresponding to that true segment;

estimating, according to the fourth conditional probability p(a^d|x_t) of the target segment x_t corresponding to each true segment, the first conditional probability p(a^d|l_i) of each segment state corresponding to each true segment.

Optionally, estimating, according to the fourth conditional probability p(a^d|x_t) of the target segment x_t corresponding to each true segment, the first conditional probability p(a^d|l_i) of each segment state corresponding to each true segment comprises:

according to formula (1) and formula (2) below, determining the y_i for which d(z_t, y_i) satisfies a preset condition as the first conditional probability of each segment state corresponding to each true segment:

$$d(z_t, y_i) = \sum_{d=1}^{D} z_t^d \log\frac{z_t^d}{y_i^d} \tag{1}$$

$$\sum_{d=1}^{D} y_i^d = 1 \tag{2}$$

where D denotes the total number of true segments, z_t^d = p(a^d|x_t) denotes the fourth conditional probability of the target segment x_t corresponding to the d-th true segment, y_i^d = p(a^d|l_i) denotes the first conditional probability of the i-th segment state corresponding to the d-th true segment, z_t = (z_t^1, ..., z_t^D) denotes the vector of fourth conditional probabilities of the target segment x_t corresponding to each true segment, y_i = (y_i^1, ..., y_i^D) denotes the vector of first conditional probabilities of the i-th segment state corresponding to each true segment, and d(z_t, y_i) denotes the relative entropy of z_t and y_i.

Optionally, the preset condition is that the loss function of formula (4) is minimal, where T_i denotes the total number of target segments belonging to the i-th segment state l_i, L denotes the total number of segment states, and an indicator term denotes whether the i-th segment state and the target segment x_t are related: it is 1 if they are related and 0 otherwise.

Optionally, determining, according to the second conditional probability p(x_t|a^d) of each true segment corresponding to the target segment x_t and the first conditional probability p(a^d|l_i), the third conditional probability p(x_t|l_i) of each segment state corresponding to the target segment x_t comprises:

determining the third conditional probability p(x_t|l_i) of each segment state corresponding to the target segment x_t according to formula (3) below:

$$p(x_t \mid l_i) = \sum_{d=1}^{D} p(x_t \mid a^d)\, p(a^d \mid l_i) \tag{3}$$

where D denotes the total number of true segments.

Optionally, performing named entity recognition on the target segment x_t according to the third conditional probability p(x_t|l_i) comprises:

determining the segment state corresponding to the largest third conditional probability as the named entity recognition result of the target segment x_t.

A second aspect of the present disclosure provides a named entity recognition device, comprising:

a first determining module, configured to determine all possible true segments corresponding to the t-th target segment x_t in a text;

a second determining module, configured to determine, for each true segment determined by the first determining module, the first conditional probability p(a^d|l_i) of each segment state corresponding to that true segment, where a^d denotes the d-th true segment and l_i denotes the i-th segment state;

a third determining module, configured to determine, according to the second conditional probability p(x_t|a^d) of each true segment corresponding to the target segment x_t and the first conditional probability p(a^d|l_i) determined by the second determining module, the third conditional probability p(x_t|l_i) of each segment state corresponding to the target segment x_t;

a recognition module, configured to perform named entity recognition on the target segment x_t according to the third conditional probability p(x_t|l_i) determined by the third determining module.

Optionally, the second determining module includes:

a first determining submodule, configured to determine, for each true segment, the fourth conditional probability p(a^d|x_t) of the target segment x_t corresponding to that true segment;

an estimation submodule, configured to estimate, according to the fourth conditional probability p(a^d|x_t) of the target segment x_t corresponding to each true segment determined by the first determining submodule, the first conditional probability p(a^d|l_i) of each segment state corresponding to each true segment.

Optionally, the estimation submodule includes:

a second determining submodule, configured to determine, according to formula (1) and formula (2) above, the y_i for which d(z_t, y_i) satisfies a preset condition as the first conditional probability of each segment state corresponding to each true segment.

Optionally, the preset condition is that the loss function of formula (4) is minimal, where T_i denotes the total number of target segments belonging to the i-th segment state l_i, L denotes the total number of segment states, and an indicator term denotes whether the i-th segment state and the target segment x_t are related: it is 1 if they are related and 0 otherwise.

Optionally, the third determining module includes:

a third determining submodule, configured to determine the third conditional probability p(x_t|l_i) of each segment state corresponding to the target segment x_t according to formula (3):

$$p(x_t \mid l_i) = \sum_{d=1}^{D} p(x_t \mid a^d)\, p(a^d \mid l_i) \tag{3}$$

where D denotes the total number of true segments.

Optionally, the recognition module includes:

a fourth determining submodule, configured to determine the segment state corresponding to the largest third conditional probability as the named entity recognition result of the target segment x_t.

A third aspect of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method provided by the first aspect of the present disclosure.

A fourth aspect of the present disclosure further provides an electronic device, comprising:

a memory storing a computer program;

a processor, configured to execute the computer program in the memory to implement the steps of the method provided by the first aspect of the present disclosure.

With the above technical solution, named entity recognition on a target segment takes into account all possible true segments corresponding to the target segment, the first conditional probability of each segment state corresponding to each true segment, and the second conditional probability of each true segment corresponding to the target segment. The resulting third conditional probability of each segment state corresponding to the target segment therefore characterizes the relationship among the target segment, the true segments, and the segment states. This improves the accuracy and recall of named entity recognition and effectively avoids problems caused by extra, missing, or wrongly written characters during text recognition.

Other features and advantages of the present disclosure are described in detail in the detailed description below.

Brief description of the drawings

The drawings provide a further understanding of the present disclosure and form part of the specification. Together with the detailed description below they serve to explain the disclosure, but they do not limit it. In the drawings:

Fig. 1 is a flow chart of a named entity recognition method according to an exemplary embodiment.

Fig. 2 is a flow chart of a named entity recognition method according to another exemplary embodiment.

Fig. 3 is a block diagram of a named entity recognition device according to an exemplary embodiment.

Fig. 4 is a block diagram of an electronic device according to an exemplary embodiment.

Detailed description

Specific embodiments of the present disclosure are described in detail below with reference to the drawings. The embodiments described here only illustrate and explain the disclosure and do not limit it.

First, the HMM is described.

An HMM consists of five parts:

(1) The number L of segment states in the model, i.e., the size of the role-tagging state set.

(2) The number T_i of distinct symbols (also called segments) that each segment state may output, i.e., the total number of segments that a role-tagging state may emit.

(3) The state transition probability matrix A = {a_ij}, i.e., the matrix of transition probabilities between all role-tagging states:

a_ij = P(l_j|l_i), 1 ≤ i, j ≤ L

a_ij ≥ 0

where a_ij denotes the probability of moving from state l_i to state l_j, i indexes the i-th role-tagging state, and j indexes the j-th role-tagging state.

(4) The probability distribution matrix B = {b_i(t)} of segment x_t occurring in state l_i, also called the emission probability matrix, which characterizes the relationship between states and segments:

b_i(t) = P(x_t|l_i), 1 ≤ i ≤ L, 1 ≤ t ≤ T_i

b_i(t) ≥ 0

where t indexes the t-th segment and T_i denotes the total number of segments belonging to the i-th role-tagging state l_i.

(5) The initial state probability distribution π = {π_i}, i.e., the probability that a segment initially is in each role-tagging state:

π_i = P(l_i), 1 ≤ i ≤ L

π_i ≥ 0

In summary, the five-tuple of the HMM can be written as μ = (l, t, A, B, π).

When this HMM is used for named entity recognition, the text is first input into the HMM and coarsely segmented. The coarse segmentation result is then compared with the training corpus and tagged with roles, and the values required by the Viterbi algorithm are computed; these values are the parameters of the HMM five-tuple. The Viterbi algorithm then recognizes the text on the basis of those values. In other words, the probability distribution matrix of segment x_t occurring in state l_i is computed, where 1 ≤ i ≤ L and 1 ≤ t ≤ M, and the text is recognized according to that matrix. For example, following existing role tag sets, the state l_i may be a role such as surname, first character of a two-character given name, or left context of a person name. There is no absolute standard for the role tag set; it has to be adjusted using earlier work and expert knowledge.

In summary, the accuracy of named entity recognition in text depends on the accuracy of the probability distribution matrix of segment x_t occurring in state l_i. To improve recognition accuracy, the HMM must therefore compute this matrix accurately.
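The decoding step mentioned above can be sketched as follows. This is a minimal Viterbi implementation for illustration only, not the patent's implementation; the function name `viterbi`, the dictionary layout, and all toy probabilities are assumptions.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observed segments.

    start_p[s]    ~ pi_i  = P(l_i)
    trans_p[r][s] ~ a_ij  = P(l_j | l_i)
    emit_p[s][x]  ~ b_i(t) = P(x_t | l_i); unseen segments count as 0.0
    """
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), None) for s in states}]
    for t in range(1, len(obs)):
        column = {}
        for s in states:
            column[s] = max(
                (best[t - 1][r][0] * trans_p[r][s] * emit_p[s].get(obs[t], 0.0), r)
                for r in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


# Hypothetical toy parameters (two roles, two segments), for illustration only.
STATES = ["surname", "given"]
START = {"surname": 0.9, "given": 0.1}
TRANS = {"surname": {"surname": 0.1, "given": 0.9},
         "given": {"surname": 0.5, "given": 0.5}}
EMIT = {"surname": {"x1": 0.8, "x2": 0.2},
        "given": {"x1": 0.1, "x2": 0.9}}
```

Because the emission term `emit_p[s][x]` enters every step of the recursion, any error in the emission matrix B propagates directly into the decoded roles, which is why the rest of the description concentrates on estimating P(x_t|l_i) well.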

Next, the named entity recognition method provided by the present disclosure is described. Fig. 1 is a flow chart of a named entity recognition method according to an exemplary embodiment. As shown in Fig. 1, the method may include the following steps.

In step 11, all possible true segments corresponding to the t-th target segment x_t in the text are determined.

Segmenting the text yields multiple target segments, and for each target segment all possible true segments corresponding to it are determined. A target segment may be a single character or a word made up of several characters; the present disclosure does not limit this.

For ease of description, the present disclosure uses a single target segment as the example. As stated in step 11, for the t-th target segment x_t in the text, all possible true segments corresponding to x_t are determined. Specifically, when the target segment x_t is known, the true segments that were actually encountered when recognizing x_t can be collected from historical text recognition results.

For example, the text is "Beatrice", the name of a singer. The correct translation is "贝雅翠丝", but some people translate the name as "比阿特丽斯". The resulting target segments are then "比" and "阿特丽斯". For the target segment "比", the statistics show that its corresponding true segments are "贝" and "比"; for the target segment "阿特丽斯", the statistics show that its corresponding true segments are "雅翠丝" and "阿特丽斯".

As another example, if the text is "Beijing Union Hospital neuro department", where the department name is abbreviated, the target segment is "Beijing", "Union Hospital", or the abbreviated "neuro department", and the statistics show that the true segment is "Beijing", "Union Hospital", or "neurology department".

In step 12, for each true segment, the first conditional probability p(a^d|l_i) of each segment state corresponding to that true segment is determined.

Here a^d denotes the d-th true segment and l_i denotes the i-th segment state. The segment states are determined from the People's Daily annotated corpus and stored in the HMM in advance.

In step 13, the third conditional probability p(x_t|l_i) of each segment state corresponding to the target segment x_t is determined according to the second conditional probability p(x_t|a^d) of each true segment corresponding to the target segment and the first conditional probability p(a^d|l_i).

The second conditional probability p(x_t|a^d) characterizes the relationship between the target segment and a true segment and can be determined from historical text recognition results. Specifically, as described above, once all possible true segments corresponding to the target segment x_t are known, the probability that x_t occurs given each true segment is computed; this is the second conditional probability p(x_t|a^d). The third conditional probability p(x_t|l_i) of each segment state corresponding to x_t is then determined from this second conditional probability and the first conditional probability p(a^d|l_i) determined in step 12.

The second conditional probability can be expressed as p(x_t|a^d) = w(x_t, a^d) / w(a^d), where w(x_t, a^d) denotes the number of times the target segment x_t occurs when the true segment a^d occurs, and w(a^d) denotes the number of times the true segment a^d occurs.
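The count ratio above can be sketched in a few lines. The input format (a list of `(target, true)` pairs taken from historical recognition results) and the function name `second_conditional` are assumptions made for illustration.

```python
from collections import Counter


def second_conditional(pairs):
    """Estimate p(x_t | a^d) = w(x_t, a^d) / w(a^d) from (target, true) pairs.

    pairs: iterable of (observed target segment, true segment) tuples,
           a hypothetical representation of historical recognition results.
    Returns a dict mapping (target, true) -> estimated probability.
    """
    pairs = list(pairs)
    joint = Counter(pairs)                           # w(x_t, a^d)
    true_counts = Counter(a for _, a in pairs)       # w(a^d)
    return {(x, a): n / true_counts[a] for (x, a), n in joint.items()}
```

For each true segment, the estimates over all observed target segments sum to 1, so the result is a proper conditional distribution over targets.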

In step 14, named entity recognition is performed on the target segment x_t according to the third conditional probability p(x_t|l_i).

Once the third conditional probability p(x_t|l_i) has been determined, named entity recognition is performed on the target segment x_t according to it.

It should be noted that steps 11 to 14 can be performed for every target segment in the text, so that named entity recognition is carried out on each target segment in the text.

With the above technical solution, named entity recognition on a target segment takes into account all possible true segments corresponding to the target segment, the first conditional probability of each segment state corresponding to each true segment, and the second conditional probability of each true segment corresponding to the target segment. The resulting third conditional probability of each segment state corresponding to the target segment therefore characterizes the relationship among the target segment, the true segments, and the segment states. This improves the accuracy and recall of named entity recognition and effectively avoids problems caused by extra, missing, or wrongly written characters during text recognition.

After the first conditional probability p(a^d|l_i) of each segment state corresponding to each true segment has been determined, the third conditional probability p(x_t|l_i) of each segment state corresponding to the target segment x_t can be determined by the law of total probability. Specifically, step 13 may be implemented by determining the third conditional probability p(x_t|l_i) according to formula (3):

$$p(x_t \mid l_i) = \sum_{d=1}^{D} p(x_t \mid a^d)\, p(a^d \mid l_i) \tag{3}$$

where D denotes the total number of true segments.

The third conditional probability determined in this way characterizes the relationship among the target segment, the true segments, and the segment states, which improves the accuracy and recall of named entity recognition and effectively avoids problems caused by extra, missing, or wrongly written characters during text recognition.
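The total-probability sum of formula (3) can be sketched as follows for a single segment state. The dictionary layout and the function name `third_conditional` are assumptions; true segments missing from the state's distribution are treated as having probability zero.

```python
def third_conditional(p_x_given_a, p_a_given_l):
    """Compute p(x_t | l_i) = sum_d p(x_t | a^d) * p(a^d | l_i).

    p_x_given_a: {true segment a^d: p(x_t | a^d)} for one fixed target x_t
    p_a_given_l: {true segment a^d: p(a^d | l_i)} for one fixed state l_i
    """
    return sum(p * p_a_given_l.get(a, 0.0) for a, p in p_x_given_a.items())
```

With hypothetical values p(x_t|a^1) = 0.3, p(x_t|a^2) = 1.0, p(a^1|l_i) = 0.02, and p(a^2|l_i) = 0.0, the result is 0.3 × 0.02 = 0.006: the state receives probability mass through a^1 even though the target itself was never observed in that state.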

In addition, after the third conditional probability p(x_t|l_i) has been determined, one possible embodiment performs named entity recognition on the target segment x_t according to it. In another, preferred embodiment, to further improve the accuracy and recall of named entity recognition, step 14 may be implemented by determining the segment state corresponding to the largest third conditional probability as the named entity recognition result of the target segment x_t.

Specifically, the third conditional probability comprises multiple probabilities, one for each segment state corresponding to the target segment x_t. The probability of x_t occurring usually differs from state to state, so these probabilities differ, and the target segment is most likely to occur in the segment state that has the largest third conditional probability. The present disclosure can therefore determine that segment state as the named entity recognition result of x_t, further improving the accuracy of named entity recognition for x_t.
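Selecting the state with the largest third conditional probability is a plain argmax. A minimal sketch, assuming the probabilities for one target segment are already collected in a dictionary keyed by state name (the name `recognize` and the example values are hypothetical):

```python
def recognize(third_probs):
    """Return (state, probability) for the largest p(x_t | l_i).

    third_probs: {segment state l_i: p(x_t | l_i)} for one target segment.
    """
    if not third_probs:
        raise ValueError("no segment states given")
    state = max(third_probs, key=third_probs.get)
    return state, third_probs[state]
```

Ties are resolved in favor of the state that appears first in the dictionary; the source does not say how ties should be handled, so this is only one possible choice.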

For example, take the text "Beatrice" again, and suppose translation yields the target segment "比", whose corresponding true segments are "贝" and "比". In the prior art, the true segment "贝" is not input into the HMM, and "比" has no "surname" state in the historical data, so "比" cannot be recognized as "surname". Similarly, for the target segment "阿特丽斯", the true segment "雅翠丝" is not input into the HMM, and "阿特丽斯" has no "given name" state in the historical data, so "阿特丽斯" cannot be recognized as "given name". In the prior art, therefore, if the input text is "比阿特丽斯", it cannot be recognized as a person name.

In this scheme, the true segment "贝" is input into the HMM. All possible true segments encountered when recognizing the target segment "比" are then "贝" and "比". The first conditional probabilities of the segment state "surname" corresponding to the true segments "贝" and "比" are p(贝|surname) and p(比|surname), and the second conditional probabilities of each true segment corresponding to the target segment are p(比|贝) and p(比|比). The third conditional probability of the segment state "surname" corresponding to the target segment "比" is then p(比|surname) = p(比|贝) p(贝|surname) + p(比|比) p(比|surname), where the left-hand side is the third conditional probability and p(比|surname) on the right-hand side is the first conditional probability of the true segment "比". As noted above, "比" has no "surname" state in the historical data, so the first conditional probability p(比|surname) may be zero, whereas "贝" genuinely occurs as a surname, so p(贝|surname) is comparatively large. Adding p(比|贝) p(贝|surname) to p(比|比) p(比|surname) therefore increases the resulting third conditional probability and greatly raises the likelihood that "比" is recognized as "surname". By the same principle, p(阿特丽斯|given name) also increases, which greatly raises the likelihood that "阿特丽斯" is recognized as "given name". In summary, if the input text is "比阿特丽斯", it can be recognized as a person name.

In summary, compared with the prior art, the present disclosure uses the true segments to influence the probability of each segment state corresponding to the target segment, which further improves the accuracy of named entity recognition for the target segment.

Because the segment state is a hidden parameter of the HMM, the first conditional probability p(a^d|l_i) of each segment state corresponding to a true segment cannot be determined directly from historical text recognition results. In the present disclosure it is therefore estimated from the fourth conditional probability, which can be determined from historical text recognition results. Specifically, as shown in Fig. 2, step 12 may include the following steps.

In step 121, for each true segment, the fourth conditional probability p(a^d|x_t) of the target segment x_t corresponding to that true segment is determined.

In the present disclosure, the fourth conditional probability p(a^d|x_t) can be regarded as a posterior probability, and it can be obtained by counting. As described above, when the target segment x_t is known, all true segments corresponding to x_t can be found in the historical text recognition results, and the fourth conditional probability p(a^d|x_t) of x_t corresponding to each true segment can then be determined.

The fourth conditional probability can be expressed as p(a^d|x_t) = w(a^d, x_t) / w(x_t), where w(a^d, x_t) denotes the number of times the true segment a^d occurs when the target segment x_t occurs, and w(x_t) denotes the number of times the target segment x_t occurs.
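Collecting the fourth conditional probabilities of one target segment into a vector over a fixed ordering of the D true segments gives the vector used in the relative entropy step. A minimal sketch, assuming the same hypothetical `(target, true)` pair format for the historical results; the function name `posterior_vector` is also an assumption.

```python
from collections import Counter


def posterior_vector(pairs, target, true_segments):
    """Build z_t = (p(a^1 | x_t), ..., p(a^D | x_t)) by counting.

    pairs:         iterable of (observed target segment, true segment)
    target:        the target segment x_t
    true_segments: fixed ordering a^1..a^D of the candidate true segments
    """
    joint = Counter(a for x, a in pairs if x == target)   # w(a^d, x_t)
    total = sum(joint.values())                           # w(x_t)
    if total == 0:
        raise ValueError(f"target segment {target!r} never observed")
    return [joint[a] / total for a in true_segments]
```

Keeping the ordering of `true_segments` fixed across all target segments matters, because the vectors are later compared component by component.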

In step 122, the first conditional probability p(a^d|l_i) of each segment state corresponding to each true segment is estimated according to the fourth conditional probability p(a^d|x_t) of the target segment x_t corresponding to each true segment.

The fourth conditional probability characterizes the relationship between the true segments and the target segment, and it is determined from historical text recognition results, so it is comparatively accurate. Estimating the first conditional probability p(a^d|l_i) from this accurate fourth conditional probability therefore keeps the estimate accurate.

For example, the KL divergence (Kullback-Leibler divergence), also called relative entropy, measures the difference between two probability distributions over the same event space. The relative entropy formula can therefore be used to determine the first conditional probability p(a^d|l_i) that is closest to the fourth conditional probability.

Specifically, step 122 may be implemented as follows: according to formula (1) and formula (2) below, the y_i for which d(z_t, y_i) satisfies a preset condition is determined as the first conditional probability of each segment state corresponding to each true segment:

$$d(z_t, y_i) = \sum_{d=1}^{D} z_t^d \log\frac{z_t^d}{y_i^d} \tag{1}$$

$$\sum_{d=1}^{D} y_i^d = 1 \tag{2}$$

where D denotes the total number of true segments, z_t^d = p(a^d|x_t) denotes the fourth conditional probability of the target segment x_t corresponding to the d-th true segment, y_i^d = p(a^d|l_i) denotes the first conditional probability of the i-th segment state corresponding to the d-th true segment, z_t = (z_t^1, ..., z_t^D) denotes the vector of fourth conditional probabilities of the target segment x_t corresponding to each true segment, y_i = (y_i^1, ..., y_i^D) denotes the vector of first conditional probabilities of the i-th segment state corresponding to each true segment, and d(z_t, y_i) denotes the relative entropy of z_t and y_i.
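The relative entropy between the two vectors can be computed directly from its standard definition. This is a minimal sketch using the usual conventions (a zero component in the first vector contributes nothing, and a zero in the second vector where the first is positive gives infinity); the function name `relative_entropy` is an assumption.

```python
import math


def relative_entropy(z, y):
    """Relative entropy (KL divergence) d(z, y) = sum_d z^d * log(z^d / y^d)."""
    if len(z) != len(y):
        raise ValueError("z and y must have the same length D")
    total = 0.0
    for zd, yd in zip(z, y):
        if zd == 0.0:
            continue            # 0 * log(0 / y) is taken as 0
        if yd == 0.0:
            return math.inf     # z puts mass where y has none
        total += zd * math.log(zd / yd)
    return total
```

The value is zero exactly when the two distributions coincide and grows as they diverge, which is what makes it usable as a closeness criterion here. Note that it is not symmetric in its two arguments.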

Each true segment determined above is an independent unit, although it depends on its context. The true segments that occur in each segment state of the HMM are not fixed, and each segment state can produce many possibilities. In the present disclosure, the first conditional probability p(a^d|l_i) therefore also has to satisfy the following condition:

0 ≤ P(a^d|l_i) ≤ 1

In one possible embodiment, the preset condition may characterize the difference between the first conditional probability distribution and the fourth conditional probability distribution that the user is willing to accept. This difference may be a default value or a value set by the user, and in either case it is greater than zero.

In practice many similar and differing words can occur, and when the text is segmented and tagged with segment states, each segment state may contain multiple target segments. To improve the accuracy of named entity recognition over the whole text, another, preferred embodiment therefore takes as the preset condition that the loss function of formula (4) is minimal. Here T_i denotes the total number of target segments belonging to the i-th segment state l_i, L denotes the total number of segment states, and an indicator term denotes whether the i-th segment state and the target segment x_t are related: it is 1 if they are related and 0 otherwise.

In this way, the problem of finding the y_i for which d(z_t, y_i) satisfies the preset condition is converted into the problem of solving formula (4), that is, into an optimization problem. Formula (4) is then solved together with formula (2) above to obtain the optimal solution y_i.

Solving formula (4) gives, for any component of y_i, the expression of formula (5) (formula image not reproduced in this text).

As described above, the indicator term equals 1 if the state l_i is related to the observation x_t; otherwise the corresponding term does not take part in the calculation. Therefore, for any component of y_i, formula (6) holds (formula image not reproduced in this text),

where T_i denotes the total number of target segments belonging to the i-th segment state l_i.
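Formulas (5) and (6) are missing from this text, so the following is only a sketch under an explicit assumption: that the loss is the sum of the relative entropies d(z_t, y_i) over the target segments related to state l_i, minimized subject to the components of y_i summing to 1. Under that assumption the minimizer is the component-wise average of the related vectors z_t, which the code computes and checks numerically against other candidates. Both function names are hypothetical.

```python
import math


def estimate_first_conditional(z_vectors):
    """Average the posterior vectors z_t of the targets related to one state.

    z_vectors: list of T_i vectors, each (p(a^1|x_t), ..., p(a^D|x_t)).
    Returns the assumed estimate y_i = (p(a^1|l_i), ..., p(a^D|l_i)).
    """
    if not z_vectors:
        raise ValueError("state has no related target segments")
    n, dim = len(z_vectors), len(z_vectors[0])
    return [sum(z[d] for z in z_vectors) / n for d in range(dim)]


def summed_relative_entropy(z_vectors, y):
    """Assumed per-state loss: sum over t of d(z_t, y), skipping zero z components."""
    return sum(
        zd * math.log(zd / yd)
        for z in z_vectors
        for zd, yd in zip(z, y)
        if zd > 0.0
    )
```

For example, with the hypothetical vectors (0.8, 0.2) and (0.4, 0.6) the estimate is approximately (0.6, 0.4), and its summed relative entropy is no larger than that of other normalized candidates such as (0.5, 0.5) or (0.7, 0.3).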

Furthermore, it is possible to using theorem to obtainedIt is verified, to determine above-mentioned formula (5) or formula (6) for public affairs The optimal solution of formula (4).Wherein, solution required by theorem proving is utilizedThe as optimal solution of formula (4), belongs to the prior art, this Place repeats no more.
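The omitted optimality argument can be sketched with a Lagrange multiplier. The formula images for (4) to (6) are not available in this text, so the form of the loss J, the indicator symbol u_{it}, and the multiplier λ_i below are assumptions chosen to match the surrounding definitions, not the patent's own notation.

```latex
% Assumed loss (cf. formula (4)) and constraint (cf. formula (2)):
J = \sum_{i=1}^{L} \sum_{t} u_{it}\, d(z_t, y_i)
  = \sum_{i=1}^{L} \sum_{t} u_{it} \sum_{d=1}^{D} z_t^d \log\frac{z_t^d}{y_i^d},
\qquad \sum_{d=1}^{D} y_i^d = 1 .

% Lagrangian for a fixed state i:
\mathcal{L}_i = \sum_{t} u_{it} \sum_{d=1}^{D} z_t^d \log\frac{z_t^d}{y_i^d}
              + \lambda_i \Bigl( \sum_{d=1}^{D} y_i^d - 1 \Bigr)

% Stationarity in y_i^d:
\frac{\partial \mathcal{L}_i}{\partial y_i^d}
  = -\frac{\sum_{t} u_{it}\, z_t^d}{y_i^d} + \lambda_i = 0
\;\Longrightarrow\;
y_i^d = \frac{\sum_{t} u_{it}\, z_t^d}{\lambda_i}

% Summing over d and using \sum_d z_t^d = 1 gives \lambda_i = \sum_t u_{it}, hence
y_i^d = \frac{\sum_{t} u_{it}\, z_t^d}{\sum_{t} u_{it}}
\quad\text{(cf. formula (5))},
\qquad
y_i^d = \frac{1}{T_i} \sum_{t=1}^{T_i} z_t^d
\quad\text{(cf. formula (6), when } u_{it} = 1 \text{ for the } T_i \text{ related targets)} .
```

Since −log is convex, J is convex in each y_i on the probability simplex, so under these assumptions the stationary point is the global minimizer, which is consistent with the convexity remark in the text.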

With the above technical solution, the relative entropy formula turns the problem of solving for the first conditional probability p(a^d|l_i) into the optimization of a convex function. A theorem can be used to prove that this optimization problem has a strict local minimum point, and the solution found at that point is the first conditional probability p(a^d|l_i).
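The optimization described above can be illustrated numerically. The sketch below rests on assumptions that the text does not confirm: the loss for one segmentation state is taken to be the sum of d(z_t, y_i) over the T_i target word segments related to that state, with d(z_t, y_i) = sum over d of z_t^d * log(z_t^d / y_i^d), and y_i is constrained to sum to one. Under exactly these assumptions the minimizer is the arithmetic mean of the z_t vectors; whether this coincides with formulas (5) and (6) cannot be checked from the text. All names and numbers are hypothetical.

```python
import math


def relative_entropy(z, y):
    """Relative entropy sum_d z[d] * log(z[d] / y[d]), assuming y[d] > 0."""
    return sum(zd * math.log(zd / yd) for zd, yd in zip(z, y) if zd > 0)


def estimate_state_distribution(z_rows):
    """Average the z_t vectors of the T_i target word segments of one state."""
    t_i = len(z_rows)
    dim = len(z_rows[0])
    return [sum(row[d] for row in z_rows) / t_i for d in range(dim)]


def state_loss(z_rows, y):
    """Assumed per-state loss: sum of d(z_t, y) over the related segments."""
    return sum(relative_entropy(z, y) for z in z_rows)


# Hypothetical fourth conditional probabilities p(a^d | x_t) for T_i = 3
# target word segments labeled with the same segmentation state, D = 3.
z_rows = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.1, 0.3],
]

y_hat = estimate_state_distribution(z_rows)
other = [0.5, 0.3, 0.2]  # an arbitrary competing distribution

print(y_hat)
print(state_loss(z_rows, y_hat), state_loss(z_rows, other))
```

Because the assumed loss is convex in y_i, comparing the averaged solution against other valid distributions, as done here with `other`, is a quick sanity check rather than a proof.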

Based on the same inventive concept, the disclosure also provides a named entity recognition apparatus. Fig. 3 is a block diagram of a named entity recognition apparatus according to an exemplary embodiment. As shown in Fig. 3, the named entity recognition apparatus may include:

a first determining module 31, for determining all possible true word segments corresponding to the t-th target word segment x_t in a text;

a second determining module 32, for determining, for each true word segment determined by the first determining module, the first conditional probability p(a^d|l_i) of each segmentation state corresponding to that true word segment, where a^d denotes the d-th true word segment and l_i denotes the i-th segmentation state;

a third determining module 33, for determining the third conditional probability p(x_t|l_i) of each segmentation state corresponding to the target word segment x_t, according to the second conditional probability p(x_t|a^d) of each true word segment corresponding to the target word segment x_t and the first conditional probability p(a^d|l_i) determined by the second determining module;

an identification module 34, for performing named entity recognition on the target word segment x_t according to the third conditional probability p(x_t|l_i) determined by the third determining module.
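The cooperation of the four modules can be sketched as a small pipeline. This is a minimal illustration, not the implementation of the apparatus: the probability tables, the example token and the labels "LOC" and "O" are hypothetical; the third conditional probability is assumed to be the sum over d of p(x_t|a^d) * p(a^d|l_i); and the identification step simply selects the state with the largest third conditional probability, as one of the optional embodiments does.

```python
def first_module(x_t, p_x_given_a):
    """Candidate true word segments a^d: those that can yield the observed x_t."""
    return [a for a, dist in p_x_given_a.items() if dist.get(x_t, 0.0) > 0.0]


def second_module(candidates, p_a_given_l):
    """First conditional probabilities p(a^d | l_i), restricted to the candidates."""
    return {
        state: {a: dist.get(a, 0.0) for a in candidates}
        for state, dist in p_a_given_l.items()
    }


def third_module(x_t, candidates, first_cond, p_x_given_a):
    """Assumed combination: p(x_t | l_i) = sum_d p(x_t | a^d) * p(a^d | l_i)."""
    return {
        state: sum(p_x_given_a[a][x_t] * dist[a] for a in candidates)
        for state, dist in first_cond.items()
    }


def identification_module(third_cond):
    """Pick the segmentation state with the largest third conditional probability."""
    return max(third_cond, key=third_cond.get)


# Hypothetical tables: "Bejing" is a misspelled observation of "Beijing".
p_x_given_a = {
    "Beijing": {"Beijing": 0.9, "Bejing": 0.1},
    "being": {"being": 0.98, "Bejing": 0.02},
}
p_a_given_l = {
    "LOC": {"Beijing": 0.3, "being": 0.0},
    "O": {"Beijing": 0.001, "being": 0.05},
}

x_t = "Bejing"
candidates = first_module(x_t, p_x_given_a)
first_cond = second_module(candidates, p_a_given_l)
third_cond = third_module(x_t, candidates, first_cond, p_x_given_a)
result = identification_module(third_cond)
print(third_cond, result)
```

In this toy setting the misspelled token is still labeled through its likely true word segment, which is the behavior the apparatus is intended to provide for text containing extra, missing or wrong characters.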

Optionally, second determining module includes:

Optionally, the estimation submodule includes:

a second determining submodule, for determining, according to formula (1) and formula (2) above, the y_i^d that makes d(z_t, y_i) satisfy the preset condition as the first conditional probability of each segmentation state corresponding to each true word segment.

Optionally, the third determining module includes:

a third determining submodule, for determining, according to formula (3) above, the third conditional probability p(x_t|l_i) of each segmentation state corresponding to the target word segment x_t.
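Formula (3) is not reproduced in this text. Judging only from the quantities it is said to combine, one form consistent with the surrounding definitions is a marginalization over the D true word segments. The expression below is an assumption offered for orientation, and it additionally presumes that the target word segment x_t depends on the segmentation state l_i only through the true word segment a^d:

```latex
p(x_t \mid l_i) \;=\; \sum_{d=1}^{D} p(x_t \mid a^{d})\, p(a^{d} \mid l_i)
```

Here D is the total number of true word segments, p(x_t|a^d) is the second conditional probability and p(a^d|l_i) is the first conditional probability.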

Optionally, the identification module includes:

With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.

Fig. 4 is a block diagram of an electronic device 400 according to an exemplary embodiment. As shown in Fig. 4, the electronic device 400 may include a processor 401 and a memory 402. The electronic device 400 may also include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.

The processor 401 controls the overall operation of the electronic device 400 so as to complete all or part of the steps of the named entity recognition method described above. The memory 402 stores various types of data to support operation of the electronic device 400. Such data may include, for example, instructions for any application or method operating on the electronic device 400, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 402 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, for example static random access memory (Static Random Access Memory, SRAM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), programmable read-only memory (Programmable Read-Only Memory, PROM), read-only memory (Read-Only Memory, ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 402 or sent through the communication component 405. The audio component further includes at least one loudspeaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual buttons or physical buttons. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (Near Field Communication, NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.

In an exemplary embodiment, the electronic device 400 may be implemented by one or more application specific integrated circuits (Application Specific Integrated Circuit, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (Digital Signal Processing Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field Programmable Gate Array, FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the named entity recognition method described above.

In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; when the program instructions are executed by a processor, the steps of the named entity recognition method described above are implemented. For example, the computer-readable storage medium may be the memory 402 including program instructions, and the program instructions may be executed by the processor 401 of the electronic device 400 to complete the named entity recognition method described above.

The preferred embodiments of the disclosure have been described in detail above with reference to the accompanying drawings. However, the disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the disclosure, a variety of simple variations may be made to the technical solution of the disclosure, and these simple variations all fall within the protection scope of the disclosure.

It should further be noted that the specific technical features described in the above embodiments may be combined in any suitable manner provided that they do not contradict one another. To avoid unnecessary repetition, the disclosure does not separately describe the various possible combinations.

In addition, the various embodiments of the disclosure may also be combined with one another in any manner; as long as a combination does not depart from the idea of the disclosure, it should likewise be regarded as content disclosed herein.

Claims

1. A named entity recognition method, characterized by comprising:

for each true word segment, determining the first conditional probability p(a^d|l_i) of each segmentation state corresponding to that true word segment, where a^d denotes the d-th true word segment and l_i denotes the i-th segmentation state;

determining the third conditional probability p(x_t|l_i) of each segmentation state corresponding to the target word segment x_t, according to the second conditional probability p(x_t|a^d) of each true word segment corresponding to the target word segment x_t and the first conditional probability p(a^d|l_i);

2. The method according to claim 1, characterized in that determining, for each true word segment, the first conditional probability p(a^d|l_i) of each segmentation state corresponding to that true word segment comprises:

for each true word segment, determining the fourth conditional probability p(a^d|x_t) of the target word segment x_t corresponding to that true word segment;

estimating, according to the fourth conditional probability p(a^d|x_t) of the target word segment x_t corresponding to each true word segment, the first conditional probability p(a^d|l_i) of each segmentation state corresponding to each true word segment.

3. The method according to claim 2, characterized in that estimating, according to the fourth conditional probability p(a^d|x_t) of the target word segment x_t corresponding to each true word segment, the first conditional probability p(a^d|l_i) of each segmentation state corresponding to each true word segment comprises:

according to the following formula (1) and formula (2), determining the y_i^d that makes d(z_t, y_i) satisfy a preset condition as the first conditional probability of each segmentation state corresponding to each true word segment:

wherein D denotes the total number of true word segments; z_t^d = p(a^d|x_t) denotes the fourth conditional probability of the target word segment x_t corresponding to the d-th true word segment; y_i^d = p(a^d|l_i) denotes the first conditional probability of the i-th segmentation state corresponding to the d-th true word segment; z_t = (z_t^1, ..., z_t^D) denotes the vector of fourth conditional probabilities of the target word segment x_t corresponding to each true word segment; y_i = (y_i^1, ..., y_i^D) denotes the vector of first conditional probabilities of the i-th segmentation state corresponding to each true word segment; and d(z_t, y_i) denotes the relative entropy of z_t and y_i.

4. The method according to claim 3, characterized in that the preset condition is that the loss function is minimized; wherein T_i denotes the total number of target word segments belonging to the i-th segmentation state l_i, L denotes the total number of segmentation states, and an indicator term denotes whether the i-th segmentation state and the target word segment x_t are related, taking the value 1 if they are related and 0 otherwise.

5. The method according to any one of claims 1-4, characterized in that determining the third conditional probability p(x_t|l_i) of each segmentation state corresponding to the target word segment x_t, according to the second conditional probability p(x_t|a^d) of each true word segment corresponding to the target word segment x_t and the first conditional probability p(a^d|l_i), comprises:

determining, according to the following formula (3), the third conditional probability p(x_t|l_i) of each segmentation state corresponding to the target word segment x_t:

wherein D denotes the total number of true word segments.

6. The method according to any one of claims 1-4, characterized in that performing named entity recognition on the target word segment x_t according to the third conditional probability p(x_t|l_i) comprises:

determining the segmentation state corresponding to the maximum third conditional probability as the named entity recognition result of the target word segment x_t.

7. A named entity recognition apparatus, characterized by comprising:

a second determining module, for determining, for each true word segment determined by the first determining module, the first conditional probability p(a^d|l_i) of each segmentation state corresponding to that true word segment, where a^d denotes the d-th true word segment and l_i denotes the i-th segmentation state;

a third determining module, for determining the third conditional probability p(x_t|l_i) of each segmentation state corresponding to the target word segment x_t, according to the second conditional probability p(x_t|a^d) of each true word segment corresponding to the target word segment x_t and the first conditional probability p(a^d|l_i) determined by the second determining module;

an identification module, for performing named entity recognition on the target word segment x_t according to the third conditional probability p(x_t|l_i) determined by the third determining module.

8. The apparatus according to claim 7, characterized in that the third determining module comprises:

a third determining submodule, for determining, according to the following formula (3), the third conditional probability p(x_t|l_i) of each segmentation state corresponding to the target word segment x_t:

wherein D denotes the total number of true word segments.

9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-6.

10. An electronic device, characterized by comprising:

a memory on which a computer program is stored;

a processor, for executing the computer program in the memory to implement the steps of the method according to any one of claims 1-6.