CN109710927B

CN109710927B - Named entity identification method and device, readable storage medium and electronic equipment

Info

Publication number: CN109710927B
Application number: CN201811519563.6A
Authority: CN
Inventors: 贾弼然; 崔朝辉; 赵立军; 张霞
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2022-12-20
Anticipated expiration: 2038-12-12
Also published as: CN109710927A

Abstract

The disclosure relates to a method and a device for identifying named entities, a readable storage medium, and an electronic device. The method comprises: determining all possible real participles corresponding to the t-th target participle x_t in a text; for each real participle, determining a first conditional probability p(a^d | l_i) that each participle state corresponds to the real participle, wherein a^d characterizes the d-th real participle and l_i characterizes the i-th participle state; determining, according to a second conditional probability p(x_t | a^d) that each real participle corresponds to the target participle x_t and the first conditional probability p(a^d | l_i), a third conditional probability p(x_t | l_i) that each participle state corresponds to the target participle x_t; and performing named entity recognition on the target participle x_t according to the third conditional probability p(x_t | l_i). The accuracy and recall rate of named entity recognition are thereby improved, and problems of extra characters, missing characters, or wrongly written characters in the text recognition process can be effectively avoided.

Description

Named entity identification method and device, readable storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of natural language processing, and in particular, to a method and an apparatus for identifying a named entity, a readable storage medium, and an electronic device.

Background

With the spread of artificial intelligence applications, natural language processing is gaining importance and popularity. Named entity recognition is an important early step in natural language processing, and in many research fields it matters for identifying entities in text such as times, numbers, person names, place names, and organization names. At present, named entity recognition is mostly performed with a Hidden Markov Model (HMM), but problems commonly arise during recognition. For example, a transliterated entity in an open-set text may be rendered with different transliterations, which causes considerable ambiguity and a high error rate during recognition; or texts obtained by annotating and translating low-quality corpora may contain extra characters, missing characters, or wrongly written characters. Therefore, named entities in text cannot be accurately identified using existing HMM models.

Disclosure of Invention

In order to overcome the problems in the prior art, embodiments of the present disclosure provide a method and an apparatus for identifying a named entity, a readable storage medium, and an electronic device.

In order to achieve the above object, a first aspect of the present disclosure provides a method for identifying a named entity, including:

determining all possible real participles corresponding to the t-th target participle x_t in a text;

for each real participle, determining a first conditional probability p(a^d | l_i) that each participle state corresponds to the real participle, wherein a^d characterizes the d-th real participle and l_i characterizes the i-th participle state;

determining, according to a second conditional probability p(x_t | a^d) that each real participle corresponds to the target participle x_t and the first conditional probability p(a^d | l_i), a third conditional probability p(x_t | l_i) that each participle state corresponds to the target participle x_t; and

performing named entity recognition on the target participle x_t according to the third conditional probability p(x_t | l_i).

Optionally, the determining, for each real participle, of the first conditional probability p(a^d | l_i) that each participle state corresponds to the real participle includes:

determining, for each real participle, a fourth conditional probability p(a^d | x_t) that the target participle x_t corresponds to the real participle; and

estimating, according to the fourth conditional probability p(a^d | x_t) that the target participle x_t corresponds to each real participle, the first conditional probability p(a^d | l_i) that each participle state corresponds to each real participle.

Optionally, the estimating, according to the fourth conditional probability p(a^d | x_t) that the target participle x_t corresponds to each real participle, of the first conditional probability p(a^d | l_i) that each participle state corresponds to each real participle includes:

according to formulas (1) to (2), determining the y_i that makes d(z_t, y_i) satisfy a preset condition as the first conditional probability that each participle state corresponds to each of the real participles;

wherein D characterizes the total number of the real participles; z_t characterizes the vector formed by the fourth conditional probabilities p(a^d | x_t) that the target participle x_t corresponds to each of the D real participles; y_i characterizes the vector formed by the first conditional probabilities p(a^d | l_i) that the i-th participle state corresponds to each of the D real participles; and d(z_t, y_i) characterizes the relative entropy of z_t and y_i.

Optionally, the preset condition is that the loss function is minimal; wherein T_i characterizes the total number of target participles belonging to the i-th participle state l_i, L characterizes the total number of the participle states, and an indicator characterizes whether the i-th participle state is related to the target participle x_t, taking the value 1 if they are related and 0 otherwise.

Optionally, the determining, according to the second conditional probability p(x_t | a^d) that each real participle corresponds to the target participle x_t and the first conditional probability p(a^d | l_i), of the third conditional probability p(x_t | l_i) that each participle state corresponds to the target participle x_t includes:

determining, according to the following formula (3), the third conditional probability p(x_t | l_i) that each participle state corresponds to the target participle x_t:

$$p(x_t \mid l_i) = \sum_{d=1}^{D} p(x_t \mid a^d)\, p(a^d \mid l_i) \tag{3}$$

wherein D characterizes the total number of the real participles.

Optionally, the performing of named entity recognition on the target participle x_t according to the third conditional probability p(x_t | l_i) includes:

determining the participle state corresponding to the maximum third conditional probability as the named entity recognition result of the target participle x_t.

A second aspect of the present disclosure provides an apparatus for identifying a named entity, including:

a first determining module, configured to determine all possible real participles corresponding to the t-th target participle x_t in a text;

a second determining module, configured to determine, for each of the real participles determined by the first determining module, a first conditional probability p(a^d | l_i) that each participle state corresponds to the real participle, wherein a^d characterizes the d-th real participle and l_i characterizes the i-th participle state;

a third determining module, configured to determine, according to a second conditional probability p(x_t | a^d) that each real participle corresponds to the target participle x_t and the first conditional probability p(a^d | l_i) determined by the second determining module, a third conditional probability p(x_t | l_i) that each participle state corresponds to the target participle x_t; and

an identification module, configured to perform named entity recognition on the target participle x_t according to the third conditional probability p(x_t | l_i) determined by the third determining module.

Optionally, the second determining module includes:

a first determining submodule, configured to determine, for each of the real participles, a fourth conditional probability p(a^d | x_t) that the target participle x_t corresponds to the real participle; and

an estimation submodule, configured to estimate, according to the fourth conditional probability p(a^d | x_t), determined by the first determining submodule, that the target participle x_t corresponds to each of the real participles, the first conditional probability p(a^d | l_i) that each participle state corresponds to each of the real participles.

Optionally, the estimation sub-module comprises:

a second determining submodule, configured to determine, according to formulas (1) to (2), the y_i that makes d(z_t, y_i) satisfy a preset condition as the first conditional probability that each participle state corresponds to each of the real participles;

wherein D characterizes the total number of the real participles; z_t characterizes the vector formed by the fourth conditional probabilities p(a^d | x_t) that the target participle x_t corresponds to each of the D real participles; y_i characterizes the vector formed by the first conditional probabilities p(a^d | l_i) that the i-th participle state corresponds to each of the D real participles; and d(z_t, y_i) characterizes the relative entropy of z_t and y_i.

Optionally, the preset condition is that the loss function is minimal; wherein T_i characterizes the total number of target participles belonging to the i-th participle state l_i, L characterizes the total number of the participle states, and an indicator characterizes whether the i-th participle state is related to the target participle x_t, taking the value 1 if they are related and 0 otherwise.

Optionally, the third determining module includes:

a third determining submodule, configured to determine, according to the following formula (3), the third conditional probability p(x_t | l_i) that each participle state corresponds to the target participle x_t:

$$p(x_t \mid l_i) = \sum_{d=1}^{D} p(x_t \mid a^d)\, p(a^d \mid l_i) \tag{3}$$

wherein D characterizes the total number of the real participles.

Optionally, the identification module comprises:

a fourth determining submodule, configured to determine the participle state corresponding to the maximum third conditional probability as the named entity recognition result of the target participle x_t.

The third aspect of the present disclosure also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method provided by the first aspect of the present disclosure.

The fourth aspect of the present disclosure also provides an electronic device, including:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to implement the steps of the method provided by the first aspect of the present disclosure.

According to the above technical solution, when named entity recognition is performed on a target participle, all possible real participles corresponding to the target participle, the first conditional probability that each participle state corresponds to each real participle, and the second conditional probability that each real participle corresponds to the target participle are all taken into account. The resulting third conditional probability that each participle state corresponds to the target participle thus characterizes the relationship among the target participle, the real participles, and the participle states, which improves the accuracy and recall rate of named entity recognition and effectively avoids problems of extra characters, missing characters, or wrongly written characters in the text recognition process.

Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:

FIG. 1 is a flow diagram illustrating a method for named entity identification in accordance with an exemplary embodiment.

FIG. 2 is a flow diagram illustrating a method for named entity identification in accordance with another exemplary embodiment.

FIG. 3 is a block diagram illustrating an apparatus for identifying named entities in accordance with an exemplary embodiment.

FIG. 4 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.

First, an HMM model will be explained.

The HMM model consists of five parts:

(1) The number L of states in the model, i.e., the number of states in the role-labeling state set.

(2) The number T_i of distinct symbols (also called participles) that each participle state may output, i.e., the total number of participles that the role-labeling state may output.

(3) The state transition probability matrix A = {a_ij}, i.e., the matrix of transition probabilities among the role-labeling states, defined as follows:

$$a_{ij} = P(l_j \mid l_i), \quad 1 \le i, j \le L$$

$$a_{ij} \ge 0$$

wherein a_ij characterizes the probability that the role-labeling state transitions from state l_i to state l_j, i denotes the i-th role-labeling state, and j denotes the j-th role-labeling state.

(4) The probability distribution matrix B = {b_i(t)} of participle x_t occurring in state l_i. This matrix, also called the emission probability matrix, characterizes the relationship between states and participles:

$$b_i(t) = P(x_t \mid l_i), \quad 1 \le i \le L, \quad 1 \le t \le T_i$$

$$b_i(t) \ge 0$$

wherein t denotes the t-th participle, and T_i characterizes the total number of participles belonging to the i-th role-labeling state l_i.

(5) The initial state probability distribution π = {π_i}, i.e., the probability that a participle sequence starts in each role state:

$$\pi_i = P(l_i), \quad 1 \le i \le L$$

$$\pi_i \ge 0$$

In summary, the quintuple of the HMM model can be denoted as μ = (L, T, A, B, π).

When the HMM model is used for named entity recognition, a text is first input into the HMM model and roughly segmented. The result of the rough segmentation is then compared with a training corpus, role labeling is performed, and the values required by the Viterbi algorithm are calculated, these values being the values of the parameters in the quintuple of the HMM model. The text is then recognized on the basis of the calculated values using the Viterbi algorithm. That is, the probability distribution matrix of participle x_t occurring in state l_i is calculated, wherein 1 ≤ i ≤ L and 1 ≤ t ≤ M, and the text is recognized according to the probability distribution matrix. Illustratively, with reference to an existing role-labeling set, the state l_i may be, for example, a surname, a two-character given name, the context preceding a person name, and the like. The role-labeling set has no absolute standard and needs to be adjusted according to prior experience and expert knowledge.
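The decoding step mentioned above can be sketched as follows. This is a minimal, generic Viterbi decoder over the HMM parameters (initial, transition, and emission probabilities), not the implementation of the disclosure; the function name, the state names, the tokens, and every probability value are invented for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observed tokens."""
    def log(p):
        # Work in log space; a zero probability maps to -infinity.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(obs[0], 0.0)) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(obs[t], 0.0))
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy parameters (all values are assumptions made up for this sketch).
states = ["surname", "name", "other"]
start_p = {"surname": 0.5, "name": 0.1, "other": 0.4}
trans_p = {
    "surname": {"surname": 0.05, "name": 0.9, "other": 0.05},
    "name": {"surname": 0.05, "name": 0.15, "other": 0.8},
    "other": {"surname": 0.2, "name": 0.05, "other": 0.75},
}
emit_p = {
    "surname": {"bei": 0.8, "atlas": 0.1, "sings": 0.1},
    "name": {"bei": 0.1, "atlas": 0.8, "sings": 0.1},
    "other": {"bei": 0.1, "atlas": 0.1, "sings": 0.8},
}
obs = ["bei", "atlas", "sings"]
print(viterbi(obs, states, start_p, trans_p, emit_p))
```

The emission table `emit_p` plays the role of the matrix B described above, which is the quantity the disclosure goes on to adjust.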

In summary, the accuracy of named entity recognition in the text depends on the probability distribution matrix, determined above, of participle x_t occurring in state l_i. Therefore, to improve the accuracy of named entity recognition in the text, the accuracy of the probability distribution matrix of participle x_t occurring in state l_i calculated in the HMM model must be ensured.

Next, a method for identifying a named entity provided in the present disclosure will be described. Referring to fig. 1, fig. 1 is a flow chart illustrating a method for identifying a named entity according to an exemplary embodiment. As shown in fig. 1, the method may include the following steps.

In step 11, all possible real participles corresponding to the t-th target participle x_t in the text are determined.

After the text is segmented, a plurality of target participles are obtained, and for each target participle all possible corresponding real participles are determined. A target participle may be a single character or a word composed of multiple characters, which is not specifically limited in this disclosure.

In the present disclosure, for convenience of explanation, a single target participle is taken as an example. As described in step 11, for the t-th target participle x_t in the text, all possible real participles corresponding to the target participle x_t are determined. Specifically, once the target participle x_t is known, all possible real participles that have actually been encountered when recognizing the target participle x_t can be determined by statistics over the historical text recognition results.

For example, suppose the text is "Beatrice", the name of a singer. The correct transliteration is "beidellite", but the name is sometimes transliterated as "bialarce", so the target participles obtained are "bi" (rendered literally as "ratio") and "atlas". For the target participle "bi", statistics indicate that the corresponding real participles are "bei" (rendered literally as "shell") and "bi"; for the target participle "atlas", the corresponding real participles are "Yacuisi" and "atlas".

For another example, if the text is "Peking Union Hospital Department of Neurology", the target participles are "Peking", "Union Hospital", and "Department of Neurology", and statistics indicate that the real participles are likewise "Peking", "Union Hospital", and "Department of Neurology".
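One way to picture the statistics in step 11 is a lookup built from past recognition records. The sketch below assumes those records are available as (target participle, real participle) pairs; that data layout, the function name, and the toy records are assumptions, and the strings "bi" and "bei" are placeholder romanizations for the two participles in the "Beatrice" example.

```python
from collections import defaultdict


def candidate_real_participles(history):
    """Map each target participle to the set of real participles seen with it.

    `history` is an iterable of (target, real) pairs, a hypothetical stand-in
    for the historical text recognition results mentioned in the text.
    """
    candidates = defaultdict(set)
    for target, real in history:
        candidates[target].add(real)
    return dict(candidates)


# Invented toy history: "bi" was sometimes a mis-rendering of "bei".
history = [("bi", "bei"), ("bi", "bi"), ("bei", "bei"), ("bi", "bei")]
candidates = candidate_real_participles(history)
print(sorted(candidates["bi"]))  # ['bei', 'bi']
```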

In step 12, for each real participle, a first conditional probability p(a^d | l_i) that each participle state corresponds to the real participle is determined.

Here, a^d characterizes the d-th real participle and l_i characterizes the i-th participle state. The participle states are determined according to the People's Daily annotated corpus and are stored in the HMM model in advance.

In step 13, a third conditional probability p(x_t | l_i) that each participle state corresponds to the target participle x_t is determined according to the second conditional probability p(x_t | a^d) that each real participle corresponds to the target participle and the first conditional probability p(a^d | l_i).

The second conditional probability p(x_t | a^d) characterizes the relationship between the target participle and the real participle and can be determined from the historical text recognition results. Specifically, as described above, once all possible real participles corresponding to the target participle x_t are known, the probability that the target participle x_t appears given that each real participle appears is calculated; this is the second conditional probability p(x_t | a^d). The third conditional probability p(x_t | l_i) that each participle state corresponds to the target participle x_t is then determined according to the second conditional probability p(x_t | a^d) and the first conditional probability p(a^d | l_i) determined in step 12.

The second conditional probability p(x_t | a^d) can be expressed as

$$p(x_t \mid a^d) = \frac{w(x_t, a^d)}{w(a^d)}$$

where w(x_t, a^d) characterizes the number of times the target participle x_t appears when the real participle a^d appears, and w(a^d) characterizes the number of times the real participle a^d appears.
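The count-based definition above can be sketched as a ratio of two counters, reading the second conditional probability as w(x_t, a^d) divided by w(a^d). The record format (one (target, real) pair per historical occurrence), the function name, and the counts below are assumptions for illustration only.

```python
from collections import Counter


def second_conditional_probability(pairs):
    """Return {(target, real): p(target | real)} from co-occurrence records."""
    pairs = list(pairs)
    joint = Counter(pairs)                             # w(x_t, a^d)
    real_counts = Counter(real for _, real in pairs)   # w(a^d)
    return {(x, a): n / real_counts[a] for (x, a), n in joint.items()}


# Invented counts: the real participle "bei" was rendered as "bi" 3 times out of 10.
history = [("bi", "bei")] * 3 + [("bei", "bei")] * 7 + [("bi", "bi")] * 10
p_second = second_conditional_probability(history)
print(p_second[("bi", "bei")], p_second[("bi", "bi")])
```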

In step 14, named entity recognition is performed on the target participle x_t according to the third conditional probability p(x_t | l_i).

That is, after the third conditional probability p(x_t | l_i) is determined, named entity recognition is performed on the target participle x_t according to it.

It should be noted that the above steps 11 to 14 may be performed on each target participle in the text, so as to implement named entity recognition on each target participle in the text.

With the above technical solution, when named entity recognition is performed on a target participle, all possible real participles corresponding to the target participle are considered, together with the first conditional probability that each participle state corresponds to each real participle and the second conditional probability that each real participle corresponds to the target participle. The resulting third conditional probability that each participle state corresponds to the target participle thus characterizes the relationship among the target participle, the real participles, and the participle states, which improves the accuracy and recall rate of named entity recognition and effectively avoids problems of extra characters, missing characters, or wrongly written characters in the text recognition process.

After the first conditional probability p(a^d | l_i) that each participle state corresponds to each real participle is determined, the third conditional probability p(x_t | l_i) that each participle state corresponds to the target participle x_t can be determined according to a probability formula. Specifically, step 13 may be implemented as follows: the third conditional probability p(x_t | l_i) that each participle state corresponds to the target participle x_t is determined according to the following formula (3):

$$p(x_t \mid l_i) = \sum_{d=1}^{D} p(x_t \mid a^d)\, p(a^d \mid l_i) \tag{3}$$

wherein D represents the total number of real participles.
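As a minimal sketch of step 13, the function below reads formula (3) as a sum, over the D real participles, of the second conditional probability times the first conditional probability; this reading is taken from the worked "Beatrice" example later in the text. The function name, the dictionary layout, the keys "a1" and "a2", and the numbers are hypothetical.

```python
def third_conditional_probability(p_x_given_a, p_a_given_l):
    """Compute p(x_t | l_i) = sum_d p(x_t | a^d) * p(a^d | l_i) for one state.

    p_x_given_a: {real participle: p(x_t | a^d)} for a fixed target participle
    p_a_given_l: {real participle: p(a^d | l_i)} for a fixed participle state
    """
    return sum(p * p_a_given_l.get(a, 0.0) for a, p in p_x_given_a.items())


# Invented values for two candidate real participles.
p_x_given_a = {"a1": 0.25, "a2": 0.5}
p_a_given_l = {"a1": 0.4, "a2": 0.2}
print(third_conditional_probability(p_x_given_a, p_a_given_l))
```

A real participle that is missing from `p_a_given_l` is treated as having probability zero for that state, which is a design choice of this sketch rather than something the text specifies.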

Thus, the determined third conditional probability that each participle state corresponds to the target participle x_t characterizes the relationship among the target participle, the real participles, and the participle states, which improves the accuracy and recall rate of named entity recognition and effectively avoids problems of extra characters, missing characters, or wrongly written characters in the text recognition process.

In addition, after the third conditional probability p(x_t | l_i) is determined, in one possible embodiment, named entity recognition is performed on the target participle x_t according to the third conditional probability p(x_t | l_i). In another, preferred embodiment, to further improve the accuracy and recall rate of named entity recognition, step 14 may be implemented as follows: the participle state corresponding to the maximum third conditional probability is determined as the named entity recognition result of the target participle x_t.

Specifically, the third conditional probabilities comprise the probability that each of a plurality of participle states corresponds to the target participle x_t. The probability that the target participle x_t appears generally differs from one participle state to another, and the target participle is most likely to appear in the participle state corresponding to the maximum third conditional probability. Therefore, in the present disclosure, the participle state corresponding to the maximum third conditional probability may be determined as the named entity recognition result of the target participle x_t, to further improve the accuracy of named entity recognition for the target participle x_t.
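The preferred embodiment of step 14 amounts to an argmax over participle states. A minimal sketch, assuming the third conditional probabilities for one target participle are held in a dictionary keyed by state name (the function name, state names, and values are invented):

```python
def recognize(third_probs):
    """Return the participle state with the largest p(x_t | l_i)."""
    if not third_probs:
        raise ValueError("no participle states supplied")
    return max(third_probs, key=third_probs.get)


# Hypothetical third conditional probabilities for one target participle.
third_probs = {"surname": 0.006, "name": 0.001, "other": 0.0005}
print(recognize(third_probs))  # surname
```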

For example, for the text "Beatrice", suppose the target participle obtained by transliteration is "bi" (rendered literally as "ratio") and the real participles corresponding to the target participle "bi" are "bei" (rendered literally as "shell") and "bi". In the prior art, the real participle "bei" is not input into the HMM model, and "bi" never has the participle state "surname" in the historical data, so "bi" cannot be recognized as a surname when the target participle "bi" is recognized. Similarly, for the target participle "atlas", the real participle "Yacuisi" is not input into the HMM model, and "atlas" never has the participle state "name" in the historical data, so "atlas" cannot be recognized as a name when the target participle "atlas" is recognized. Thus, in the prior art, if the input text is "Beatrice", the transliteration "bialarce" cannot be recognized as a person name.

In the present scheme, the real participle "bei" is input into the HMM model, so that all possible real participles encountered when recognizing the target participle "bi" are "bei" and "bi". The first conditional probabilities that the participle state "surname" corresponds to the real participles "bei" and "bi" are determined as p(bei | surname) and p(bi | surname), respectively, and the second conditional probabilities that the real participles correspond to the target participle are p(bi | bei) and p(bi | bi), respectively. The third conditional probability that the participle state "surname" corresponds to the target participle "bi" is therefore p(bi | surname) = p(bi | bei) p(bei | surname) + p(bi | bi) p(bi | surname). As described above, "bi" never has the participle state "surname" in the historical data, so on the right-hand side p(bi | surname) is zero, whereas "bei" does correspond to the participle state "surname" and p(bei | surname) is comparatively large. Adding p(bi | bei) p(bei | surname) to p(bi | bi) p(bi | surname) therefore increases the resulting probability p(bi | surname), which greatly increases the possibility of recognizing "bi" as a surname. By the same principle, the resulting probability p(atlas | name) also increases, which greatly increases the possibility of recognizing "atlas" as a name. In summary, if the input text is "Beatrice", the transliteration "bialarce" can be recognized as a person name.
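The arithmetic of this example can be checked with made-up numbers. In the sketch below, "bi" and "bei" are placeholder romanizations for the target participle and the added real participle, and all four probability values are assumptions chosen only to show the effect: the surname probability of the target participle is zero in the prior-art reading and becomes positive once the term for the real participle "bei" is added.

```python
# Second conditional probabilities p("bi" | real participle), invented values.
second = {"bei": 0.3, "bi": 1.0}
# First conditional probabilities p(real participle | surname), invented values.
# "bi" never carries the state "surname" in the historical data, hence 0.0.
first = {"bei": 0.02, "bi": 0.0}

# Prior art: only the observed form "bi" is considered for the state "surname".
baseline = first["bi"]

# Present scheme: sum over both real participles, as in the worked example.
augmented = sum(second[a] * first[a] for a in second)

print(baseline, augmented)
```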

In summary, compared with the prior art, in the present disclosure, the probability that the word segmentation state corresponds to the target word segmentation is influenced by using the real word segmentation, so as to further improve the accuracy of the named entity identification of the target word segmentation.

Since the participle state is a hidden parameter in the HMM model, the first conditional probability p(a^d | l_i) that each participle state corresponds to the real participle cannot be determined directly from the historical text recognition results. Therefore, in the present disclosure, the first conditional probability may be estimated from a fourth conditional probability that is determined from the historical text recognition results. Specifically, as shown in FIG. 2, step 12 may include the following steps.

In step 121, for each real participle, a fourth conditional probability p(a^d | x_t) that the target participle x_t corresponds to the real participle is determined.

In this disclosure, the fourth conditional probability p(a^d | x_t) that the target participle x_t corresponds to the real participle may be referred to as a posterior probability, and it can be obtained by statistics. As described above, once the target participle x_t is known, all real participles corresponding to the target participle x_t can be determined from the historical text recognition results, and the fourth conditional probability p(a^d | x_t) that the target participle x_t corresponds to each real participle can then be determined.

The fourth conditional probability p(a^d | x_t) can be expressed as

$$p(a^d \mid x_t) = \frac{w(a^d, x_t)}{w(x_t)}$$

where w(a^d, x_t) characterizes the number of times the real participle a^d appears when the target participle x_t appears, and w(x_t) characterizes the number of times the target participle x_t appears.
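A minimal sketch of step 121, reading the fourth conditional probability as w(a^d, x_t) divided by w(x_t) and collecting the D values into one vector per target participle. The record format, the function name, and the counts are assumptions for illustration.

```python
from collections import Counter


def fourth_conditional_vector(target, history, reals):
    """Return [p(a^d | x_t) for a^d in reals] for one target participle x_t.

    `history` holds hypothetical (target, real) records; `reals` fixes the
    order of the D real participles in the returned vector.
    """
    joint = Counter(real for tgt, real in history if tgt == target)  # w(a^d, x_t)
    total = sum(joint.values())                                       # w(x_t)
    if total == 0:
        raise ValueError(f"target participle {target!r} not seen in history")
    return [joint[a] / total for a in reals]


# Invented counts: "bi" was observed 13 times, 3 of them standing for "bei".
history = [("bi", "bei")] * 3 + [("bi", "bi")] * 10 + [("bei", "bei")] * 7
z_t = fourth_conditional_vector("bi", history, ["bei", "bi"])
print(z_t)
```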

In step 122, the first conditional probability p(a^d | l_i) that each participle state corresponds to each real participle is estimated according to the fourth conditional probability p(a^d | x_t) that the target participle x_t corresponds to each real participle.

The fourth conditional probability characterizes the relationship between the real participle and the target participle and is determined from the historical text recognition results, so it is comparatively accurate. Therefore, in the present disclosure, the first conditional probability p(a^d | l_i) that each participle state corresponds to each real participle can be accurately estimated from the accurate fourth conditional probability, which ensures the accuracy of the estimated first conditional probability p(a^d | l_i).

For example, the KL divergence (Kullback-Leibler divergence), also called relative entropy, measures the difference between two probability distributions over the same event space. The first conditional probability p(a^d | l_i) closest to the fourth conditional probability can therefore be determined according to the relative entropy formula.

Specifically, step 122 may be implemented as follows: according to formulas (1) to (2), the y_i that makes d(z_t, y_i) satisfy a preset condition is determined as the first conditional probability that each participle state corresponds to each real participle;

wherein D represents the total number of the real participles; z_t characterizes the vector formed by the fourth conditional probabilities p(a^d | x_t) that the target participle x_t corresponds to each of the D real participles; y_i characterizes the vector formed by the first conditional probabilities p(a^d | l_i) that the i-th participle state corresponds to each of the D real participles; and d(z_t, y_i) characterizes the relative entropy of z_t and y_i.
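Formulas (1) and (2) are not reproduced in this text. As a hedged illustration only, the sketch below computes the standard relative entropy between a fourth-conditional-probability vector and a first-conditional-probability vector; that the disclosure's formula (1) has exactly this form is an assumption, and the function name and sample vectors are invented.

```python
import math


def relative_entropy(z, y):
    """Standard KL divergence: sum over d of z[d] * log(z[d] / y[d])."""
    if len(z) != len(y):
        raise ValueError("z and y must have the same length")
    total = 0.0
    for z_d, y_d in zip(z, y):
        if z_d == 0.0:
            continue  # a term with z[d] == 0 contributes 0 by convention
        if y_d == 0.0:
            return math.inf  # z puts mass where y has none
        total += z_d * math.log(z_d / y_d)
    return total


z_t = [0.25, 0.75]  # hypothetical vector of p(a^d | x_t)
y_i = [0.5, 0.5]    # hypothetical vector of p(a^d | l_i)
print(relative_entropy(z_t, y_i))
```

The divergence is zero when the two vectors coincide and grows as they move apart, which is why minimizing it pulls the estimated first conditional probabilities toward the observed fourth conditional probabilities.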

Each real participle determined above is an independent unit but depends on its context. Moreover, the real participles that appear in each participle state of the HMM model are not fixed, and each participle state may give rise to many possibilities. Thus, in this disclosure, the first conditional probability p(a^d | l_i) also needs to satisfy the following condition:

$$0 \le P(a^d \mid l_i) \le 1$$

Furthermore, in one possible implementation, the preset condition may characterize a user-acceptable difference between the first conditional probability distribution and the fourth conditional probability distribution. The difference may be a default value or a value set by the user, and in either case it is greater than zero.

In practical problems, many participles occur, some different and some identical, and when a text is segmented and labeled with participle states, each participle state may include a plurality of target participles. Therefore, to improve the accuracy of named entity recognition over the whole text, in another, preferred embodiment the preset condition is that the loss function is minimal; wherein T_i characterizes the total number of target participles belonging to the i-th participle state l_i, L represents the total number of participle states, and an indicator characterizes whether the i-th participle state is related to the target participle x_t, taking the value 1 if they are related and 0 otherwise.

Thus, the problem of finding the y_i^d that makes d(z_t, y_i) satisfy the preset condition is converted into the problem of solving formula (4), that is, into an optimization problem. Solving according to formula (4) and formula (2) then yields an optimal solution, which is the required y_i^d.

Solving formula (4) gives: for any

the following holds:

where, as described above, the indicator relating the state l_i to the observed value x_t equals 1 if the two are connected; otherwise it equals 0 and the corresponding term does not participate in the calculation. Therefore, for any

the following holds:

where T_i indicates the total number of target participles belonging to the i-th participle state l_i.
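The sketch below illustrates one way this derivation can play out. It assumes that the loss is the sum of the relative entropies d(z_t, y_i) over every target participle x_t connected to state l_i, that each target participle is connected to exactly one state, and that each y_i must sum to 1. Under those assumptions the minimizer is the average of the z_t vectors over the T_i target participles of state l_i. All function names, the uniform fallback for empty states, and the example data are hypothetical.

```python
import math


def relative_entropy(z, y):
    """d(z, y) = sum_d z[d] * log(z[d] / y[d]); terms with z[d] == 0 are skipped."""
    return sum(z_d * math.log(z_d / y_d) for z_d, y_d in zip(z, y) if z_d > 0.0)


def estimate_first_conditional(z_by_token, state_of_token, num_states):
    """Estimate y[i][d] ~ p(a^d | l_i) as the mean of z_t over tokens in state i."""
    dim = len(z_by_token[0])
    sums = [[0.0] * dim for _ in range(num_states)]
    counts = [0] * num_states  # counts[i] plays the role of T_i
    for z_t, i in zip(z_by_token, state_of_token):
        counts[i] += 1
        for d, z_td in enumerate(z_t):
            sums[i][d] += z_td
    y = []
    for i in range(num_states):
        if counts[i] == 0:
            # Assumption: fall back to a uniform distribution for unseen states.
            y.append([1.0 / dim] * dim)
        else:
            y.append([s / counts[i] for s in sums[i]])
    return y


def loss(z_by_token, state_of_token, y):
    """Assumed loss: sum of d(z_t, y_i) over tokens and their connected state."""
    return sum(relative_entropy(z_t, y[i])
               for z_t, i in zip(z_by_token, state_of_token))


# Hypothetical data: three target participles, D = 2 real participles, L = 2 states.
z_by_token = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
state_of_token = [0, 0, 1]
y_hat = estimate_first_conditional(z_by_token, state_of_token, num_states=2)
perturbed = [[0.6, 0.4], [0.1, 0.9]]
```

Comparing `loss(z_by_token, state_of_token, y_hat)` with the loss at `perturbed` gives a quick numerical check that the averaged solution is no worse than a nearby alternative. This is a sanity check only and does not replace a proof of optimality.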

Furthermore, the obtained solution can be verified by theorem to determine whether formula (5) or formula (6) above is the optimal solution of formula (4). Proving by theorem that the solution is the optimal solution of formula (4) belongs to the prior art and is not described here again.

With the above technical solution, solving for the first conditional probability p(a^d|l_i) through the relative entropy formula becomes an optimization problem over a convex function. It can be proved by theorem that this optimization problem has a strict local minimum point, and the corresponding solution is determined as the first conditional probability p(a^d|l_i).

Based on the same inventive concept, the present disclosure also provides a named entity recognition apparatus. Fig. 3 is a block diagram illustrating an apparatus for identifying a named entity according to an exemplary embodiment. As shown in Fig. 3, the apparatus for identifying a named entity may include:

a first determining module 31, configured to determine all possible real participles corresponding to the t-th target participle x_t in a text;

a second determining module 32, configured to respectively determine, for each of the real participles determined by the first determining module, a first conditional probability p(a^d|l_i) that each participle state corresponds to the real participle, where a^d characterizes the d-th real participle and l_i characterizes the i-th participle state;

a third determining module 33, configured to determine, according to the second conditional probability p(x_t|a^d) that each real participle corresponds to the target participle x_t and the first conditional probability p(a^d|l_i) determined by the second determining module, a third conditional probability p(x_t|l_i) that each participle state corresponds to the target participle x_t;

an identification module 34, configured to perform named entity recognition on the target participle x_t according to the third conditional probability p(x_t|l_i) determined by the third determining module.

Optionally, the second determining module includes:

Optionally, the estimation sub-module comprises:

a second determining submodule, configured to determine, according to formulas (1) to (2) above, the y_i^d that makes d(z_t, y_i) satisfy a preset condition as the first conditional probability that each participle state corresponds to each of the real participles.

Optionally, the preset conditions are: loss function

Optionally, the third determining module includes:

a third determining submodule, configured to determine, according to formula (3) above, the third conditional probability p(x_t|l_i) that each participle state corresponds to the target participle x_t.

Optionally, the identification module comprises:

a fourth determining submodule, configured to determine the participle state corresponding to the maximum third conditional probability as the named entity recognition result of the target participle x_t.
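The last two modules can be sketched as follows. The sketch assumes that formula (3) combines the two probabilities by summing over the D real participles, i.e. p(x_t|l_i) is the sum over d of p(x_t|a^d) * p(a^d|l_i), and then selects the participle state with the maximum third conditional probability as described above. The function names, the state labels, and the numbers are hypothetical.

```python
def third_conditional(second, first):
    """Return [p(x_t | l_i) for each state i].

    second[d]   ~ p(x_t | a^d), the second conditional probability.
    first[i][d] ~ p(a^d | l_i), the first conditional probability.
    Assumes formula (3) is a sum over the D real participles.
    """
    return [sum(s * f for s, f in zip(second, row)) for row in first]


def recognize(second, first, state_names):
    """Pick the state with the maximum third conditional probability."""
    scores = third_conditional(second, first)
    best = max(range(len(scores)), key=scores.__getitem__)
    return state_names[best], scores


# Hypothetical example with D = 2 real participles and L = 2 states.
second = [0.9, 0.05]                 # p(x_t | a^1), p(x_t | a^2)
first = [[0.7, 0.3], [0.1, 0.9]]     # rows: p(a^d | l_1), p(a^d | l_2)
state_names = ["B-PER", "O"]         # illustrative labels only
label, scores = recognize(second, first, state_names)
```

Because the sum runs over every candidate real participle, a target participle containing an extra, missing, or wrong character can still receive a high score for the correct state through the real participle it most likely stands for.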

With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.

Fig. 4 is a block diagram illustrating an electronic device 400 according to an example embodiment. As shown in fig. 4, the electronic device 400 may include: a processor 401 and a memory 402. The electronic device 400 may also include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.

The processor 401 is configured to control the overall operation of the electronic device 400 so as to complete all or part of the steps of the named entity identification method described above. The memory 402 is used to store various types of data to support operation of the electronic device 400, such as instructions for any application or method operating on the electronic device 400 and application-related data, such as contact data, messages, pictures, audio, video, and the like. The memory 402 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 402 or transmitted through the communication component 405. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them, so the corresponding communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.

In an exemplary embodiment, the electronic device 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the named entity identification method described above.

In another exemplary embodiment, a computer-readable storage medium is also provided, comprising program instructions which, when executed by a processor, carry out the steps of the above-mentioned method of identifying a named entity. For example, the computer readable storage medium may be the memory 402 comprising program instructions executable by the processor 401 of the electronic device 400 to perform the named entity identification method described above.

The preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and these simple modifications all fall within the protection scope of the present disclosure.

It should be noted that the various features described in the foregoing embodiments may be combined in any suitable manner without contradiction. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.

In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims

1. A method for identifying a named entity, comprising:

determining all possible real participles corresponding to the t-th target participle x_t in a text;

respectively determining, for each real participle, a first conditional probability p(a^d|l_i) that each participle state corresponds to the real participle, wherein a^d characterizes the d-th real participle and l_i characterizes the i-th participle state;

2. The method according to claim 1, wherein said respectively determining, for each of said real participles, a first conditional probability p(a^d|l_i) that each participle state corresponds to the real participle comprises:

estimating, according to the fourth conditional probability p(a^d|x_t) that the target participle x_t corresponds to each of the real participles, said first conditional probability p(a^d|l_i) that each participle state corresponds to each said real participle.

3. The method of claim 2, wherein said estimating, according to the fourth conditional probability p(a^d|x_t) that the target participle x_t corresponds to each of the real participles, said first conditional probability p(a^d|l_i) that each participle state corresponds to each said real participle comprises:

according to the following formulas (1) to (2), determining the y_i^d that makes d(z_t, y_i) satisfy a preset condition as the first conditional probability that each participle state corresponds to each of the real participles:

wherein D characterizes a total number of the real participles,

4. The method according to claim 3, wherein the preset condition is: loss function

5. The method according to any of claims 1-4, wherein said determining, according to the second conditional probability p(x_t|a^d) that each of said real participles corresponds to said target participle x_t and the first conditional probability p(a^d|l_i), the third conditional probability p(x_t|l_i) that each participle state corresponds to the target participle x_t comprises:

Wherein D characterizes a total number of the true participles.

6. The method according to any one of claims 1-4, wherein said performing named entity recognition on the target participle x_t according to said third conditional probability p(x_t|l_i) comprises:

7. An apparatus for identifying named entities, comprising:

a second determining module, configured to respectively determine, for each of the real participles determined by the first determining module, a first conditional probability p(a^d|l_i) that each participle state corresponds to the real participle, wherein a^d characterizes the d-th real participle and l_i characterizes the i-th participle state;

an identification module, configured to perform named entity recognition on the target participle x_t according to the third conditional probability p(x_t|l_i) determined by the third determining module.

8. The apparatus of claim 7, wherein the third determining module comprises:

Wherein D characterizes a total number of the true participles.

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.

10. An electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 6.