CN109299458B - Entity identification method, device, equipment and storage medium - Google Patents

Entity identification method, device, equipment and storage medium

Info

Publication number
CN109299458B
CN109299458B (application CN201811061626.8A)
Authority
CN
China
Prior art keywords
lstm
entity recognition
entity
probability
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811061626.8A
Other languages
Chinese (zh)
Other versions
CN109299458A (en)
Inventor
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Duoyi Network Co ltd
GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD
Guangzhou Duoyi Network Co ltd
Original Assignee
Duoyi Network Co ltd
GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD
Guangzhou Duoyi Network Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Duoyi Network Co ltd, GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD, Guangzhou Duoyi Network Co ltd filed Critical Duoyi Network Co ltd
Priority to CN201811061626.8A priority Critical patent/CN109299458B/en
Publication of CN109299458A publication Critical patent/CN109299458A/en
Application granted granted Critical
Publication of CN109299458B publication Critical patent/CN109299458B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses an entity recognition method, which comprises: obtaining a trained LSTM-based entity recognition model, where the LSTM-based entity recognition model is trained using a labeled training corpus; inputting a text to be recognized into the trained LSTM-based entity recognition model and obtaining, for each character in the text, the probability that it belongs to each label; and inputting the probabilities into a CRF model to obtain the label of each character. An LSTM network depends heavily on data, and the size and quality of the training data affect the model training result. By combining the LSTM model with the CRF model, the LSTM model solves the problem of extracting sequence features while the CRF model makes effective use of sentence-level label information. The LSTM + CRF model improves the execution efficiency of a dialogue system, performs entity recognition and word segmentation simultaneously, and improves the accuracy and efficiency of entity recognition.

Description

Entity identification method, device, equipment and storage medium
Technical Field
The present invention relates to the field of information technology, and in particular, to a method, an apparatus, a device, and a storage medium for entity identification.
Background
In the field of artificial intelligence, attempts to mimic human conversational ability date back to the early days of the field. In recent years, messaging applications have grown rapidly: WeChat in China, and WhatsApp and Facebook Messenger abroad, occupy almost all of users' fragmented time. With hundreds of millions of active users, they have effectively become the browser-like entry point of the mobile-internet era, since a user can obtain most information with a single application. As the traffic dividend from downloading new mobile applications gradually disappears, the advantages of dialogue systems stand out: development cost is low, and a dialogue system can be attached to an existing software platform.
In a dialog system, the entity words in a sentence input by a user often need to be recognized (entity recognition), and the sentence needs to be segmented into words for subsequent analysis. In existing dialogue systems, however, entity recognition and word segmentation are handled as two separate tasks.
When implementing the invention, the inventor found the following defect in prior-art entity recognition: entity recognition identifies entity words, such as person names, place names and organization names, at the sentence level. It is closely related to word segmentation, and performing the two tasks in isolation reduces the accuracy of both entity word recognition and word segmentation. Take the sentence "Nanjing city Changjiang River Bridge" as an example: if the entity word "Changjiang River Bridge" is not recognized, the sentence is likely to be cut as "Nanjing / city Chang / river bridge"; conversely, if the entity word "Changjiang River Bridge" is taken into account, the sentence is correctly segmented as "Nanjing city / Changjiang River Bridge".
Disclosure of Invention
In view of this, embodiments of the present invention provide an entity recognition method, apparatus, device and storage medium, which can improve the execution efficiency of a dialog system and improve the accuracy of entity recognition and word segmentation by combining entity recognition with word segmentation tasks.
In a first aspect, an embodiment of the present invention provides an entity identification method, including the following steps:
acquiring an LSTM-based entity recognition model after training is finished, wherein the LSTM-based entity recognition model is trained by using labeled training corpora;
inputting a text to be recognized into the trained entity recognition model based on the LSTM, and acquiring the probability that each character in the text to be recognized belongs to a label;
and inputting the probability into a CRF model to obtain the mark of each character.
In a first possible implementation manner of the first aspect, the obtaining a trained LSTM-based entity recognition model, where the LSTM-based entity recognition model is trained using labeled corpus, includes:
acquiring the labeled training corpus;
converting the words and characters in the labeled training corpus into vectors;
and inputting the vectors of the words and the characters into the LSTM-based entity recognition model, and training parameters in the LSTM-based entity recognition model by using a back propagation method to obtain the LSTM-based entity recognition model after training.
With reference to the first aspect and the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the obtaining the labeled corpus includes:
and labeling the training corpus in IB (Inside, Begin) mode to obtain the labeled training corpus.
In a third possible implementation manner of the first aspect, the inputting the text to be entity-recognized into the trained LSTM-based entity recognition model, and obtaining the probability that each character in the text to be entity-recognized belongs to a label includes:
and sequentially inputting the characters of the text to be recognized by the entity into the trained entity recognition model based on the LSTM, and acquiring the probability that each character in the text to be recognized by the entity belongs to the label.
In a fourth possible implementation manner of the first aspect, the inputting the probability into the CRF model to obtain the label of each character includes:
inputting the probability into a prediction formula, and solving the maximum value of the prediction formula to obtain the optimal output label sequence, wherein the prediction formula is

    s(X, y) = \sum_{i=1}^{n} p_{i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}

wherein y is the label sequence to be predicted for the text to be recognized, y = (y_1, y_2, …, y_n); X = (p_{i, y_i}) collects the probability of each label for each character in the text, where p_{i, y_i} is the probability that the i-th character is marked with the y_i-th label; and A_{y_i, y_{i+1}} is the probability of transitioning from the y_i-th label to the y_{i+1}-th label;
and labeling according to the optimal output label sequence to further obtain the label of each character.
In a second aspect, an embodiment of the present invention further provides an entity identification apparatus, including:
the entity recognition model acquisition module is used for acquiring a trained entity recognition model based on the LSTM, wherein the entity recognition model based on the LSTM is trained by using the labeled training corpus;
the probability acquisition module is used for inputting the text to be recognized into the entity recognition model based on the LSTM after the training is finished, and acquiring the probability that each character in the text to be recognized belongs to the label;
and the mark acquisition module is used for inputting the probability into a CRF model to obtain the mark of each character.
In a first possible implementation manner of the second aspect, the entity identification model obtaining module includes:
acquiring the labeled training corpus;
converting the words and characters in the labeled training corpus into vectors;
and inputting the vectors of the words and the characters into the LSTM-based entity recognition model, and training parameters in the LSTM-based entity recognition model by using a back propagation method to obtain the LSTM-based entity recognition model after training.
In a second possible implementation manner of the second aspect, the tag obtaining module includes:
inputting the probability into a prediction formula, and solving the maximum value of the prediction formula to obtain the optimal output label sequence, wherein the prediction formula is

    s(X, y) = \sum_{i=1}^{n} p_{i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}

wherein y is the label sequence to be predicted for the text to be recognized, y = (y_1, y_2, …, y_n); X = (p_{i, y_i}) collects the probability of each label for each character in the text, where p_{i, y_i} is the probability that the i-th character is marked with the y_i-th label; and A_{y_i, y_{i+1}} is the probability of transitioning from the y_i-th label to the y_{i+1}-th label;
and labeling according to the optimal output label sequence to further obtain the label of each character.
In a third aspect, an embodiment of the present invention further provides an entity identification device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the entity identification method described above when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the entity identification method described above.
The embodiment of the invention has the following beneficial effects:
An LSTM-based entity recognition model is obtained after training, the model having been trained using a labeled training corpus; a text to be recognized is input into the trained LSTM-based entity recognition model, and the probability that each character in the text belongs to each label is obtained; the probabilities are input into a CRF model to obtain the label of each character. By combining the LSTM-based entity recognition model with the CRF model, entity recognition and word segmentation can be carried out simultaneously, reducing the time consumed by model prediction; and because word segmentation uses the information of the entity words obtained by entity recognition, segmentation accuracy and efficiency are improved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
fig. 1 is a schematic diagram of an entity identification device according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating an entity identification method according to a second embodiment of the present invention;
fig. 3 is a schematic diagram illustrating the result of LSTM entity identification provided in the second embodiment of the present invention;
FIG. 4 is a diagram illustrating the result of LSTM + CRF entity identification provided in the second embodiment of the present invention;
FIG. 5 is a diagram illustrating an entity identification display result according to a second embodiment of the present invention;
fig. 6 is a schematic structural diagram of an entity identifying device according to a third embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, the present invention may be embodied as an apparatus, method or computer program product. Accordingly, the present disclosure may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software, which may be referred to herein generally as a "circuit," "module" or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied therein.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Example one
Referring to fig. 1, fig. 1 is a schematic diagram of an entity identification device according to an embodiment of the present invention, configured to execute an entity identification method according to an embodiment of the present invention. As shown in fig. 1, the entity identification device includes: at least one processor 11, such as a CPU, at least one network interface 14 or other user interface 13, a memory 15, and at least one communication bus 12, the communication bus 12 being used to enable connection and communication between these components. The user interface 13 may optionally include a USB interface, a wired interface, and other standard interfaces. The network interface 14 may optionally include a Wi-Fi interface as well as other wireless interfaces. The memory 15 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory, such as at least one disk memory. The memory 15 may optionally comprise at least one memory device located remotely from the aforementioned processor 11.
In some embodiments, memory 15 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof:
an operating system 151, which contains various system programs for implementing various basic services and for processing hardware-based tasks;
and a program 152.
Specifically, the processor 11 is configured to call the program 152 stored in the memory 15 to execute the entity identification method according to the following embodiments.
The processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general purpose processor may be a microprocessor, or the processor may be any conventional processor; the processor is the control center of the entity identification device, with various interfaces and lines connecting the various parts of the overall device.
The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the entity identification device by running or executing the computer programs and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or a text conversion function), and the data storage area may store data created according to the use of the device (such as audio data or text message data). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid state storage device.
If the integrated entity recognition module is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
A method of entity identification of an embodiment of the present invention will be described below with reference to the accompanying drawings.
Example two
Fig. 2 is a flowchart illustrating an entity identification method according to a second embodiment of the present invention.
An entity identification method, comprising the steps of:
s11, obtaining an LSTM-based entity recognition model after training is completed, wherein the LSTM-based entity recognition model is trained by using labeled training corpora;
s12, inputting a text to be recognized into the entity recognition model based on the LSTM after the training is finished, and acquiring the probability that each character in the text to be recognized belongs to a label;
and S13, inputting the probability into a CRF model to obtain the mark of each character.
In the embodiment of the invention, in order to improve the accuracy and efficiency of entity recognition, the LSTM model and the CRF model are combined, so that entity recognition and word segmentation can be performed simultaneously.
Preferably, the obtaining of the trained entity recognition model based on LSTM includes:
acquiring the labeled training corpus;
converting the words and characters in the labeled training corpus into vectors;
and inputting the vectors of the words and the characters into the LSTM-based entity recognition model, and training parameters in the LSTM-based entity recognition model by using a back propagation method to obtain the LSTM-based entity recognition model after training.
Further, the acquiring the labeled corpus includes:
and labeling the training corpus in IB (Inside, Begin) mode to obtain the labeled training corpus.
In the embodiment of the present invention, a labeled training corpus is first obtained. Labeling is a manual process, and the corpus is labeled according to the IB (Inside, Begin) scheme (or labeled in other ways, for example with 0, 1, 2 instead). Begin: the first character of an entity word is marked B with the current suffix appended. Inside: a character that belongs to an entity word but is not its first character is marked I with the current suffix appended. The suffixes are as follows: the suffix for a person name is P, the suffix for an organization name is C, and the suffix for a place name is L. If an entity recognition unit is the start of an entity, it is marked (tag B-…); if an entity recognition unit is in the middle of an entity, it is marked (tag I-…). Taking the most common entity types, person names (PER), location names (LOC) and organization names (ORG), as an example, each character of each sentence in the corpus is labeled. For example, the sentence "Ma Huateng is the CEO of Tencent" can be labeled as follows: "Ma" (马) is marked B-P; "Hua" (化) is marked I-P; "Teng" (腾) is marked I-P; "is" (是) is marked B; the "Teng" (腾) of Tencent is marked B-C; "Xun" (讯) is marked I-C; "of" (的) is marked B; "C" is marked B; "E" is marked I; "O" is marked I.
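As an illustrative sketch only (not part of the patent text), the IB labeling scheme described above can be expressed in a few lines of Python; the segmentation and entity suffixes are supplied by hand here, mirroring the example sentence:

```python
def ib_label(words):
    """IB (Inside, Begin) labeling as described above.

    words: list of (word, suffix) pairs, where suffix is 'P' (person),
    'C' (organization) or 'L' (place) for entity words, and None for
    ordinary words. The first character of each word is tagged B
    (B-<suffix> for entities); every later character is tagged I (I-<suffix>).
    """
    tagged = []
    for word, suffix in words:
        for i, ch in enumerate(word):
            base = "B" if i == 0 else "I"
            tagged.append((ch, f"{base}-{suffix}" if suffix else base))
    return tagged

# "Ma Huateng is the CEO of Tencent" with a hand-supplied segmentation
sentence = [("马化腾", "P"), ("是", None), ("腾讯", "C"), ("的", None), ("CEO", None)]
for ch, tag in ib_label(sentence):
    print(ch, tag)   # 马 B-P, 化 I-P, 腾 I-P, 是 B, 腾 B-C, 讯 I-C, 的 B, C B, E I, O I
```

This reproduces exactly the labels listed in the example above; a real corpus would carry such labels for every sentence.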
In the embodiment of the present invention, the words and characters in the labeled training corpus are converted into vectors, because a computer can only operate on numeric values: the input word x is of character type and cannot be computed on directly, so vector conversion is required. The converted vectors are called word vectors, also known as word embeddings. First, a vocabulary of all words that need to be predicted and trained is obtained by statistics. If the vocabulary size is k, each word in the vocabulary is given a unique id in the range 0 to k-1, and a matrix of size [k, dim] is randomly initialized, where dim is a preset threshold; the id corresponding to each character is then looked up to obtain the corresponding word vector. When constructing word vectors (word embeddings), the first step in processing a text corpus with a mathematical model is to convert the text into a mathematical representation, for which there are two methods. The first method represents a word with a one-hot matrix, that is, a matrix in which each row has exactly one element equal to 1 and all other elements equal to 0. Each word in the dictionary is assigned a number, and when a sentence is encoded, each word is converted into a one-hot row whose 1 sits at the position of that word's number in the dictionary. For example, the sentence "I love chips" can be expressed as a matrix with one row per word, each row one-hot at the corresponding word id.

A word embedding matrix can also be used, which assigns each word a vector representation of fixed length; the length can be chosen freely, for example 300, and in practice is much smaller than the dictionary length (for example 10000). The angle between two word vectors can then be used as a measure of their relationship, for example via the cosine

    cos θ = (a · b) / (|a| |b|)
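A minimal NumPy sketch of the two representations described above, using a toy vocabulary (the four-word vocabulary, dim = 4 and the random seed are illustrative choices, not values from the patent):

```python
import numpy as np

VOCAB = ["I", "love", "chips", "china"]   # toy dictionary; size k
k = len(VOCAB)

def one_hot(word):
    """One-hot row: 1 at the word's id, 0 elsewhere."""
    v = np.zeros(k)
    v[VOCAB.index(word)] = 1.0
    return v

# Randomly initialized embedding matrix of size [k, dim]; in the text
# dim is a preset threshold such as 300, shrunk to 4 here for brevity.
dim = 4
rng = np.random.default_rng(0)
E = rng.normal(size=(k, dim))

def embed(word):
    """Look up the word's unique id and return the matching row of E."""
    return E[VOCAB.index(word)]

def cosine(a, b):
    """Angle between two word vectors as a measure of their relationship."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sentence = np.stack([one_hot(w) for w in ["I", "love", "chips"]])  # 3 x k matrix
```

In training, the rows of E are updated by back propagation along with the other model parameters.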
In the embodiment of the present invention, the vectors of the words and characters are input into the LSTM-based entity recognition model, and the parameters of the model are trained with a back propagation method to obtain the trained LSTM-based entity recognition model. The LSTM-based entity recognition model computes, at each time step t:

    f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
    i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
    \tilde{c}_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
    c_t = f_t ⊙ c_{t-1} + i_t ⊙ \tilde{c}_t
    o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
    h_t = o_t ⊙ tanh(c_t)

wherein σ is an element-wise sigmoid operation, ⊙ denotes the element-wise (pointwise) product, x_t is the input, and h_t is the output; all W, h, c and b in the formulas are initialized randomly, and inputting the corresponding vectors into the formulas yields the corresponding probabilities. For example, "I love China" is input character by character into the first-layer LSTM units of the LSTM-based entity recognition model, with the output of the i-th LSTM unit of the first layer also serving as the input of the (i+1)-th LSTM unit of the first layer; each LSTM unit then outputs the probability that its character belongs to each label.
In this embodiment, after the probability that each character belongs to each label is obtained, the parameters of the LSTM-based entity recognition model are trained with a back propagation method to obtain the trained LSTM-based entity recognition model. Back propagation updates the LSTM parameters from the LSTM output using the chain rule of derivation: the derivative of a composite function, obtained by nesting one function inside another, equals the derivative of the outer function evaluated at the inner function, multiplied by the derivative of the inner function. For example, if f(x) = x² and g(x) = 2x + 1, then {f[g(x)]}' = 2[g(x)] · g'(x) = 2(2x + 1) × 2 = 8x + 4. The parameters in the computation formulas of the LSTM-based entity recognition model are updated accordingly.
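The LSTM computation described above can be sketched as a single NumPy time step (the weight shapes and random initialization are illustrative; a real model would add the label-probability output layer, batching and the back propagation update):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the gate equations above.

    Each W[g] maps the concatenation [h_{t-1}, x_t] to one gate's
    pre-activation; '*' is the element-wise (pointwise) product.
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat        # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                # output
    return h_t, c_t

dim_x, dim_h = 4, 3                          # toy sizes
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(dim_h, dim_h + dim_x)) for g in "fico"}
b = {g: np.zeros(dim_h) for g in "fico"}

h, c = np.zeros(dim_h), np.zeros(dim_h)
for _ in range(3):                           # feed three character vectors in turn
    h, c = lstm_step(rng.normal(size=dim_x), h, c, W, b)
```

Feeding the characters of a sentence one per step, as here, is exactly the sequential reading the model performs at prediction time.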
Preferably, after obtaining the trained LSTM-based entity recognition model, the inputting the text to be entity recognized into the trained LSTM-based entity recognition model, and obtaining the probability that each character in the text to be entity recognized belongs to the label includes:
and sequentially inputting the characters of the text to be recognized into the entity recognition model based on the LSTM after the training is finished, and acquiring the probability that each character in the text to be recognized belongs to a label.
In this embodiment, according to the above calculation formula of the LSTM-based entity recognition model:
$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$
the LSTM-based entity recognition model reads in one character of the text to be entity-recognized at each step, and the probability that the character belongs to each IOB mark is obtained by calculation in the LSTM-based entity recognition model. Referring to fig. 3, for the sentence "Ma Huateng is the CEO of Tencent", after a character is input at each step, the probability that the character corresponds to each label is obtained. For example, the character "Ma" has a probability of 0.5 of belonging to label B, a probability of 0.9 of belonging to label B-P, a probability of 0.8 of belonging to label B-L, a probability of 0.2 of belonging to label B-C, a probability of 0.4 of belonging to label I, a probability of 0.5 of belonging to label I-P, a probability of 0.1 of belonging to label I-L, and a probability of 0.5 of belonging to label I-C.
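A minimal sketch of this per-character reading, assuming a placeholder hidden state in place of a trained LSTM (the tag set follows the example above; the projection weights and probability values are random stand-ins, not the figures in fig. 3):

```python
import numpy as np

TAGS = ["B", "B-P", "B-L", "B-C", "I", "I-P", "I-L", "I-C"]  # tag set from the example

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
H = 16
W_out = rng.normal(size=(len(TAGS), H))  # hidden-state -> tag-score projection (illustrative)

sentence = "马化腾是腾讯的CEO"           # "Ma Huateng is the CEO of Tencent"
probs = []
for ch in sentence:                       # one character read in per step
    h = np.tanh(rng.normal(size=H))       # stand-in for the LSTM hidden state at this step
    probs.append(softmax(W_out @ h))      # probability of ch belonging to each tag
probs = np.vstack(probs)                  # shape: (num_chars, num_tags)
print(probs.shape)                        # (10, 8)
```

Each row of `probs` plays the role of the per-character label probabilities that are later fed to the CRF model.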
Preferably, the inputting the probability into the CRF model to obtain the mark of each character includes:
inputting the probability into a prediction formula, and solving the maximum value of the prediction formula to obtain the optimal output label sequence, wherein the prediction formula is
$$
s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} p_{i, y_i}
$$
wherein $y$ is the label sequence to be predicted for the text to be entity-recognized, $y = (y_1, y_2, \ldots, y_n)$; $X$ is the text to be entity-recognized; $p_{i,y_i}$ is the probability that each character in the text to be entity-recognized belongs to a label, namely the probability that the $i$-th character is marked as the $y_i$-th label; $A_{y_i,y_{i+1}}$ denotes the probability of transferring from the $y_i$-th label to the $y_{i+1}$-th label;
and labeling according to the optimal output label sequence to further obtain the label of each character.
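The prediction score above can be sketched as follows; the emission and transition values below are toy numbers (not the ones in the figures), and tags are represented as integer indices:

```python
import numpy as np

def sequence_score(emissions, transitions, tags):
    """s(X, y) = sum of emission scores p[i, y_i] plus transition scores A[y_i, y_{i+1}]."""
    score = sum(emissions[i, t] for i, t in enumerate(tags))
    score += sum(transitions[a, b] for a, b in zip(tags[:-1], tags[1:]))
    return score

emissions = np.array([[0.9, 0.1],      # p_{i, y_i}: per-character tag probabilities
                      [0.2, 0.8],
                      [0.3, 0.7]])
transitions = np.array([[0.5, 0.5],    # A_{y_i, y_{i+1}}: tag-bigram scores
                        [0.1, 0.9]])
# emissions 0.9+0.8+0.7 plus transitions A[0,1]+A[1,1] = 0.5+0.9, i.e. ≈ 3.8
print(sequence_score(emissions, transitions, [0, 1, 1]))
```

Maximizing this score over all tag sequences yields the optimal output label sequence.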
In this example, referring to FIG. 4, the schematic structural diagram of LSTM + CRF: for each input $X = (x_1, x_2, \ldots, x_n)$, a predicted label sequence $y = (y_1, y_2, \ldots, y_n)$ is obtained, and the score of the prediction is defined as
$$
S(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} p_{i, y_i}
$$
wherein $p_{i,y_i}$ is the probability that the softmax output at the $i$-th position is $y_i$, and $A_{y_i,y_{i+1}}$ is the transition probability from $y_i$ to $y_{i+1}$. When the number of labels (B-person, B-location, …) is n, the transition probability matrix is (n+2) × (n+2), because a start position and an end position are additionally added. The scoring function S compensates well for the deficiency of the conventional BiLSTM: for a predicted sequence to obtain a high score, it is not enough to take, at each position, the label corresponding to the maximum probability output by softmax; the sum of the transition probabilities must also be maximized, i.e. the output rules must be complied with (B cannot be followed by B). For example, if the most likely sequence output by the BiLSTM is BBIBIOOO, then because the transition B->B has a small or even negative probability, such a sequence will not obtain the highest score under S, i.e. it is not the desired one. Taking "Ma Huateng is the CEO of Tencent" as an example, the maximum-score sequence obtained after passing through the CRF model is as follows:
S("Ma Huateng is the CEO of Tencent", (B-P, I-P, I-P, B, B-C, I-C, B, B, I, I)) = A(B-P, I-P) + A(I-P, I-P) + A(I-P, B) + A(B, B-C) + A(B-C, I-C) + A(I-C, B) + A(B, B) + A(B, I) + A(I, I) + 0.9 + 0.9 + 0.9 + 0.8 + 0.8 + 0.9 + 0.8 + 0.9 + 0.9 + 0.9, wherein $A_{y_i,y_{i+1}}$ is the transition probability value from $y_i$ to $y_{i+1}$, obtained by statistics over the labeled data. The word segmentation result is therefore: Ma Huateng / is / Tencent / 's / CEO.
It should be noted that the introduced CRF model is obtained by modeling bigrams of the output labels, then computing with dynamic programming, and finally labeling according to the resulting optimal path.
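The dynamic-programming computation of the optimal path mentioned above is commonly realized as Viterbi decoding; a minimal sketch under toy emission/transition scores (illustrative only, not the claimed implementation):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Find the tag-index sequence maximizing s(X, y) by dynamic programming."""
    n, k = emissions.shape
    score = emissions[0].copy()          # best score of a path ending in each tag
    back = np.zeros((n, k), dtype=int)   # backpointers for path recovery
    for i in range(1, n):
        # cand[a, b]: best score ending in tag a, then transitioning a -> b
        cand = score[:, None] + transitions + emissions[i][None, :]
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]         # best final tag
    for i in range(n - 1, 0, -1):        # follow backpointers
        path.append(int(back[i, path[-1]]))
    return path[::-1]

emissions = np.array([[0.9, 0.1],
                      [0.2, 0.8],
                      [0.3, 0.7]])
transitions = np.array([[0.5, 0.5],
                        [0.1, 0.9]])
print(viterbi_decode(emissions, transitions))  # [0, 1, 1]
```

For k tags and n characters this runs in O(n·k²), which is what makes exact maximization of the CRF score tractable.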
In this embodiment, the mark of each character may be displayed on the text to be entity-recognized. For example, referring to fig. 5, the label of each character is displayed at a preset position of that character, for example above or below the character, or as a subscript or superscript.
The embodiment has the following beneficial effects:
A trained LSTM-based entity recognition model is acquired, wherein the LSTM-based entity recognition model is trained using a labeled training corpus; the text to be entity-recognized is input into the trained LSTM-based entity recognition model, and the probability that each character in the text belongs to each label is acquired; the probability is input into a CRF model to obtain the mark of each character. An LSTM network depends heavily on data: the size and quality of the data affect the training result. By combining the LSTM model with the CRF model, the LSTM model solves the problem of extracting sequence features, while the CRF model effectively uses sentence-level labeling information. The LSTM + CRF model improves the execution efficiency of a dialogue system, realizes entity recognition and word segmentation at the same time, and improves entity recognition accuracy and efficiency.
EXAMPLE III
Referring to fig. 6, a schematic structural diagram of an entity identification apparatus according to a third embodiment of the present invention is provided;
an entity identification apparatus comprising:
an entity recognition model obtaining module 31, configured to obtain a trained LSTM-based entity recognition model, where the LSTM-based entity recognition model is trained using labeled training corpora;
a probability obtaining module 32, configured to input the text to be entity-recognized into the trained LSTM-based entity recognition model, and obtain a probability that each character in the text to be entity-recognized belongs to a label;
and a mark acquiring module 33, configured to input the probability into the CRF model to obtain a mark of each character.
Preferably, the entity recognition model obtaining module 31 includes:
the corpus acquiring unit is used for acquiring the labeled corpus;
a vector obtaining unit, configured to convert words and characters in the labeled training corpus into vectors;
and the parameter training unit is used for inputting the vectors of the words and the characters into the LSTM-based entity recognition model, and training parameters in the LSTM-based entity recognition model by using a back propagation method so as to obtain the LSTM-based entity recognition model after training.
Preferably, the corpus acquiring unit includes:
and labeling the training corpus in IOB mode to obtain the labeled training corpus.
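As an illustrative sketch of such IOB labeling (the tag names follow the B-P/person, B-C/company convention used in the description; not part of the claimed apparatus):

```python
# One IOB tag per character of the example sentence "Ma Huateng is the CEO of Tencent".
sentence = "马化腾是腾讯的CEO"
tags     = ["B-P", "I-P", "I-P", "B", "B-C", "I-C", "B", "B", "I", "I"]
assert len(sentence) == len(tags)   # character-level labeling: one tag per character
pairs = list(zip(sentence, tags))
print(pairs[0])                     # ('马', 'B-P')
```

B-x marks the first character of an entity of type x, I-x its continuation, and plain B/I mark non-entity word boundaries for segmentation.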
Preferably, the probability obtaining module 32 includes:
and sequentially inputting the characters of the text to be recognized by the entity into the trained entity recognition model based on the LSTM, and acquiring the probability that each character in the text to be recognized by the entity belongs to the label.
Preferably, the mark acquiring module 33 includes:
inputting the probability into a prediction formula, and solving the maximum value of the prediction formula to obtain the optimal output label sequence, wherein the prediction formula is
$$
s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} p_{i, y_i}
$$
wherein $y$ is the label sequence to be predicted for the text to be entity-recognized, $y = (y_1, y_2, \ldots, y_n)$; $X$ is the text to be entity-recognized; $p_{i,y_i}$ is the probability that each character in the text to be entity-recognized belongs to a label, namely the probability that the $i$-th character is marked as the $y_i$-th label; $A_{y_i,y_{i+1}}$ denotes the probability of transferring from the $y_i$-th label to the $y_{i+1}$-th label;
and labeling according to the optimal output label sequence to further obtain the label of each character.
The embodiment has the following beneficial effects:
A trained LSTM-based entity recognition model is acquired, wherein the LSTM-based entity recognition model is trained using a labeled training corpus; the text to be entity-recognized is input into the trained LSTM-based entity recognition model, and the probability that each character in the text belongs to each label is acquired; the probability is input into a CRF model to obtain the mark of each character. An LSTM network depends heavily on data: the size and quality of the data affect the training result. By combining the LSTM model with the CRF model, the LSTM model solves the problem of extracting sequence features, while the CRF model effectively uses sentence-level labeling information. The LSTM + CRF model improves the execution efficiency of a dialogue system, realizes entity recognition and word segmentation at the same time, and improves entity recognition accuracy and efficiency.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
It should be noted that, in the foregoing embodiments, the description of each embodiment has its own emphasis, and for a part that is not described in detail in a certain embodiment, reference may be made to the related description of other embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.

Claims (8)

1. An entity identification method, comprising:
acquiring an LSTM-based entity recognition model after training is finished, wherein the LSTM-based entity recognition model is trained by using labeled training corpora;
inputting a text to be recognized into the trained entity recognition model based on the LSTM, and acquiring the probability that each character in the text to be recognized belongs to a label;
inputting the probability into a CRF model to obtain the mark of each character, wherein the method specifically comprises the following steps: inputting the probability into a prediction formula, and solving the maximum value of the prediction formula to obtain the optimal output label sequence, wherein the prediction formula is
$$
s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} p_{i, y_i}
$$
wherein $y$ is the tag sequence to be predicted for the text to be entity-recognized, $y = (y_1, y_2, \ldots, y_n)$; $X$ is the text to be entity-recognized; $p_{i,y_i}$ is the probability that each character in the text to be entity-recognized belongs to a label, namely the probability that the $i$-th character is marked as the $y_i$-th label; $A_{y_i,y_{i+1}}$ denotes the probability of transferring from the $y_i$-th label to the $y_{i+1}$-th label; and labeling according to the optimal output label sequence to further obtain the label of each character.
2. The entity recognition method according to claim 1, wherein the obtaining of the trained LSTM-based entity recognition model, wherein the LSTM-based entity recognition model is trained using labeled corpus, comprises:
acquiring the labeled training corpus;
converting the words and characters in the labeled training corpus into vectors;
and inputting the vectors of the words and the characters into the LSTM-based entity recognition model, and training parameters in the LSTM-based entity recognition model by using a back propagation method to obtain the LSTM-based entity recognition model after training.
3. The entity recognition method according to claim 2, wherein the obtaining the labeled corpus comprises:
and labeling the training corpus in IOB mode to obtain the labeled training corpus.
4. The entity recognition method according to claim 1, wherein the inputting the text to be recognized into the trained LSTM-based entity recognition model, and the obtaining the probability that each character in the text to be recognized belongs to the label comprises:
and sequentially inputting the characters of the text to be recognized by the entity into the trained entity recognition model based on the LSTM, and acquiring the probability that each character in the text to be recognized by the entity belongs to the label.
5. An entity identification apparatus, comprising:
the entity recognition model acquisition module is used for acquiring a trained entity recognition model based on the LSTM, wherein the entity recognition model based on the LSTM is trained by using the labeled training corpus;
the probability acquisition module is used for inputting the text to be recognized into the entity recognition model based on the LSTM after the training is finished, and acquiring the probability that each character in the text to be recognized belongs to the label;
the mark acquisition module is used for inputting the probability into a CRF model to obtain marks of each character, and specifically comprises the following steps: inputting the probability into a prediction formula, and solving the maximum value of the prediction formula to obtain the optimal output label sequence, wherein the prediction formula is
$$
s(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} p_{i, y_i}
$$
wherein $y$ is the label sequence to be predicted for the text to be entity-recognized, $y = (y_1, y_2, \ldots, y_n)$; $X$ is the text to be entity-recognized; $p_{i,y_i}$ is the probability that each character in the text to be entity-recognized belongs to a label, namely the probability that the $i$-th character is marked as the $y_i$-th label; $A_{y_i,y_{i+1}}$ denotes the probability of transferring from the $y_i$-th label to the $y_{i+1}$-th label; and labeling according to the optimal output label sequence to further obtain the label of each character.
6. The entity recognition apparatus of claim 5, wherein the entity recognition model obtaining module comprises:
acquiring the labeled training corpus;
converting the words and characters in the labeled training corpus into vectors;
and inputting the vectors of the words and the characters into the LSTM-based entity recognition model, and training parameters in the LSTM-based entity recognition model by using a back propagation method to obtain the LSTM-based entity recognition model after training.
7. An entity identification device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor when executing the computer program implementing the entity identification method of any one of claims 1 to 4.
8. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the entity identification method according to any one of claims 1 to 4.
CN201811061626.8A 2018-09-12 2018-09-12 Entity identification method, device, equipment and storage medium Active CN109299458B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811061626.8A CN109299458B (en) 2018-09-12 2018-09-12 Entity identification method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN109299458A CN109299458A (en) 2019-02-01
CN109299458B true CN109299458B (en) 2023-03-28

Family

ID=65166558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811061626.8A Active CN109299458B (en) 2018-09-12 2018-09-12 Entity identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109299458B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111681670B (en) * 2019-02-25 2023-05-12 北京嘀嘀无限科技发展有限公司 Information identification method, device, electronic equipment and storage medium
CN109902303B (en) * 2019-03-01 2023-05-26 腾讯科技(深圳)有限公司 Entity identification method and related equipment
CN110287283B (en) * 2019-05-22 2023-08-01 中国平安财产保险股份有限公司 Intention model training method, intention recognition method, device, equipment and medium
CN110516251B (en) * 2019-08-29 2023-11-03 秒针信息技术有限公司 Method, device, equipment and medium for constructing electronic commerce entity identification model
CN110598210B (en) * 2019-08-29 2023-08-04 深圳市优必选科技股份有限公司 Entity recognition model training, entity recognition method, entity recognition device, entity recognition equipment and medium
CN110705211A (en) * 2019-09-06 2020-01-17 中国平安财产保险股份有限公司 Text key content marking method and device, computer equipment and storage medium
CN110555102A (en) * 2019-09-16 2019-12-10 青岛聚看云科技有限公司 media title recognition method, device and storage medium
CN110826330B (en) * 2019-10-12 2023-11-07 上海数禾信息科技有限公司 Name recognition method and device, computer equipment and readable storage medium
CN110738054B (en) * 2019-10-14 2023-07-07 携程计算机技术(上海)有限公司 Method, system, electronic equipment and storage medium for identifying hotel information in mail
CN110738182A (en) * 2019-10-21 2020-01-31 四川隧唐科技股份有限公司 LSTM model unit training method and device for high-precision identification of bid amount
CN110738055A (en) * 2019-10-23 2020-01-31 北京字节跳动网络技术有限公司 Text entity identification method, text entity identification equipment and storage medium
CN112733869A (en) * 2019-10-28 2021-04-30 中移信息技术有限公司 Method, device and equipment for training text recognition model and storage medium
CN110738319A (en) * 2019-11-11 2020-01-31 四川隧唐科技股份有限公司 LSTM model unit training method and device for recognizing bid-winning units based on CRF
CN110825827B (en) * 2019-11-13 2022-10-25 北京明略软件系统有限公司 Entity relationship recognition model training method and device and entity relationship recognition method and device
CN111079405A (en) * 2019-11-29 2020-04-28 微民保险代理有限公司 Text information identification method and device, storage medium and computer equipment
CN111209396A (en) * 2019-12-27 2020-05-29 深圳市优必选科技股份有限公司 Entity recognition model training method, entity recognition method and related device
CN111476022B (en) * 2020-05-15 2023-07-07 湖南工商大学 Character embedding and mixed LSTM entity identification method, system and medium for entity characteristics
CN111914561B (en) * 2020-07-31 2023-06-30 建信金融科技有限责任公司 Entity recognition model training method, entity recognition device and terminal equipment
CN112214987B (en) * 2020-09-08 2023-02-03 深圳价值在线信息科技股份有限公司 Information extraction method, extraction device, terminal equipment and readable storage medium
CN112182157B (en) * 2020-09-29 2023-09-22 中国平安人寿保险股份有限公司 Training method of online sequence labeling model, online labeling method and related equipment
CN112733911B (en) * 2020-12-31 2023-05-30 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of entity recognition model
CN113268673B (en) * 2021-04-23 2023-06-02 国家计算机网络与信息安全管理中心 Method and system for analyzing internet action type information clue
CN113821592A (en) * 2021-06-23 2021-12-21 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN113486178B (en) * 2021-07-12 2023-12-01 恒安嘉新(北京)科技股份公司 Text recognition model training method, text recognition method, device and medium
CN116384515B (en) * 2023-06-06 2023-09-01 之江实验室 Model training method and device, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN107203511A (en) * 2017-05-27 2017-09-26 中国矿业大学 A kind of network text name entity recognition method based on neutral net probability disambiguation
CN108038103A (en) * 2017-12-18 2018-05-15 北京百分点信息科技有限公司 A kind of method, apparatus segmented to text sequence and electronic equipment


Also Published As

Publication number Publication date
CN109299458A (en) 2019-02-01

Similar Documents

Publication Publication Date Title
CN109299458B (en) Entity identification method, device, equipment and storage medium
CN109271631B (en) Word segmentation method, device, equipment and storage medium
CN108877782B (en) Speech recognition method and device
CN111190600B (en) Method and system for automatically generating front-end codes based on GRU attention model
CN112633003A (en) Address recognition method and device, computer equipment and storage medium
CN116127020A (en) Method for training generated large language model and searching method based on model
CN113096242A (en) Virtual anchor generation method and device, electronic equipment and storage medium
CN113450759A (en) Voice generation method, device, electronic equipment and storage medium
CN114626380A (en) Entity identification method and device, electronic equipment and storage medium
CN112463942A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN116244416A (en) Training method for generating large language model and man-machine voice interaction method based on model
CN113434642B (en) Text abstract generation method and device and electronic equipment
US11036996B2 (en) Method and apparatus for determining (raw) video materials for news
CN116821306A (en) Dialogue reply generation method and device, electronic equipment and storage medium
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
CN111401069A (en) Intention recognition method and intention recognition device for conversation text and terminal
CN115620726A (en) Voice text generation method, and training method and device of voice text generation model
CN115292467A (en) Information processing and model training method, apparatus, device, medium, and program product
CN114490969A (en) Question and answer method and device based on table and electronic equipment
CN108038230B (en) Information generation method and device based on artificial intelligence
CN115510203B (en) Method, device, equipment, storage medium and program product for determining answers to questions
CN114706942B (en) Text conversion model training method, text conversion device and electronic equipment
CN115965018B (en) Training method of information generation model, information generation method and device
US11461399B2 (en) Method and apparatus for responding to question, and storage medium
US20220261554A1 (en) Electronic device and controlling method of electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant