WO2020143163A1 - 基于注意力机制的命名实体识别方法、装置和计算机设备 - Google Patents

基于注意力机制的命名实体识别方法、装置和计算机设备 Download PDF

Info

Publication number
WO2020143163A1
WO2020143163A1 (PCT/CN2019/091305)
Authority
WO
WIPO (PCT)
Prior art keywords
named entity
text
recognized
entity recognition
training text
Prior art date
Application number
PCT/CN2019/091305
Other languages
English (en)
French (fr)
Inventor
丁程丹
许开河
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020143163A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology

Definitions

  • The present application relates to the field of artificial intelligence technology, and in particular to a named entity recognition method, device, and computer equipment based on an attention mechanism.
  • Named entity recognition (NER) refers to identifying entities with specific meanings in text, mainly including person names, place names, institution names, and/or proper nouns. Natural language processing and machine learning are important directions of artificial intelligence. In language text processing, named entity recognition is a prerequisite step whose quality directly affects subsequent work, so named entity recognition is a prerequisite and important task of information processing.
  • In existing deep-learning approaches, the computing capacity of a hidden layer is limited: the hidden layer can only operate on hidden nodes whose total length is not greater than a length threshold.
  • When the length of the hidden nodes input to a hidden layer is not greater than the length threshold, the hidden layer can operate on all of the input hidden nodes, which has no effect on the final named entity recognition result.
  • However, when the length of the input hidden nodes is greater than the length threshold, the hidden layer has to discard some hidden nodes. The discarded hidden nodes are likely to contain named entity information of the text, which causes inaccurate recognition of named entities.
  • The purpose of the present application is to provide a named entity recognition method, device, and computer equipment based on the attention mechanism, so as to recognize named entities through the attention mechanism and improve the recognition accuracy of named entities.
  • In a first aspect, an embodiment of the present application provides a named entity recognition method based on an attention mechanism, which includes: segmenting a text to be recognized, and mapping the segmented words of the text to be recognized into vectors to obtain the word vectors of the text to be recognized; assigning attention weights to the word vectors of the text to be recognized, and inputting the attention-weighted word vectors into a named entity recognition model for layer-by-layer operation to obtain the named entity recognition result of the text to be recognized; wherein the named entity recognition model includes at least two hidden layers, and when the layer-by-layer operation is performed through the named entity recognition model, the hidden nodes output by the previous hidden layer are input to the next hidden layer.
  • In a second aspect, an embodiment of the present application provides a named entity recognition device based on an attention mechanism, including: a word segmentation module for segmenting the text to be recognized; a mapping module for mapping the segmented words obtained by the word segmentation module into vectors to obtain the word vectors of the text to be recognized; and a recognition module for assigning attention weights to the word vectors obtained by the mapping module, and inputting the attention-weighted word vectors into a named entity recognition model for layer-by-layer operation to obtain the named entity recognition result of the text to be recognized; wherein the named entity recognition model includes at least two hidden layers, and when the layer-by-layer operation is performed through the named entity recognition model, the hidden nodes output by the previous hidden layer are input to the next hidden layer.
  • In a third aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, the method described above is implemented.
  • In a fourth aspect, an embodiment of the present application provides a computer non-volatile readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method described above is implemented.
  • In the above technical solutions, after the text to be recognized is segmented, the segmented words are mapped into vectors to obtain the word vectors of the text to be recognized; the word vectors are then assigned attention weights, and the attention-weighted word vectors are input into the named entity recognition model for layer-by-layer operation to obtain the named entity recognition result of the text to be recognized.
  • The named entity recognition model includes at least two hidden layers; when the layer-by-layer operation is performed through the named entity recognition model, the hidden nodes output by the previous hidden layer are input to the next hidden layer.
  • Since the hidden nodes input to each hidden layer are assigned attention weights, each hidden layer operates on the hidden nodes according to their attention weights. Named entities can thus be recognized through the attention mechanism, the recognition accuracy of named entities is improved, and the loss of hidden nodes caused by the length of the hidden-layer input exceeding the hidden layer's length threshold can be avoided.
  • FIG. 1 is a flowchart of an embodiment of a method for identifying named entities based on an attention mechanism in the present application
  • FIG. 2 is a flowchart of another embodiment of a method for identifying named entities based on an attention mechanism in the present application
  • FIG. 3 is a flowchart of another embodiment of a method for identifying named entities based on an attention mechanism in the present application
  • FIG. 4 is a flow chart of yet another embodiment of a method for identifying named entities based on an attention mechanism in the present application
  • FIG. 5 is a schematic structural diagram of an embodiment of a named entity recognition device based on an attention mechanism in the present application
  • FIG. 6 is a schematic structural diagram of another embodiment of a named entity recognition device based on an attention mechanism according to this application;
  • FIG. 7 is a schematic structural diagram of an embodiment of a computer device according to this application.
  • FIG. 1 is a flowchart of an embodiment of a method for identifying named entities based on an attention mechanism of the present application. As shown in FIG. 1, the above method for identifying named entities based on an attention mechanism may include:
  • Step 101 Perform word segmentation on the text to be recognized, and map the word segmentation of the text to be recognized into a vector to obtain a word vector of the text to be recognized.
  • The text to be recognized can be a sentence, which can include characters and punctuation marks.
  • Segmenting the text to be recognized may mean separating every character and punctuation mark in the sentence. For example, for the sentence "中国女排赢得了小组赛第一，并且进入了决赛。" ("The Chinese women's volleyball team won first place in the group stage and reached the final."),
  • the result of segmentation can be: "/中/国/女/排/赢/得/了/小/组/赛/第/一/，/并/且/进/入/了/决/赛/。/". Mapping the segmented words of the text to be recognized into vectors can be achieved by looking up each separated character and punctuation mark in a word segmentation vector mapping table to obtain the corresponding word vector.
  • The word segmentation vector mapping table here may be a word segmentation vector mapping table stored or loaded in advance.
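  • As a minimal illustration of this lookup step, the Python sketch below performs character-level segmentation and table lookup; the table contents, the fallback vector for out-of-vocabulary characters, and the function names are assumptions made for illustration, not values taken from the application.

```python
import numpy as np

# Hypothetical, pre-loaded word-segmentation vector mapping table:
# each separated character or punctuation mark maps to a fixed-size vector.
EMBED_DIM = 4
word2vec_table = {
    "中": np.array([0.1, 0.3, -0.2, 0.5]),
    "国": np.array([0.2, -0.1, 0.4, 0.0]),
    "。": np.array([0.0, 0.0, 0.0, 0.1]),
}
UNK = np.zeros(EMBED_DIM)  # fallback for characters missing from the table


def segment(text):
    """Character-level segmentation: separate every character and punctuation mark."""
    return list(text)


def to_word_vectors(text):
    """Map each segmented token to its word vector via the mapping table."""
    return np.stack([word2vec_table.get(tok, UNK) for tok in segment(text)])


vectors = to_word_vectors("中国女排赢得了小组赛第一，并且进入了决赛。")
print(vectors.shape)  # (number of tokens, EMBED_DIM)
```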
  • Step 102: The word vectors of the text to be recognized are assigned attention weights, and the attention-weighted word vectors are input into a named entity recognition model for layer-by-layer operation to obtain the named entity recognition result of the text to be recognized.
  • The named entity recognition model includes at least two hidden layers; when the layer-by-layer operation is performed through the named entity recognition model, the hidden nodes output by the previous hidden layer are input to the next hidden layer.
  • Further, before step 102, the method may also include: acquiring the attention weights of the word vectors of the text to be recognized according to the context semantics of the text to be recognized.
  • When the word vectors of the text to be recognized are input into the named entity recognition model, the attention weight of each word vector may be the same or different.
  • During the layer-by-layer operation, the hidden nodes input to each hidden layer can be given different or the same attention weights according to the context semantics of the text to be recognized. This embodiment does not limit this.
  • In this embodiment, the named entity recognition model performs the layer-by-layer operation on the input word vectors using one or a combination of the following algorithms: bidirectional long short-term memory (Bi-LSTM) networks, conditional random fields (CRF), and convolutional neural networks (CNN); a Bi-LSTM-based variant is sketched below.
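  • To make the architecture concrete, the sketch below stacks an embedding layer and two bidirectional LSTM hidden layers, and re-weights the hidden nodes passed from one layer to the next with attention weights. It is a minimal reading of the description rather than the application's exact model: the use of PyTorch, the layer sizes, the softmax-normalized scoring layers, and all names are assumptions, and the CRF layer of the Bi-LSTM+CRF variant is omitted.

```python
import torch
import torch.nn as nn


class AttentionNER(nn.Module):
    """Sketch: embedding followed by two stacked Bi-LSTM hidden layers; the hidden
    nodes output by one layer are re-weighted by attention before entering the next."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.layer1 = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.layer2 = nn.LSTM(2 * hidden_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.attn1 = nn.Linear(embed_dim, 1)        # scores the word vectors
        self.attn2 = nn.Linear(2 * hidden_dim, 1)   # scores the hidden nodes of layer 1
        self.out = nn.Linear(2 * hidden_dim, num_tags)  # per-token tag scores

    def forward(self, token_ids):
        x = self.embed(token_ids)                          # (batch, seq, embed)
        x = x * torch.softmax(self.attn1(x), dim=1)        # attention-weighted word vectors
        h1, _ = self.layer1(x)                             # hidden nodes of the first hidden layer
        h1 = h1 * torch.softmax(self.attn2(h1), dim=1)     # re-weight before the next layer
        h2, _ = self.layer2(h1)                            # second hidden layer
        return self.out(h2)                                # tag scores per token


model = AttentionNER(vocab_size=5000, embed_dim=64, hidden_dim=128, num_tags=9)
scores = model(torch.randint(0, 5000, (1, 20)))
print(scores.shape)  # torch.Size([1, 20, 9])
```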
  • In the above named entity recognition method based on the attention mechanism, after the text to be recognized is segmented, the segmented words are mapped into vectors to obtain the word vectors of the text to be recognized; the word vectors are then assigned attention weights, and the attention-weighted word vectors are input into the named entity recognition model for layer-by-layer operation to obtain the named entity recognition result of the text to be recognized.
  • The named entity recognition model includes at least two hidden layers, and when the layer-by-layer operation is performed through the named entity recognition model, the hidden nodes output by the previous hidden layer are input to the next hidden layer.
  • Since the hidden nodes input to each hidden layer are assigned attention weights, each hidden layer operates on the hidden nodes according to their attention weights; named entities can thus be recognized through the attention mechanism, the recognition accuracy of named entities is improved, and the loss of hidden nodes caused by the length of the hidden-layer input exceeding the hidden layer's length threshold can be avoided.
  • FIG. 2 is a flowchart of another embodiment of a named entity recognition method based on an attention mechanism of the present application.
  • As shown in FIG. 2, in the embodiment shown in FIG. 1 of the present application, taking as an example a named entity recognition model that has an initial layer and two hidden layers below it (three computing layers in total), step 102 may include:
  • Step 201: The word vectors of the text to be recognized are input to the initial layer of the named entity recognition model, and the initial layer outputs hidden nodes after its operation.
  • the word vectors of the text to be recognized are spliced into a vector string and input into the named entity recognition model for layer-by-layer operation.
  • the above hidden nodes are equivalent to feature vectors representing the features of the text to be recognized.
  • the length of the vector that the hidden layer of the named entity recognition model can process can be the length of the vector string formed by concatenating the hidden nodes input by the hidden layer.
  • Step 202: Each hidden node output by the initial layer is assigned an attention weight according to the context semantics of the text to be recognized.
  • In this embodiment, the hidden nodes input to each hidden layer are assigned attention weights according to the context semantics of the text to be recognized before being input to that hidden layer.
  • These attention weights make the following behaviour possible: if the length of the hidden nodes input to a hidden layer exceeds the length threshold that the hidden layer can handle, the hidden nodes with high attention weights are operated on preferentially according to the assigned weights, and the hidden nodes with low attention weights are discarded.
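  • A small sketch of that prioritize-and-discard rule is given below; the function name, the list-based representation of hidden nodes, and the example threshold are illustrative assumptions.

```python
def select_hidden_nodes(hidden_nodes, attention_weights, length_threshold):
    """Keep at most `length_threshold` hidden nodes, preferring those with the
    highest attention weights and discarding the lowest-weighted ones."""
    if len(hidden_nodes) <= length_threshold:
        return hidden_nodes  # everything fits: operate on all input hidden nodes
    # Rank node indices by attention weight, highest first.
    ranked = sorted(range(len(hidden_nodes)), key=lambda i: attention_weights[i], reverse=True)
    keep = sorted(ranked[:length_threshold])  # restore original order for the kept nodes
    return [hidden_nodes[i] for i in keep]


# Example: 6 hidden nodes, but the layer can only process 4.
nodes = ["h11", "h21", "h31", "h41", "h51", "h61"]
weights = [0.9, 0.8, 0.7, 0.7, 0.1, 0.2]
print(select_hidden_nodes(nodes, weights, 4))  # ['h11', 'h21', 'h31', 'h41']
```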
  • Specifically, the hidden nodes input to each hidden layer are assigned attention weights according to the context semantics of the text to be recognized. Take, for example, the sentence "高小红在故宫博物馆看到了明朝的瓷器" ("Gao Xiaohong saw Ming-dynasty porcelain in the Palace Museum").
  • The word vectors obtained from this sentence are input to the initial layer of the named entity recognition model,
  • and the hidden nodes output by the initial layer can be: h11, h21, h31, ..., hn1.
  • These hidden nodes output by the initial layer are input to the first hidden layer. Since they are computed from the word vectors of the text to be recognized, the hidden nodes output by the initial layer carry the context semantic features of the text to be recognized.
  • Suppose h11 is derived from the word vectors of the characters "高" and "小",
  • and h21 is derived from the word vector of the character "红". Although the three characters "高", "小", and "红" taken separately are not a named entity, the context semantics of the three characters "高小红" indicate that "高小红" is a named entity, so the hidden nodes h11 and h21 can be given higher attention weights.
  • Similarly, the two characters "故" and "宫" are not named entities on their own. However, according to the context semantics, "故宫" (the Forbidden City) taken together is a named entity.
  • Since the hidden node h31 is obtained from the word vector of "故"
  • and the hidden node h41 is obtained from the word vector of "宫", the hidden nodes h31 and h41 can also be given higher attention weights.
  • Step 203: The hidden nodes output by the initial layer that have been assigned attention weights are input to the first hidden layer, and the first hidden layer outputs hidden nodes after its operation.
  • Step 204: Each hidden node output by the first hidden layer is assigned an attention weight according to the context semantics of the text to be recognized.
  • Although the hidden nodes operated on by the first hidden layer are not the word vectors of the text to be recognized, the hidden nodes h11, h21, h31, ..., hn1 input to the first hidden layer are still feature vectors carrying the context semantic information of the text to be recognized. For the same reason, the attention weight of each hidden node input to each hidden layer can be determined according to the context semantics of the text to be recognized.
  • In the recognition of the sentence "高小红在故宫博物馆看到了明朝的瓷器", if the length of the hidden nodes output by the initial layer is greater than the length threshold of the first hidden layer, the hidden nodes related to characters such as "在", "看", "到", "了", and "的" can be given lower attention weights, so that more of the hidden layer's computing resources go to the words that are more likely to be named entities.
  • Step 205: The hidden nodes output by the first hidden layer that have been assigned attention weights are input to the second hidden layer, and the second hidden layer outputs the recognition result of the text to be recognized after its operation.
  • The above embodiment only describes the case where the named entity recognition model has three computing layers.
  • Of course, the number of computing layers of the named entity recognition model may also be 2, 4, 5, 6, and so on;
  • the specific number of layers can be set according to actual needs.
  • In any case, the way the named entity recognition model recognizes named entities in the text to be recognized is similar to the above embodiment, and may include: after assigning attention weights to the hidden nodes to be input to each hidden layer, the attention-weighted hidden nodes are input to the corresponding hidden layer for operation.
  • Further, assigning attention weights to the hidden nodes input to a hidden layer may be done by judging from the context semantics which nodes are more likely to correspond to named entities and giving those input vectors higher weights; in other words, the context semantics serve as an auxiliary judgment condition in the recognition of named entities.
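  • The application does not specify how these context-based weights are computed; one common choice, shown below purely as an illustration, is a small additive-attention-style scoring step followed by a softmax so that the weights over the hidden nodes sum to one. The matrices W and v are hypothetical learned parameters.

```python
import numpy as np


def attention_weights(hidden_nodes, W, v):
    """Additive-attention-style scoring: each hidden node gets a scalar score
    from a small feed-forward step, and a softmax turns the scores into weights."""
    scores = np.tanh(hidden_nodes @ W) @ v          # one score per hidden node
    exp = np.exp(scores - scores.max())             # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))     # 5 hidden nodes of dimension 8
W = rng.normal(size=(8, 8))
v = rng.normal(size=(8,))
w = attention_weights(h, W, v)
print(w, w.sum())               # weights over the 5 hidden nodes, summing to 1
```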
  • FIG. 3 is a flowchart of another embodiment of a method for identifying named entities based on an attention mechanism of the present application. As shown in FIG. 3, in the embodiment shown in FIG. 1 of the present application, before step 102, the method may further include:
  • Step 301 Obtain training text and segment the training text.
  • Step 302 Mark the named entities in the training text after word segmentation.
  • Specifically, labeling the named entities in the segmented training text may mean labeling whether each segmented word of the training text belongs to a named entity, the position of the segmented word within the named entity to which it belongs, and/or the type of the named entity to which the segmented word belongs.
  • the named entities in the training text can be marked by BIO labeling and/or IOBES labeling.
  • For example, when the named entity recognition model is a Bi-LSTM model,
  • the training text can be annotated in the IOBES (Inside, Other, Begin, End, Single) manner (a toy example is sketched below, after the entity types): if a segmented word is an entity on its own, it is marked with tag S-...; if it is the beginning of an entity, it is marked with tag B-...; if it is a word in the middle of an entity, it is marked with tag I-...; if it is the end of an entity, it is marked with tag E-...; and if it is not part of an entity, it is marked with tag O.
  • The entity types may include, for example, person names (PER), place names (LOC), and organization names (ORG).
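  • A toy IOBES-tagged training sample might look like the following; the sentence, its tags, and the tag-to-id mapping are hypothetical illustrations of the scheme rather than data from the application.

```python
# Illustrative IOBES-tagged training sample (hypothetical sentence and tags):
# B- entity beginning, I- inside, E- end, S- single-character entity, O - outside.
sample = [
    ("王", "B-PER"), ("明", "E-PER"),     # person name "王明"
    ("在", "O"),
    ("北", "B-LOC"), ("京", "E-LOC"),     # place name "北京"
    ("工", "O"), ("作", "O"),
]

# Tag inventory the model would be trained to predict (padded with unseen tags).
tag_set = sorted({tag for _, tag in sample} | {"S-PER", "S-LOC", "I-PER", "I-LOC"})
tag2id = {tag: i for i, tag in enumerate(tag_set)}

chars = [c for c, _ in sample]
labels = [tag2id[t] for _, t in sample]
print(chars, labels)
```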
  • As another example, when the named entity recognition model is a Bi-LSTM+CRF model,
  • the training text can be labeled according to the BIO scheme, that is, B-PER and I-PER represent the first character of a person name and a non-first character of a person name,
  • B-LOC and I-LOC represent the first character of a place name and a non-first character of a place name,
  • B-ORG and I-ORG represent the first character of an organization name and a non-first character of an organization name,
  • and O means that the character is not part of a named entity.
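  • The original (Chinese-language) description illustrates this scheme with the sentence "高小明帮助中国队获胜" ("Gao Xiaoming helped the Chinese team win"), labeled 高(B-PER) 小(I-PER) 明(I-PER) 帮(O) 助(O) 中(B-ORG) 国(I-ORG) 队(I-ORG) 获(O) 胜(O). The small decoder below, whose name and structure are assumptions, shows how such BIO tags group characters back into entities.

```python
def bio_decode(chars, tags):
    """Group characters tagged B-X / I-X into entity spans; O characters are skipped."""
    entities, current, current_type = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(ch)
        else:  # "O", or an I- tag without a preceding B-
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


chars = list("高小明帮助中国队获胜")
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "I-ORG", "O", "O"]
print(bio_decode(chars, tags))  # [('高小明', 'PER'), ('中国队', 'ORG')]
```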
  • Step 303 Map the word segmentation of the training text to a vector to obtain the word vector of the training text.
  • For each word and character separated from the training text, the corresponding word vector is obtained by searching the word segmentation vector mapping table.
  • the word segmentation vector mapping table here is a word segmentation vector mapping table stored or loaded in advance.
  • Step 304 Input the word vector of the training text into the named entity recognition model to be trained for layer-by-layer operation to train the named entity model to be trained.
  • Specifically, the processing in step 304 may be the same as the recognition of the text to be recognized by the trained named entity recognition model described above. The difference is that the named entity recognition model to be trained here has not yet been trained, so there may be an error between the named entity recognition result of the training text that it outputs and the named entities labeled in step 302.
  • In this embodiment, the layer-by-layer operation of the named entity recognition model to be trained may use one or a combination of the following algorithms: Bi-LSTM, CRF, and CNN.
  • Training the named entity recognition model to be trained means training the parameters of its layer-by-layer operation together with the attention weights assigned to the hidden nodes of each hidden layer.
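  • A compact sketch of one such training pass is shown below. The token-level cross-entropy loss and SGD optimizer are illustrative stand-ins for the loss function and gradient-descent procedure described later; the function and argument names are hypothetical, and the model is assumed to expose both its layer parameters and its attention-scoring parameters through model.parameters(), as in the earlier AttentionNER sketch.

```python
import torch
import torch.nn as nn


def train_epoch(model, train_batches, lr=0.01):
    """One pass over the labeled training text. Both the layer-by-layer operation
    parameters and the attention-scoring parameters live in model.parameters(),
    so a single gradient-descent optimizer adjusts them together."""
    criterion = nn.CrossEntropyLoss()                         # illustrative choice of loss
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)    # plain gradient descent
    loss = torch.zeros(())
    for token_ids, tag_ids in train_batches:                  # tensors built from the labeled text
        optimizer.zero_grad()
        scores = model(token_ids)                              # (batch, seq, num_tags)
        loss = criterion(scores.view(-1, scores.size(-1)), tag_ids.view(-1))
        loss.backward()                                        # follow the negative gradient direction
        optimizer.step()                                       # adjust parameters and attention weights
    return loss.item()
```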
  • FIG. 4 is a flowchart of another embodiment of a method for identifying named entities based on an attention mechanism in the present application. As shown in FIG. 4, in the embodiment shown in FIG. 3 of the present application, after step 304, the method may further include:
  • Step 401 After the end of the training process, obtain the named entity recognition result of the training text output by the named entity model to be trained.
  • Step 402 Compare the named entity recognition result of the training text with the named entity marked in the training text.
  • Specifically, the comparison may be performed by constructing, from the named entity recognition result of the training text and the word vectors of the training text, a loss function that reflects the accuracy of the named entity recognition result of the training text.
  • The constructed loss function may be, for example, the squared difference between the named entity recognition result and the word vectors of the training text.
  • Step 403 According to the comparison result, adjust the attention weight given to the word vector in the next training process.
  • Specifically, a gradient descent algorithm can be used to find the minimum of the loss function; at each iteration the gradient descent algorithm uses the negative gradient direction to determine how the parameters of the loss function should be adjusted. In this way, the adjustment directions can be obtained both for the parameters of the layer-by-layer operation that the named entity recognition model to be trained performs on the word vectors of the training text and for the attention weights assigned to the hidden nodes of each hidden layer.
  • A gradual decrease of the loss function means that these layer-by-layer operation parameters and the attention weights assigned to the hidden nodes of the hidden layers become more and more accurate.
  • Step 404 If the error between the named entity recognition result of the training text and the named entity marked in the training text is less than a predetermined error threshold, obtain a trained named entity recognition model.
  • The above predetermined error threshold can be set as needed in a specific implementation according to system performance and/or implementation requirements;
  • this embodiment does not limit the size of the predetermined error threshold.
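  • The stopping rule of step 404 could be expressed as in the sketch below; the token-level error rate and the 0.05 default are illustrative stand-ins for the unspecified error measure and predetermined error threshold.

```python
def error_rate(predicted_tags, gold_tags):
    """Fraction of tokens whose predicted tag disagrees with the labeled named entity tag."""
    wrong = sum(p != g for p, g in zip(predicted_tags, gold_tags))
    return wrong / max(len(gold_tags), 1)


def training_finished(predicted_tags, gold_tags, error_threshold=0.05):
    """Training stops once the recognition error on the training text drops below the threshold."""
    return error_rate(predicted_tags, gold_tags) < error_threshold


print(training_finished(["B-PER", "E-PER", "O"], ["B-PER", "E-PER", "O"]))  # True
```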
  • FIG. 5 is a schematic structural diagram of an embodiment of a named entity recognition apparatus based on an attention mechanism of the present application.
  • the named entity recognition apparatus based on an attention mechanism provided in this embodiment can implement the named entity recognition method based on the attention mechanism provided by the present application .
  • the above named entity recognition device based on the attention mechanism may include: a word segmentation module 51, a mapping module 52, and a recognition module 53;
  • The word segmentation module 51 is used to segment the text to be recognized; the text to be recognized may be a sentence, and the sentence may include characters and punctuation marks.
  • The word segmentation module 51 segmenting the text to be recognized may mean separating every character and punctuation mark in the sentence. For example, for the sentence "中国女排赢得了小组赛第一，并且进入了决赛。" ("The Chinese women's volleyball team won first place in the group stage and reached the final."),
  • the result of segmentation can be: "/中/国/女/排/赢/得/了/小/组/赛/第/一/，/并/且/进/入/了/决/赛/。/".
  • The mapping module 52 is configured to map the segmented words of the text to be recognized obtained by the word segmentation module 51 into vectors to obtain the word vectors of the text to be recognized; specifically, the mapping module 52 may do so by looking up each character and punctuation mark separated from the text to be recognized in a word segmentation vector mapping table to obtain the corresponding word vector.
  • the word segmentation vector mapping table here may be a word segmentation vector mapping table stored or loaded in advance.
  • The recognition module 53 is used to assign attention weights to the word vectors of the text to be recognized obtained by the mapping module 52, and to input the attention-weighted word vectors into the named entity recognition model for layer-by-layer operation to obtain the named entity recognition result of the text to be recognized; the named entity recognition model includes at least two hidden layers.
  • When the layer-by-layer operation is performed through the named entity recognition model, the hidden nodes output by the previous hidden layer are input to the next hidden layer.
  • the method of performing layer-wise operation on the input word vector by the named entity recognition model may be one or a combination of the following algorithms: Bi-LSTM, CRF, and CNN.
  • In the above named entity recognition device based on the attention mechanism, after the word segmentation module 51 segments the text to be recognized, the mapping module 52 maps the segmented words into vectors to obtain the word vectors of the text to be recognized, and the recognition module 53 then assigns attention weights to the word vectors and inputs the attention-weighted word vectors into the named entity recognition model for layer-by-layer operation to obtain the named entity recognition result of the text to be recognized. The named entity recognition model includes at least two hidden layers, and when the layer-by-layer operation is performed through the model, the hidden nodes output by the previous hidden layer are input to the next hidden layer. Since the hidden nodes input to each hidden layer are assigned attention weights, each hidden layer operates on the hidden nodes according to their attention weights; named entities can thus be recognized through the attention mechanism, the recognition accuracy of named entities is improved, and the loss of hidden nodes caused by the length of the hidden-layer input exceeding the hidden layer's length threshold can be avoided.
  • FIG. 6 is a schematic structural diagram of another embodiment of a named entity recognition device based on an attention mechanism of the present application. Compared with the named entity recognition device based on an attention mechanism shown in FIG. 5, the difference is that the device shown in FIG. 6 may further include an obtaining module 54.
  • The obtaining module 54 is configured to obtain the attention weights of the word vectors of the text to be recognized according to the context semantics of the text to be recognized, before the recognition module 53 assigns the attention weights to the word vectors of the text to be recognized.
  • the attention weight of each word vector of the text to be recognized may be the same or different.
  • the hidden nodes input by the hidden layers can be given the same or different attention weights. This embodiment does not limit this.
  • the above named entity recognition device based on the attention mechanism may further include: a labeling module 55 and a training module 56;
  • The word segmentation module 51 is also used to obtain training text and to segment the training text, before the recognition module 53 assigns attention weights to the word vectors of the text to be recognized and inputs the attention-weighted word vectors into the named entity recognition model for layer-by-layer operation.
  • The labeling module 55 is used to label the named entities in the training text segmented by the word segmentation module 51; in this embodiment, the labeling module 55 is specifically used to label whether each segmented word of the training text belongs to a named entity, the position of the segmented word within the named entity to which it belongs, and/or the type of the named entity to which the segmented word belongs.
  • the labeling module 55 may label the named entities in the training text by means of BIO labeling and/or IOBES labeling.
  • For example, when the named entity recognition model is a Bi-LSTM model,
  • the training text can be annotated in the IOBES (Inside, Other, Begin, End, Single) manner: if a segmented word is an entity on its own, it is marked with tag S-...; if it is the beginning of an entity, it is marked with tag B-...; if it is a word in the middle of an entity, it is marked with tag I-...; if it is the end of an entity, it is marked with tag E-...; and if it is not part of an entity, it is marked with tag O.
  • The entity types may include, for example, person names (PER), place names (LOC), and organization names (ORG).
  • As another example, when the named entity recognition model is a Bi-LSTM+CRF model,
  • the training text can be labeled according to the BIO scheme, that is, B-PER and I-PER represent the first character of a person name and a non-first character of a person name,
  • B-LOC and I-LOC represent the first character of a place name and a non-first character of a place name,
  • B-ORG and I-ORG represent the first character of an organization name and a non-first character of an organization name,
  • and O means that the character is not part of a named entity.
  • The mapping module 52 is also used to map the segmented words of the training text into vectors to obtain the word vectors of the training text; the mapping module 52 can look up each word and character separated from the training text in the word segmentation vector mapping table to obtain the corresponding word vector.
  • the word segmentation vector mapping table here is a word segmentation vector mapping table stored or loaded in advance.
  • the training module 56 is configured to input the word vector of the training text obtained by the mapping module 52 into the named entity recognition model to be trained to perform layer-by-layer operation to train the named entity model to be trained.
  • Specifically, after the training module 56 trains the named entity recognition model to be trained, it may also, at the end of the current training process, obtain the named entity recognition result of the training text output by the named entity recognition model to be trained, and compare that recognition result with the named entities labeled in the training text.
  • According to the comparison result, the attention weights assigned to the word vectors in the next training process are adjusted; if the error between the named entity recognition result of the training text and the named entities labeled in the training text is less than the predetermined error threshold, the trained named entity recognition model is obtained.
  • the above-mentioned predetermined error threshold can be set by itself according to system performance and/or implementation requirements during specific implementation. In this embodiment, the size of the above-mentioned predetermined error threshold is not limited.
  • The computer device may include a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • When the processor executes the computer program, the named entity recognition method based on the attention mechanism provided by the embodiments of the present application may be implemented.
  • FIG. 7 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present application.
  • the computer device 12 shown in FIG. 7 is only an example, and should not bring any limitation to the functions and use scope of the embodiments of the present application.
  • the computer device 12 is represented in the form of a general-purpose computing device.
  • the components of the computer device 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 connecting different system components (including the system memory 28 and the processing unit 16).
  • the bus 18 represents one or more of several types of bus structures, including a memory bus or a memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • the computer device 12 typically includes a variety of computer system readable media.
  • the system memory 28 may include a computer system readable medium in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter referred to as RAM) 30 and/or cache memory 32.
  • the computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • The storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard disk drive").
  • A program/utility 40 having a set of (at least one) program modules 42 may be stored in, for example, the memory 28.
  • Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
  • the program module 42 generally performs the functions and/or methods in the embodiments described in this application.
  • The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication can be performed through an input/output (I/O) interface 22.
  • Moreover, the computer device 12 can also communicate, through the network adapter 20, with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet.
  • the network adapter 20 communicates with other modules of the computer device 12 through the bus 18.
  • the processing unit 16 executes various functional applications and data processing by running the program stored in the system memory 28, for example, to implement the named entity recognition method based on the attention mechanism provided by the embodiment of the present application.
  • An embodiment of the present application also provides a computer non-volatile readable storage medium on which a computer program is stored.
  • the named entity recognition method based on the attention mechanism provided by the embodiment of the present application may be implemented .

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)
  • Machine Translation (AREA)

Abstract

The present application provides a named entity recognition method, device, and computer equipment based on an attention mechanism. The named entity recognition method based on the attention mechanism includes: segmenting a text to be recognized, and mapping the segmented words of the text to be recognized into vectors to obtain the word vectors of the text to be recognized; assigning attention weights to the word vectors of the text to be recognized, and inputting the attention-weighted word vectors into a named entity recognition model for layer-by-layer operation to obtain the named entity recognition result of the text to be recognized; wherein the named entity recognition model includes at least two hidden layers, and when the layer-by-layer operation is performed through the named entity recognition model, the hidden nodes output by the previous hidden layer are input to the next hidden layer. The present application can recognize named entities through the attention mechanism and improve the recognition accuracy of named entities.

Description

基于注意力机制的命名实体识别方法、装置和计算机设备
本申请要求于2019年01月07日提交中国专利局、申请号为201910012152.6、申请名称为“基于注意力机制的命名实体识别方法、装置和计算机设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,尤其涉及一种基于注意力机制的命名实体识别方法、装置和计算机设备。
背景技术
命名实体识别(Named Entity Recognition;以下简称:NER)是指识别文本中具有特定意义的实体,主要包括人名、地名、机构名和/或专有名词等。自然语言处理和机器学习是人工智能的一个重要方向,在语言文本处理中,命名实体识别是语言文本处理的一个前提工作,识别的质量直接影响到后续的工作,因此命名实体识别是信息处理的前提和重要任务。
现有相关技术中,命名实体的识别方式主要有两种,第一种,基于正则规则的方式;第二种,基于深度学习的方式。然而,第一种实现方式虽然实现简单,但识别效果不是很好;第二种实现方式,由于深度学习模型的隐藏层的运算能力有限,也即隐藏层只能对长度不大于长度阈值的隐藏节点进行运算。当输入隐藏层的隐藏节点的长度不大于长度阈值时,隐藏层能够对输入的全部隐藏节点进行运算,这对最后的命名实体的识别结果没有影响。但是,当输入隐藏层输入隐藏节点的长度大于长度阈值时,隐藏层只好舍弃部分隐藏节点。如此,被舍弃的隐藏节点很有可能包含文本的命名实体信息,这样会造成对命名实体识别的不准确。
因此,如何提高对文本中命名实体识别的准确率,成为目前亟待解决的技术问题。
申请内容
有鉴于此,本申请的目的在于提供一种基于注意力机制的命名实体识别方法、装置和计算机设备,以实现通过注意力机制对命名实体进行识别,提高命名实体的识别准确率。
第一方面,本申请实施例提供一种基于注意力机制的命名实体 识别方法,包括:对待识别文本进行分词,并将所述待识别文本的分词映射为向量,得到所述待识别文本的词向量;将所述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得所述待识别文本的命名实体识别结果;其中,所述命名实体识别模型包括至少两层隐藏层,通过所述命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层。
第二方面,本申请实施例提供一种基于注意力机制的命名实体识别装置,包括:分词模块,用于对待识别文本进行分词;映射模块,用于将所述分词模块获得的所述待识别文本的分词映射为向量,得到所述待识别文本的词向量;识别模块,用于将所述映射模块得到的所述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得所述待识别文本的命名实体识别结果;其中,所述命名实体识别模型包括至少两层隐藏层,通过所述命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层。
第三方面,本申请实施例提供一种计算机设备,包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时,实现如上所述的方法。
第四方面,本申请实施例一种计算机非易失性可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现如上所述的方法。
以上技术方案中,对待识别文本进行分词之后,将上述待识别文本的分词映射为向量,得到上述待识别文本的词向量,然后将上述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得上述待识别文本的命名实体识别结果;其中,上述命名实体识别模型包括至少两层隐藏层,通过上述命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层,由于各隐藏层输入的隐藏节点均被赋予了注意力权重,各隐藏层根据隐藏节点的注意力权重,对隐藏节点进行运算,可以实现通过注意力机制对命名实体进行识别,提高命名实体的识别准确率,进而可以避免由于隐藏层节点的长度超出隐藏层的长度阈值,而造成的隐藏节点的损失。
附图说明
为了更清楚地说明本申请具体实施方式或现有技术中的技术方 案,下面将对具体实施方式或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施方式,对于本领域技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请基于注意力机制的命名实体识别方法一个实施例的流程图;
图2为本申请基于注意力机制的命名实体识别方法另一个实施例的流程图;
图3为本申请基于注意力机制的命名实体识别方法再一个实施例的流程图;
图4为本申请基于注意力机制的命名实体识别方法再一个实施例的流程图;
图5为本申请基于注意力机制的命名实体识别装置一个实施例的结构示意图;
图6为本申请基于注意力机制的命名实体识别装置另一个实施例的结构示意图;
图7为本申请计算机设备一个实施例的结构示意图。
具体实施方式
为了更好的理解本申请的技术方案,下面结合附图对本申请实施例进行详细描述。
图1为本申请基于注意力机制的命名实体识别方法一个实施例的流程图,如图1所示,上述基于注意力机制的命名实体识别方法可以包括:
步骤101,对待识别文本进行分词,并将待识别文本的分词映射为向量,得到待识别文本的词向量。
其中,待识别文本可以是一句话,这句话里可以包括字以及标点符号。对待识别文本进行分词可以是将待识别文本这句话中的每一个字、标点符号都分离出来。例如,“中国女排赢得了小组赛第一,并且进入了决赛。”对这句话分词的结果可以是:“/中/国/女/排/赢/得/了/小/组/赛/第/一/,/并/且/进/入/了/决/赛/。/”将待识别文本的分词映射为向量,可以是将待识别文本中分离出来的每一个字、标点符号通过查找分词向量映射表得到对应的词向量。这里的分词向量映射表可以是预先存储或加载的分词向量映射表。
步骤102,将待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得待 识别文本的命名实体识别结果。其中,命名实体识别模型包括至少两层隐藏层,通过命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层。
进一步地,步骤102之前,还可以包括:根据上述待识别文本的上下文语义,获取上述待识别文本的词向量的注意力权重。
其中,待识别文本的各词向量被输入命名实体识别模型时,待识别文本的各词向量的注意力权重可以是相同或不同的。在命名实体识别模型对待识别文本的词向量进行逐层运算过程中,根据待识别文本的上、下文语义,各隐藏层输入的各隐藏节点可以被赋予不同或相同的注意力权重。本实施例对此不作限定。
本实施例中,命名实体识别模型对输入的词向量进行逐层运算的方式可以是采用以下算法之一或组合:双向长短期记忆神经网络(Bi-directional Long Short-Term Memory;以下简称:Bi-LSTM)、条件随机场(Conditional Random Fields;以下简称:CRF)和卷积神经网络(Convolutional Neural Network;以下简称:CNN)。
上述基于注意力机制的命名实体识别方法中,对待识别文本进行分词之后,将上述待识别文本的分词映射为向量,得到上述待识别文本的词向量,然后将上述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得上述待识别文本的命名实体识别结果;其中,上述命名实体识别模型包括至少两层隐藏层,通过上述命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层,由于各隐藏层输入的隐藏节点均被赋予了注意力权重,各隐藏层根据隐藏节点的注意力权重,对隐藏节点进行运算,可以实现通过注意力机制对命名实体进行识别,提高命名实体的识别准确率,进而可以避免由于隐藏层节点的长度超出隐藏层的长度阈值,而造成的隐藏节点的损失。
图2为本申请基于注意力机制的命名实体识别方法另一个实施例的流程图,如图2所示,本申请图1所示实施例中,以命名实体识别模型有一初始层以及初始以下的两层隐藏层三层运算层为例,步骤102可以包括:
步骤201,将待识别文本的词向量输入命名实体识别模型的初始层,初始层经运算后输出隐藏节点。
其中,待识别文本的词向量进行拼接一个向量串输入命名实体识别模型进行逐层运算。上述的隐藏节点相当于表示待识别文本特 征的特征向量。命名实体识别模型的隐藏层能够处理的向量长度可以是该隐层输入的隐藏节点相互拼接后形成的向量串的长度。
步骤202,对初始层输出的各隐藏节点,根据待识别文本的上、下文语义赋予注意力权重。
本实施例中,输入各隐藏层的隐藏节点在被输入隐藏层之前,都要根据待识别文本的上、下文语义被赋予了注意力权重。该注意力权重可以实现:如果输入该隐藏层的隐藏节点的长度超出了该隐藏层能够处理的长度阈值,此时会根据隐藏层节点被赋予的注意力权重,优先运算注意力权重高的隐藏层节点,将那些注意力权重低的隐藏节点舍弃。
具体地,根据待识别文本的上、下文语义对输入各隐藏层的隐藏节点赋予注意力权重。例如,“高小红在故宫博物馆看到了明朝的瓷器”,由这句话得到的分词向量输入命名实体识别模型的初始层,初始层输出的隐藏节点可以为:h11、h21、h31……hn1。这些初始层输出的隐藏节点输入第一层隐藏层,由于是由待识别文本的词向量计算得到的,初始层输出的隐藏节点会带有待识别文本的上、下文语义特征。如果h11是由“高”、“小”这个两个字的词向量运算得来的,h21是由“红”这个字的词向量得来的,虽然“高”、“小”、“红”这三个字单独拆分出来不是命名实体,但是根据“高小红”这个三个字的上、下文语义判断“高小红”是命名实体,因此,隐藏节点h11、h21可以被赋予较高一些的注意力权重。
再例如,“故”“宫”这两个字单独拆分出来都不是命名实体。但是根据上、下文语义“故宫”合在一起是命名实体,隐藏节点h31由“故”的词向量运算得到,隐藏节点h41由“宫”的词向量运算得到,因此,隐藏节点h31、h41也可以被赋予较高一些的注意力权重。
步骤203,将被赋予了注意力权重的初始层输出的隐藏节点输入第一层隐藏层,第一层隐藏层经运算后输出隐藏节点。
步骤204,对第一层隐藏层输出的各隐藏节点,根据待识别文本的上、下文语义赋予注意力权重。
虽然第一层隐藏层的运算的隐藏节点不是待识别文本的词向量,但是输入第一隐藏层的隐藏节点h11、h21、h31……hn1也是带有待识别文本的上、下文语义信息的特征向量。因此,同理,输入各隐藏层的隐藏节点均可以根据待识别文本的上、下文语义确定 各隐藏节点的注意力权重。
“高小红在故宫博物馆看到了明朝的瓷器”这句话在命名实体识别运算过程中,如果初始层输出的隐藏节点的长度大于第一层隐藏层的长度阈值,则与“在”、“看”“到”“了”“的”这些字有关的隐藏节点可以被赋予较低的注意力权重,这样隐藏层的运算资源可以更多的来对比较可能是命名实体的一些词进行运算。
步骤205,将被赋予了注意力权重的第一层隐藏层输出的隐藏节点输入第二层隐藏层,第二层隐藏层经运算后输出待识别文本的识别结果。
上述实施例,仅仅列举了命名实体识别模型有三层运算层的情况,当然,命名实体识别模型的运算层数也可以是2层、4层、5层、6层……,具体层数可以根据实际需要设置,但是命名实体识别模型对待识别文本的进行命名实体的识别方法与上述实施例相似,都可以包括:对各隐藏层的各待输入的隐藏节点赋予注意力权重之后,再将被赋予了注意力权重的隐藏节点输入相应的隐藏层进行运算。
进一步的,对隐藏层输入的隐藏节点赋予注意力权重,可以是根据上、下文语义判断那些更可能是命名实体,对可能是命名实体的输入向量赋予更高的权重,也就是说在对命名实体进行识别过程中可以将上、下文语义作为了一个辅助判断条件。
图3为本申请基于注意力机制的命名实体识别方法再一个实施例的流程图,如图3所示,本申请图1所示实施例中,步骤102之前,还可以包括:
步骤301,获取训练文本,并对训练文本进行分词。
步骤302,对进行分词后的训练文本中的命名实体进行标注。
具体地,对进行分词后的训练文本中的命名实体进行标注可以为:对上述训练文本的分词是否属于命名实体、上述训练文本的分词在其所属命名实体中的位置和/或上述训练文本的分词所属命名实体的类型进行标注。
在具体实现时,可以采用BIO标注和/或IOBES标注的方式对训练文本中的命名实体进行标注。
举例来说,命名实体识别模型是Bi-LSTM模型,对训练文本可以按照IOBES(Inside、Other、Begin、End、Single)的方式进行标注。如果对一个分词是一个单独的实体,则标记为(tag S-…);如果一个分词是一个实体开始,则标记为(tag B-…);如 果一个分词是一个实体中间词汇,则标记为(tag I-…);如果一个分词是一个实体的结束,则标记为(tag E-…);如果一个分词不是一个实体,则标记为(tag O)。人名(PER)、地名(LOC)和机构名(ORG)为例,“王明出生在北京,现在在中国河北省唐山市创利工作。”标注的结果为:王(B-PER)、明(E-PER)、出(O)、生(O)、在(O)、北(B-LOC)、京(S-LOC),(O)、现(O)、在(O)、在(O)、河(B-LOC)、北(I-LOC)、省(E-LOC)、唐(B-LOC)、山(I-LOC)、市(E-LOC)、创(B-ORG)、利(E-ORG)、工(O)、作(O)。(O)。
再例如,命名实体识别模型是Bi-LSTM+CRF模型,对训练文本可以按照BIO的方式进行标注,即B-PER、I-PER代表人名首字、人名非首字,B-LOC、I-LOC代表地名首字、地名非首字,B-ORG、I-ORG代表组织机构名首字、组织机构名非首字,O代表该字不属于命名实体的一部分。“高小明帮助中国队获胜”的标注的结果为:高(B-PER)、小(I-PER)、明(I-PER)、帮(O)、助(O)、中(B-ORG)、国(I-ORG)、队(I-ORG)、获(O)、胜(O)。
步骤303,将训练文本的分词映射为向量,得到训练文本的词向量。
其中,将训练文本分离出来的每一个字、字符通过查找分词向量映射表得到对应的词向量。这里的分词向量映射表是预先存储或加载的分词向量映射表。
步骤304,将训练文本的词向量输入待训练的命名实体识别模型进行逐层运算,以对待训练的命名实体模型进行训练。
其中,具体地,步骤304的具体实施方式可以和上述的命名实体识别模型对待识别文本的识别过程是相同的,不同之处在于,这里的待训练命名实体识别模型是没有经过训练的,因此待训练的命名实体模型输出的训练文本的命名实体的识别结果与步骤302中标注的命名实体之间可能存在误差。
本实施例中,待训练的命名实体识别模型进行逐层运算可以是采用以下算法之一或组合:Bi-LSTM、CRF和CNN。对待训练的命名实体模型进行训练,也就是对待训练的命名实体识别模型逐层运算的参数以及各隐藏层的隐藏节点被赋予的注意力权重进行训练。
图4为本申请基于注意力机制的命名实体识别方法再一个实施例的流程图,如图4所示,本申请图3所示实施例中,步骤304之 后,还可以包括:
步骤401,在本次训练过程结束之后,获得待训练的命名实体模型输出的训练文本的命名实体识别结果。
步骤402,将训练文本的命名实体识别结果与训练文本中标注的命名实体进行对比。
具体地,比对方式可以是,根据训练文本的命名实体识别结果和训练文本的词向量,构造反映训练文本的命名实体识别结果准确度的损失函数。构造的损失函数可以是命名实体识别结果和训练文本的词向量的平方差。
步骤403,根据对比结果,调整下次训练过程中赋予词向量的注意力权重。
具体地,可以利用梯度下降算法求解损失函数的最小值,梯度下降算法可以利用负梯度方向来决定每次迭代的损失函数的参数调整方向,因此,可以得到待训练命名实体识别模型进行训练文本的词向量逐层运算的参数以及各隐藏层的隐藏节点被赋予的注意力权重的调整方向。损失函数的逐步减小意味着待训练命名实体识别模型进行训练文本的词向量逐层运算的参数以及各隐藏层的隐藏节点被赋予的注意力权重越来越精确。
步骤404,如果训练文本的命名实体识别结果与训练文本中标注的命名实体的误差小于预定的误差阈值,获得训练好的命名实体识别模型。
其中,上述预定的误差阈值可以在具体实现时,根据系统性能和/或实现需求等自行设定,本实施例对上述预定的误差阈值的大小不作限定。
图5为本申请基于注意力机制的命名实体识别装置一个实施例的结构示意图,本实施例提供的基于注意力机制的命名实体识别装置可以实现本申请提供的基于注意力机制的命名实体识别方法。如图5所示,上述基于注意力机制的命名实体识别装置可以包括:分词模块51、映射模块52和识别模块53;
其中,分词模块51,用于对待识别文本进行分词;其中,待识别文本可以是一句话,这句话里可以包括字以及标点符号。分词模块51对待识别文本进行分词可以是将待识别文本这句话中的每一个字、标点符号都分离出来。例如,“中国女排赢得了小组赛第一,并且进入了决赛。”对这句话分词的结果可以是:“/中/国/女/排/赢/得/了/小/组/赛/第/一/,/并/且/进/入/了/决/赛/。/”
映射模块52,用于将分词模块51获得的上述待识别文本的分词映射为向量,得到上述待识别文本的词向量;具体地,映射模块52将待识别文本的分词映射为向量,可以是将待识别文本中分离出来的每一个字、标点符号通过查找分词向量映射表得到对应的词向量。这里的分词向量映射表可以是预先存储或加载的分词向量映射表。
识别模块53,用于将映射模块52得到的上述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得上述待识别文本的命名实体识别结果;其中,上述命名实体识别模型包括至少两层隐藏层,通过上述命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层。
本实施例中,命名实体识别模型对输入的词向量进行逐层运算的方式可以是采用以下算法之一或组合:Bi-LSTM、CRF和CNN。
上述基于注意力机制的命名实体识别装置中,分词模块51对待识别文本进行分词之后,映射模块52将上述待识别文本的分词映射为向量,得到上述待识别文本的词向量,然后识别模块53将上述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得上述待识别文本的命名实体识别结果;其中,上述命名实体识别模型包括至少两层隐藏层,通过上述命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层,由于各隐藏层输入的隐藏节点均被赋予了注意力权重,各隐藏层根据隐藏节点的注意力权重,对隐藏节点进行运算,可以实现通过注意力机制对命名实体进行识别,提高命名实体的识别准确率,进而可以避免由于隐藏层节点的长度超出隐藏层的长度阈值,而造成的隐藏节点的损失。
图6为本申请基于注意力机制的命名实体识别装置另一个实施例的结构示意图,与图5所示的基于注意力机制的命名实体识别装置相比,不同之处在于,图6所示的基于注意力机制的命名实体识别装置还可以包括:获取模块54;
其中,获取模块54,用于在识别模块53将上述待识别文本的词向量赋予注意力权重之前,根据上述待识别文本的上下文语义,获取上述待识别文本的词向量的注意力权重。
具体地,待识别文本的各词向量被输入命名实体识别模型时,待识别文本的各词向量的注意力权重可以是相同或不同的。在命名 实体识别模型对待识别文本的词向量进行逐层运算过程中,根据待识别文本的上、下文语义,各隐藏层输入的各隐藏节点可以被赋予相同或不同的注意力权重。本实施例对此不作限定。
进一步地,上述基于注意力机制的命名实体识别装置还可以包括:标注模块55和训练模块56;
分词模块51,还用于在识别模块53将上述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算之前,获取训练文本,并对上述训练文本进行分词;
标注模块55,用于对分词模块51进行分词后的训练文本中的命名实体进行标注;本实施例中,标注模块55,具体用于对训练文本的分词是否属于命名实体、训练文本的分词在其所属命名实体中的位置和/或训练文本的分词所属命名实体的类型进行标注。
在具体实现时,标注模块55可以采用BIO标注和/或IOBES标注的方式对训练文本中的命名实体进行标注。
举例来说,命名实体识别模型是Bi-LSTM模型,对训练文本可以按照IOBES(Inside、Other、Begin、End、Single)的方式进行标注。如果对一个分词是一个单独的实体,则标记为(tag S-…);如果一个分词是一个实体开始,则标记为(tag B-…);如果一个分词是一个实体中间词汇,则标记为(tag I-…);如果一个分词是一个实体的结束,则标记为(tag E-…);如果一个分词不是一个实体,则标记为(tag O)。人名(PER)、地名(LOC)和机构名(ORG)为例,“王明出生在北京,现在在中国河北省唐山市创利工作。”标注的结果为:王(B-PER)、明(E-PER)、出(O)、生(O)、在(O)、北(B-LOC)、京(S-LOC),(O)、现(O)、在(O)、在(O)、河(B-LOC)、北(I-LOC)、省(E-LOC)、唐(B-LOC)、山(I-LOC)、市(E-LOC)、创(B-ORG)、利(E-ORG)、工(O)、作(O)。(O)。
再例如,命名实体识别模型是Bi-LSTM+CRF模型,对训练文本可以按照BIO的方式进行标注,即B-PER、I-PER代表人名首字、人名非首字,B-LOC、I-LOC代表地名首字、地名非首字,B-ORG、I-ORG代表组织机构名首字、组织机构名非首字,O代表该字不属于命名实体的一部分。“高小明帮助中国队获胜”的标注的结果为:高(B-PER)、小(I-PER)、明(I-PER)、帮(O)、助(O)、中(B-ORG)、国(I-ORG)、队(I-ORG)、获(O)、胜 (O)。
映射模块52,还用于将上述训练文本的分词映射为向量,得到上述训练文本的词向量;其中,映射模块52可以将训练文本分离出来的每一个字、字符通过查找分词向量映射表得到对应的词向量。这里的分词向量映射表是预先存储或加载的分词向量映射表。
训练模块56,用于将映射模块52得到的上述训练文本的词向量输入待训练的命名实体识别模型进行逐层运算,以对上述待训练的命名实体模型进行训练。
具体地,训练模块56对上述待训练的命名实体模型进行训练之后,还可以在本次训练过程结束之后,获得上述待训练的命名实体模型输出的训练文本的命名实体识别结果;将上述训练文本的命名实体识别结果与上述训练文本中标注的命名实体进行对比;根据对比结果,调整下次训练过程中赋予词向量的注意力权重;如果训练文本的命名实体识别结果与上述训练文本中标注的命名实体的误差小于预定的误差阈值,获得训练好的命名实体识别模型。其中,上述预定的误差阈值可以在具体实现时,根据系统性能和/或实现需求等自行设定,本实施例对上述预定的误差阈值的大小不作限定。
图7为本申请计算机设备一个实施例的结构示意图,上述计算机设备可以包括存储器、处理器及存储在上述存储器上并可在上述处理器上运行的计算机程序,上述处理器执行上述计算机程序时,可以实现本申请实施例提供的基于注意力机制的命名实体识别方法。
图7示出了适于用来实现本申请实施方式的示例性计算机设备12的框图。图7显示的计算机设备12仅仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。
如图7所示,计算机设备12以通用计算设备的形式表现。计算机设备12的组件可以包括但不限于:一个或者多个处理器或者处理单元16,系统存储器28,连接不同系统组件(包括系统存储器28和处理单元16)的总线18。
总线18表示几类总线结构中的一种或多种,包括存储器总线或者存储器控制器,外围总线,图形加速端口,处理器或者使用多种总线结构中的任意总线结构的局域总线。计算机设备12典型地包括多种计算机系统可读介质。系统存储器28可以包括易失性存储器形式的计算机系统可读介质,例如随机存取存储器(Random  Access Memory;以下简称:RAM)30和/或高速缓存存储器32。计算机设备12可以进一步包括其它可移动/不可移动的、易失性/非易失性计算机系统存储介质。仅作为举例,存储系统34可以用于读写不可移动的、非易失性磁介质(图7未显示,通常称为“硬盘驱动器”)。具有一组(至少一个)程序模块42的程序/实用工具40,可以存储在例如存储器28中,这样的程序模块42包括——但不限于——操作系统、一个或者多个应用程序、其它程序模块以及程序数据,这些示例中的每一个或某种组合中可能包括网络环境的实现。程序模块42通常执行本申请所描述的实施例中的功能和/或方法。计算机设备12也可以与一个或多个外部设备14(例如键盘、指向设备、显示器24等)通信,还可与一个或者多个使得用户能与该计算机设备12交互的设备通信,和/或与使得该计算机设备12能与一个或多个其它计算设备进行通信的任何设备(例如网卡,调制解调器等等)通信。这种通信可以通过输入/输出(I/O)接口22进行。并且,计算机设备12还可以通过网络适配器20与一个或者多个网络(例如局域网(Local Area Network;以下简称:LAN),广域网(Wide Area Network;以下简称:WAN)和/或公共网络,例如因特网)通信。如图7所示,网络适配器20通过总线18与计算机设备12的其它模块通信。处理单元16通过运行存储在系统存储器28中的程序,从而执行各种功能应用以及数据处理,例如实现本申请实施例提供的基于注意力机制的命名实体识别方法。
本申请实施例还提供一种计算机非易失性可读存储介质,其上存储有计算机程序,上述计算机程序被处理器执行时可以实现本申请实施例提供的基于注意力机制的命名实体识别方法。
以上所述仅为本申请的较佳实施例而已,并不用以限制本申请,凡在本申请的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本申请保护的范围之内。

Claims (20)

  1. 一种基于注意力机制的命名实体识别方法,其特征在于,包括:
    对待识别文本进行分词,并将所述待识别文本的分词映射为向量,得到所述待识别文本的词向量;
    将所述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得所述待识别文本的命名实体识别结果;其中,所述命名实体识别模型包括至少两层隐藏层,通过所述命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层。
  2. 根据权利要求1所述的方法,其特征在于,所述将所述待识别文本的词向量赋予注意力权重之前,还包括:
    根据所述待识别文本的上下文语义,获取所述待识别文本的词向量的注意力权重。
  3. 根据权利要求1所述的方法,其特征在于,所述将所述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算之前,还包括:
    获取训练文本,并对所述训练文本进行分词;
    对进行分词后的训练文本中的命名实体进行标注;
    将所述训练文本的分词映射为向量,得到所述训练文本的词向量;
    将所述训练文本的词向量输入待训练的命名实体识别模型进行逐层运算,以对所述待训练的命名实体模型进行训练。
  4. 根据权利要求3所述的方法,其特征在于,所述将所述训练文本的词向量输入待训练的命名实体识别模型进行逐层运算,以对所述待训练的命名实体模型进行训练之后,还包括:
    在本次训练过程结束之后,获得所述待训练的命名实体模型输出的训练文本的命名实体识别结果;
    将所述训练文本的命名实体识别结果与所述训练文本中标注的命名实体进行对比;
    根据对比结果,调整下次训练过程中赋予词向量的注意力权重;
    如果训练文本的命名实体识别结果与所述训练文本中标注的命名实体的误差小于预定的误差阈值,获得训练好的命名实体识别模型。
  5. 根据权利要求3所述的方法,其特征在于,所述对进行分词后的训练文本中的命名实体进行标注包括:
    对所述训练文本的分词是否属于命名实体、所述训练文本的分词在其所属命名实体中的位置和/或所述训练文本的分词所属命名实体的类型进 行标注。
  6. 一种基于注意力机制的命名实体识别装置,其特征在于,包括:
    分词模块,用于对待识别文本进行分词;
    映射模块,用于将所述分词模块获得的所述待识别文本的分词映射为向量,得到所述待识别文本的词向量;
    识别模块,用于将所述映射模块得到的所述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得所述待识别文本的命名实体识别结果;其中,所述命名实体识别模型包括至少两层隐藏层,通过所述命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层。
  7. 根据权利要求6所述的装置,其特征在于,还包括:
    获取模块,用于在所述识别模块将所述待识别文本的词向量赋予注意力权重之前,根据所述待识别文本的上下文语义,获取所述待识别文本的词向量的注意力权重。
  8. 根据权利要求6所述的装置,其特征在于,还包括:标注模块和训练模块;
    所述分词模块,还用于在所述识别模块将所述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算之前,获取训练文本,并对所述训练文本进行分词;
    所述标注模块,用于对所述分词模块进行分词后的训练文本中的命名实体进行标注;
    所述映射模块,还用于将所述训练文本的分词映射为向量,得到所述训练文本的词向量;
    所述训练模块,用于将所述映射模块得到的所述训练文本的词向量输入待训练的命名实体识别模型进行逐层运算,以对所述待训练的命名实体模型进行训练。
  9. 根据权利要求6所述的装置,其特征在于,
    所述训练模块还用于获得上述待训练的命名实体模型输出的训练文本的命名实体识别结果;将上述训练文本的命名实体识别结果与上述训练文本中标注的命名实体进行对比;根据对比结果,调整下次训练过程中赋予词向量的注意力权重;如果训练文本的命名实体识别结果与上述训练文本中标注的命名实体的误差小于预定的误差阈值,获得训练好的命名实体识别模型。
  10. 根据权利要求8所述的装置,其特征在于,
    所述标注模块具体用于对所述训练文本的分词是否属于命名实体、 所述训练文本的分词在其所属命名实体中的位置和/或所述训练文本的分词所属命名实体的类型进行标注。
  11. 一种计算机设备,其特征在于,包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序,所述处理器执行所述计算机程序时,实现以下步骤:
    对待识别文本进行分词,并将所述待识别文本的分词映射为向量,得到所述待识别文本的词向量;
    将所述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得所述待识别文本的命名实体识别结果;其中,所述命名实体识别模型包括至少两层隐藏层,通过所述命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层。
  12. 根据权利要求11所述的计算机设备,其特征在于,所述处理器执行所述计算机程序时,还实现以下步骤:
    根据所述待识别文本的上下文语义,获取所述待识别文本的词向量的注意力权重。
  13. 根据权利要求11所述的计算机设备,其特征在于,所述处理器执行所述计算机程序时,还实现以下步骤:
    获取训练文本,并对所述训练文本进行分词;
    对进行分词后的训练文本中的命名实体进行标注;
    将所述训练文本的分词映射为向量,得到所述训练文本的词向量;
    将所述训练文本的词向量输入待训练的命名实体识别模型进行逐层运算,以对所述待训练的命名实体模型进行训练。
  14. 根据权利要求13所述的计算机设备,其特征在于,所述处理器执行所述计算机程序时,还实现以下步骤:
    在本次训练过程结束之后,获得所述待训练的命名实体模型输出的训练文本的命名实体识别结果;
    将所述训练文本的命名实体识别结果与所述训练文本中标注的命名实体进行对比;
    根据对比结果,调整下次训练过程中赋予词向量的注意力权重;
    如果训练文本的命名实体识别结果与所述训练文本中标注的命名实体的误差小于预定的误差阈值,获得训练好的命名实体识别模型。
  15. 根据权利要求13所述的计算机设备,其特征在于,所述处理器执行所述计算机程序时,还实现以下步骤:
    对所述训练文本的分词是否属于命名实体、所述训练文本的分词在其 所属命名实体中的位置和/或所述训练文本的分词所属命名实体的类型进行标注。
  16. 一种计算机非易失性可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现以下步骤:
    对待识别文本进行分词,并将所述待识别文本的分词映射为向量,得到所述待识别文本的词向量;
    将所述待识别文本的词向量赋予注意力权重,并将赋予注意力权重的词向量输入命名实体识别模型进行逐层运算,获得所述待识别文本的命名实体识别结果;其中,所述命名实体识别模型包括至少两层隐藏层,通过所述命名实体识别模型进行逐层运算时,将上一层隐藏层输出的隐藏节点输入下一层隐藏层。
  17. 根据权利要求16所述的计算机非易失性可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时还实现以下步骤:
    根据所述待识别文本的上下文语义,获取所述待识别文本的词向量的注意力权重。
  18. 根据权利要求16所述的计算机非易失性可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时还实现以下步骤:
    获取训练文本,并对所述训练文本进行分词;
    对进行分词后的训练文本中的命名实体进行标注;
    将所述训练文本的分词映射为向量,得到所述训练文本的词向量;
    将所述训练文本的词向量输入待训练的命名实体识别模型进行逐层运算,以对所述待训练的命名实体模型进行训练。
  19. 根据权利要求18所述的计算机非易失性可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时还实现以下步骤:
    在本次训练过程结束之后,获得所述待训练的命名实体模型输出的训练文本的命名实体识别结果;
    将所述训练文本的命名实体识别结果与所述训练文本中标注的命名实体进行对比;
    根据对比结果,调整下次训练过程中赋予词向量的注意力权重;
    如果训练文本的命名实体识别结果与所述训练文本中标注的命名实体的误差小于预定的误差阈值,获得训练好的命名实体识别模型。
  20. 根据权利要求18所述的计算机非易失性可读存储介质,其上存 储有计算机程序,其特征在于,所述计算机程序被处理器执行时还实现以下步骤:
    对所述训练文本的分词是否属于命名实体、所述训练文本的分词在其所属命名实体中的位置和/或所述训练文本的分词所属命名实体的类型进行标注。
PCT/CN2019/091305 2019-01-07 2019-06-14 基于注意力机制的命名实体识别方法、装置和计算机设备 WO2020143163A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910012152.6 2019-01-07
CN201910012152.6A CN109885825A (zh) 2019-01-07 2019-01-07 基于注意力机制的命名实体识别方法、装置和计算机设备

Publications (1)

Publication Number Publication Date
WO2020143163A1 true WO2020143163A1 (zh) 2020-07-16

Family

ID=66925613

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091305 WO2020143163A1 (zh) 2019-01-07 2019-06-14 基于注意力机制的命名实体识别方法、装置和计算机设备

Country Status (2)

Country Link
CN (1) CN109885825A (zh)
WO (1) WO2020143163A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022127124A1 (zh) * 2020-12-15 2022-06-23 深圳壹账通智能科技有限公司 基于元学习的实体类别识别方法、装置、设备和存储介质

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298043B (zh) * 2019-07-03 2023-04-07 吉林大学 一种车辆命名实体识别方法及系统
CN110704633B (zh) * 2019-09-04 2023-07-21 平安科技(深圳)有限公司 命名实体识别方法、装置、计算机设备及存储介质
CN110750992B (zh) * 2019-10-09 2023-07-04 吉林大学 命名实体识别方法、装置、电子设备及介质
CN110825875B (zh) * 2019-11-01 2022-12-06 科大讯飞股份有限公司 文本实体类型识别方法、装置、电子设备和存储介质
CN111145914B (zh) * 2019-12-30 2023-08-04 四川大学华西医院 一种确定肺癌临床病种库文本实体的方法及装置
CN111325033B (zh) * 2020-03-20 2023-07-11 中国建设银行股份有限公司 实体识别方法、装置、电子设备及计算机可读存储介质
CN112749561B (zh) * 2020-04-17 2023-11-03 腾讯科技(深圳)有限公司 一种实体识别方法及设备
CN111597816A (zh) * 2020-05-22 2020-08-28 北京慧闻科技(集团)有限公司 一种自注意力命名实体识别方法、装置、设备及存储介质
CN112699684A (zh) * 2020-12-30 2021-04-23 北京明朝万达科技股份有限公司 命名实体识别方法和装置、计算机可读存储介质及处理器
CN112733540A (zh) * 2020-12-31 2021-04-30 三维通信股份有限公司 生物医学命名实体的检测方法、装置、计算机设备和介质
CN113743121B (zh) * 2021-09-08 2023-11-21 平安科技(深圳)有限公司 长文本实体关系抽取方法、装置、计算机设备及存储介质
CN113987173A (zh) * 2021-10-22 2022-01-28 北京明略软件系统有限公司 短文本分类方法、系统、电子设备及介质

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108388559A (zh) * 2018-02-26 2018-08-10 中译语通科技股份有限公司 地理空间应用下的命名实体识别方法及系统、计算机程序
CN108536679A (zh) * 2018-04-13 2018-09-14 腾讯科技(成都)有限公司 命名实体识别方法、装置、设备及计算机可读存储介质

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682220A (zh) * 2017-01-04 2017-05-17 华南理工大学 一种基于深度学习的在线中医文本命名实体识别方法

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108388559A (zh) * 2018-02-26 2018-08-10 中译语通科技股份有限公司 地理空间应用下的命名实体识别方法及系统、计算机程序
CN108536679A (zh) * 2018-04-13 2018-09-14 腾讯科技(成都)有限公司 命名实体识别方法、装置、设备及计算机可读存储介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022127124A1 (zh) * 2020-12-15 2022-06-23 深圳壹账通智能科技有限公司 基于元学习的实体类别识别方法、装置、设备和存储介质

Also Published As

Publication number Publication date
CN109885825A (zh) 2019-06-14

Similar Documents

Publication Publication Date Title
WO2020143163A1 (zh) 基于注意力机制的命名实体识别方法、装置和计算机设备
WO2019085779A1 (zh) 机器处理及文本纠错方法和装置、计算设备以及存储介质
CN111931506B (zh) 一种基于图信息增强的实体关系抽取方法
CN104636466B (zh) 一种面向开放网页的实体属性抽取方法和系统
CN108763510A (zh) 意图识别方法、装置、设备及存储介质
CN108932226A (zh) 一种对无标点文本添加标点符号的方法
WO2021190259A1 (zh) 一种槽位识别方法及电子设备
CN112069826B (zh) 融合主题模型和卷积神经网络的垂直域实体消歧方法
CN109284400A (zh) 一种基于Lattice LSTM和语言模型的命名实体识别方法
CN105068997B (zh) 平行语料的构建方法及装置
CN113255320A (zh) 基于句法树和图注意力机制的实体关系抽取方法及装置
CN112364623A (zh) 基于Bi-LSTM-CRF的三位一体字标注汉语词法分析方法
CN113282701B (zh) 作文素材生成方法、装置、电子设备及可读存储介质
CN106980620A (zh) 一种对中文字串进行匹配的方法及装置
CN113590784A (zh) 三元组信息抽取方法、装置、电子设备、及存储介质
CN108959630A (zh) 一种面向英文无结构文本的人物属性抽取方法
US20230004798A1 (en) Intent recognition model training and intent recognition method and apparatus
WO2022242074A1 (zh) 一种多特征融合的中文医疗文本命名实体识别方法
CN111553157A (zh) 一种基于实体替换的对话意图识别方法
CN111328416B (zh) 用于自然语言处理中的模糊匹配的语音模式
TWI659411B (zh) 一種多語言混合語音識別方法
CN112686040B (zh) 一种基于图循环神经网络的事件事实性检测方法
CN117290515A (zh) 文本标注模型的训练方法、文生图方法及装置
WO2023130688A1 (zh) 一种自然语言处理方法、装置、设备及可读存储介质
WO2023137903A1 (zh) 基于粗糙语义的回复语句确定方法、装置及电子设备

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19908224

Country of ref document: EP

Kind code of ref document: A1