CN111291565A - Method and device for named entity recognition

Method and device for named entity recognition

Info

  • Publication number: CN111291565A
  • Application number: CN202010054650.XA
  • Authority: CN (China)
  • Prior art keywords: word, vector, named entity, value, input
  • Legal status: Pending
  • Other languages: Chinese (zh)
  • Inventors: 宋彦, 田元贺, 王咏刚
  • Current and original assignee: Innovation Workshop (Guangzhou) Artificial Intelligence Research Co Ltd
  • Application filed by Innovation Workshop (Guangzhou) Artificial Intelligence Research Co Ltd
  • Priority to CN202010054650.XA
  • Publication of CN111291565A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)

Abstract

The invention aims to provide a named entity recognition method and a named entity recognition apparatus. The context features of each word and their corresponding syntactic knowledge are acquired from the input word sequence; for each word, each context feature and its corresponding syntactic knowledge are mapped into a key vector and a corresponding value vector, respectively; a weighted sum vector of all value vectors of each word is determined; and named entity prediction is performed on the vector formed by concatenating the word vector of each word in the input word sequence with the weighted sum vector, so as to obtain a corresponding recognition result, wherein the recognition result indicates the named entity tag of each word. Compared with the prior art, the method introduces weighted syntactic knowledge into a general sequence-labeling-based deep learning named entity recognition system. Context features can thus be effectively utilized within a deep learning framework to weight the corresponding syntactic knowledge, further improving the performance of the named entity recognition system.

Description

Method and device for named entity recognition
Technical Field
The invention relates to the technical field of Natural Language Processing (NLP), in particular to a Named Entity Recognition (NER) technology.
Background
Named entities refer to person names, organization names, place names, and all other entities identified by a name. For example, "Zhang San" is a person name and "Beijing" is a place name. In scientific text, common named entities also include disease names (e.g., "congenital heart disease"), technical terms (e.g., "simple harmonic vibration"), and so on.
Named entity recognition is the natural language processing task of recognizing named entities in an input word sequence. For example, for the input word sequence (words separated by "/") "Zhang San / suffers from / congenital / heart disease", the task of named entity recognition is to identify the named entities therein, i.e., the person name "Zhang San" and the disease name "congenital heart disease".
A named entity tagger assigns a tag to each word in the input word sequence to represent the result of named entity recognition. The mainstream named entity tag set has 4 types: "B-X" ("X" denotes a named entity category, such as the disease category "Disease") means the word is the first word of a named entity, "I-X" means the word is an internal (neither first nor last) word of a named entity, "E-X" means the word is the last word of a named entity, and "O" means the word is not part of any named entity. For example, the named entity tags of the words in the sequence "Zhang San / suffers from / congenital / heart disease" are "B-Person", "O", "B-Disease", "E-Disease", in that order.
Techniques for named entity recognition can be divided into feature-based traditional methods and deep learning methods.
Feature-based methods extract features from the input word sequence by manually designing and selecting them, and judge the named entity tag of the current word based on those features. Common features include the current word, preceding words, succeeding words, and the like. However, the effectiveness of this approach depends heavily on the quality of the manually designed and extracted features, and designing a high-quality feature extraction method is very difficult.
In addition, since scientific texts are characterized by formal language, normative expression, and long sentences, traditional methods have also tried to improve the performance of named entity recognition systems using external syntactic knowledge acquired by automatic tools. Syntactic knowledge such as a noun phrase often implies that a named entity may exist within the phrase; for example, "congenital heart disease" is a noun phrase and is itself a named entity. However, traditional methods treat this external syntactic knowledge as a correct reference (gold reference) when training the model, so erroneous knowledge produced by the performance limitations of external automatic tools has a negative influence on feature-based traditional methods.
In recent years, deep learning methods have gradually been applied to named entity recognition. They automatically extract text features according to the characteristics of the specific task, avoiding the huge cost of manual feature design and extraction. The recognition performance of deep learning far exceeds that of purely traditional methods.
Disclosure of Invention
The invention aims to provide a named entity recognition method, a named entity recognition apparatus, a computer-readable storage medium, and a computer program product.
According to an aspect of the present invention, there is provided a named entity recognition method, wherein the method comprises the steps of:
acquiring the context features of each word and their corresponding syntactic knowledge from the input word sequence;
for each word, mapping each context feature of the word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively;
determining a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector;
and performing named entity prediction on the vector formed by concatenating the word vector of each word in the input word sequence with the weighted sum vector, so as to obtain a corresponding recognition result, wherein the recognition result indicates the named entity tag of each word.
According to an aspect of the present invention, there is also provided a named entity recognition method based on a named entity recognition model, wherein the named entity recognition model includes an input embedding layer, a context information encoding layer, a key-value memory neural network layer, and a decoding output layer;
wherein, the method comprises the following steps:
acquiring the context features of each word and their corresponding syntactic knowledge from the input word sequence;
inputting the context features of each word and their corresponding syntactic knowledge to the key-value memory neural network layer, so as to map each context feature of each word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively;
inputting the word vector output for each word in the input word sequence via the input embedding layer and the context information encoding layer into the key-value memory neural network layer, so as to obtain a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector;
and inputting the vector formed by concatenating the word vector of each word in the input word sequence with the weighted sum vector into the decoding output layer, so as to obtain a corresponding recognition result, wherein the recognition result indicates the named entity tag of each word.
According to an aspect of the present invention, there is also provided a named entity recognition apparatus, wherein the apparatus includes:
an obtaining module for acquiring the context features of each word and their corresponding syntactic knowledge from the input word sequence;
a mapping module for mapping, for each word, each context feature of the word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively;
a weighting module for determining a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector;
and a prediction module for performing named entity prediction on the vector formed by concatenating the word vector of each word in the input word sequence with the weighted sum vector, so as to obtain a corresponding recognition result, wherein the recognition result indicates the named entity tag of each word.
According to an aspect of the present invention, there is also provided a named entity recognition apparatus, wherein the named entity recognition apparatus is coupled with a named entity recognition model, and the named entity recognition model includes an input embedding layer, a context information encoding layer, a key-value memory neural network layer, and a decoding output layer;
wherein, the device includes:
an obtaining module for acquiring the context features of each word and their corresponding syntactic knowledge from the input word sequence;
a mapping module for inputting the context features of each word and their corresponding syntactic knowledge to the key-value memory neural network layer, so as to map each context feature of each word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively;
a weighting module for inputting the word vector output for each word in the input word sequence via the input embedding layer and the context information encoding layer to the key-value memory neural network layer, so as to obtain a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector;
and a prediction module for inputting the vector formed by concatenating the word vector of each word in the input word sequence with the weighted sum vector into the decoding output layer, so as to obtain a corresponding recognition result, wherein the recognition result indicates the named entity tag of each word.
According to an aspect of the present invention, there is also provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the named entity recognition method according to an aspect of the present invention when executing the computer program.
According to an aspect of the invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a named entity recognition method according to an aspect of the invention.
According to an aspect of the invention, there is also provided a computer program product which, when executed by a computing device, implements a named entity recognition method according to an aspect of the invention.
Compared with the prior art, the present invention acquires the context features and syntactic knowledge related to each word in the input word sequence, weights the syntactic knowledge by mapping the context features into key vectors, mapping the corresponding syntactic knowledge into value vectors, and converting between keys and values, and introduces the weighted syntactic knowledge into a general sequence-labeling-based deep learning named entity recognition system. Context features can thus be effectively utilized within a deep learning framework to weight the corresponding syntactic knowledge, further improving the performance of the named entity recognition system.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 illustrates a framework diagram of an existing named entity recognition model;
FIG. 2 illustrates a flow diagram of a named entity recognition method according to one embodiment of the invention;
FIG. 3 illustrates a framework diagram of a named entity recognition model according to one example of the invention;
FIG. 4 illustrates a block diagram of a key-value memory neural network layer according to an example of the present invention;
FIG. 5 illustrates a flow diagram for training a named entity recognition model, according to one embodiment of the invention;
FIG. 6 shows a schematic diagram of a named entity recognition apparatus according to another embodiment of the present invention.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments of the present invention are described as an apparatus represented by a block diagram and a process or method represented by a flow diagram. Although a flowchart depicts a sequence of process steps in the present invention, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process of the present invention may be terminated when its operations are performed, but may include additional steps not shown in the flowchart. The processes of the present invention may correspond to methods, functions, procedures, subroutines, and the like.
The methods illustrated by the flow diagrams and apparatus illustrated by the block diagrams discussed below may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine or computer readable medium such as storage medium. The processor(s) may perform the necessary tasks.
Similarly, it will be further appreciated that any flow charts, flow diagrams, state transition diagrams, and the like represent various processes which may be substantially described as program code stored in computer readable media and so executed by a computing device or processor, whether or not such computing device or processor is explicitly shown.
As used herein, the term "storage medium" may refer to one or more devices for storing data, including Read Only Memory (ROM), Random Access Memory (RAM), magnetic RAM, kernel memory, magnetic disk storage media, optical storage media, flash memory devices, and/or other machine-readable media for storing information. The term "computer-readable medium" can include, but is not limited to portable or fixed storage devices, optical storage devices, and various other mediums capable of storing and/or containing instructions and/or data.
A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program descriptions. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, information passing, token passing, network transmission, etc.
The term "computer device" in this context refers to an electronic device that can perform predetermined processes such as numerical calculation and/or logic calculation by executing predetermined programs or instructions, and may at least include a processor and a memory, wherein the predetermined processes are performed by the processor executing program instructions prestored in the memory, or performed by hardware such as ASIC, FPGA, DSP, or implemented by a combination of the two.
The "computer device" is typically embodied in the form of a general-purpose computer device, and its components may include, but are not limited to: one or more processors or processing units, system memory. The system memory may include computer readable media in the form of volatile memory, such as Random Access Memory (RAM) and/or cache memory. The "computer device" may further include other removable/non-removable, volatile/nonvolatile computer-readable storage media. The memory may include at least one computer program product having a set (e.g., at least one) of program modules that are configured to perform the functions and/or methods of embodiments of the present invention. The processor executes various functional applications and data processing by executing programs stored in the memory.
For example, a computer program for executing the functions and processes of the present invention is stored in the memory, and the NER scheme of the present invention is implemented when the processor executes the corresponding computer program.
Typically, the computer devices include, for example, user equipment and network devices. Wherein the user equipment includes but is not limited to a Personal Computer (PC), a notebook computer, a mobile terminal, etc., and the mobile terminal includes but is not limited to a smart phone, a tablet computer, etc.; the network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of computers or network servers, wherein Cloud Computing is one of distributed Computing, a super virtual computer consisting of a collection of loosely coupled computers. Wherein the computer device can be operated alone to implement the invention, or can be accessed to a network and implement the invention through interoperation with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
It should be noted that the user devices, network devices, networks, etc. are merely examples, and other existing or future computing devices or networks may be suitable for the present invention, and are included in the scope of the present invention and are incorporated by reference herein.
Specific structural and functional details disclosed herein are merely representative and are provided for purposes of describing example embodiments of the present invention. The present invention may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element may be termed a second element, and, similarly, a second element may be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The present invention is described in further detail below with reference to the attached drawing figures.
Referring to FIG. 1, FIG. 1 illustrates the basic framework of a named entity recognition model based on deep learning. The model comprises three modules: an input embedding layer 101, a context information encoding layer 102, and a decoding output layer 103.
The purpose of the input embedding layer 101 is to map each word in the input word sequence into a word vector in a high-dimensional continuous space that represents the characteristics of the word. The word vectors are typically derived from language models pretrained on large-scale unlabeled corpora. The input embedding layer 101 is implemented as a fixed word-embedding lookup: using a word vector library acquired by external methods, it converts each word of the input word sequence into the corresponding word vector in that library.
The purpose of the context information encoding layer 102 is to extract the context information of each word based on the word vectors and to compute the influence of the other words' vectors on it. The input to this layer is the output of the input embedding layer (i.e., the word vector of each word in a sentence), and the output is a context-encoded word vector for each word, different from the input vector. There are three main ways to implement this layer. The first is a Convolutional Neural Network (CNN); the second is a Recurrent Neural Network (RNN), typically a Long Short-Term Memory network (LSTM). The former is fast, while the latter takes more context information into account. The third is the Transformer structure, which takes text directly as input (i.e., no separate input embedding layer is required) and has the strongest ability to encode context information; named entity recognition systems based on it currently achieve the best results.
The decoding output layer 103 decodes each context-encoded word vector and outputs the predicted named entity tag. This layer is mainly implemented with Softmax.
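To make this three-layer pipeline concrete, the following is a minimal sketch in PyTorch, assuming fixed pretrained embeddings, a BiLSTM encoder (the second option above), and a linear projection decoded with Softmax; all class and parameter names are illustrative and not taken from the patent.

```python
import torch
import torch.nn as nn

class BaselineNER(nn.Module):
    """Input embedding -> context encoding -> decoding output (the FIG. 1 pipeline)."""
    def __init__(self, embeddings: torch.Tensor, hidden: int, num_tags: int):
        super().__init__()
        # Input embedding layer: a fixed, externally pretrained word-vector lookup.
        self.embed = nn.Embedding.from_pretrained(embeddings, freeze=True)
        # Context information encoding layer: here a bidirectional LSTM.
        self.encoder = nn.LSTM(embeddings.size(1), hidden,
                               batch_first=True, bidirectional=True)
        # Decoding output layer: projects each encoded vector to tag scores.
        self.decode = nn.Linear(2 * hidden, num_tags)

    def forward(self, word_ids: torch.Tensor) -> torch.Tensor:
        e = self.embed(word_ids)   # (batch, len, emb)
        h, _ = self.encoder(e)     # (batch, len, 2 * hidden)
        return self.decode(h)      # tag logits; Softmax is applied in the loss
```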
The general named entity recognition process is as follows:
1. The input word sequence is fed to the input embedding layer 101, and each word in it is converted into an input word vector.
2. All word vectors of the converted word sequence are fed to the context information encoding layer 102, which outputs a context-encoded word vector for each word in the sequence.
3. The word vectors output in the previous step are fed to the decoding output layer 103, which outputs the predicted named entity tags.
4. The predicted tags are compared with the manually annotated tags and an objective function is computed; the network parameters of the named entity recognition model are updated by optimizing the objective function.
5. Steps 1-4 are repeated until the desired performance is achieved.
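Steps 1-5 amount to a standard supervised training loop. A minimal sketch under the same assumptions as the model above (cross-entropy objective, Adam optimizer; `model` and `batches` are hypothetical names):

```python
import torch.nn.functional as F
import torch.optim as optim

def train(model, batches, epochs: int = 10):
    opt = optim.Adam(model.parameters())
    for _ in range(epochs):                      # step 5: repeat until converged
        for word_ids, gold_tags in batches:      # steps 1-2: embed and encode
            logits = model(word_ids)             # step 3: predict tag scores
            # step 4: compare predictions with annotated tags via the objective
            loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   gold_tags.reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()                           # update network parameters
```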
Furthermore, external syntactic knowledge is also utilized in deep-learning-based models, since the amount of labeled text in the scientific field is often insufficient to fully train deep learning models, and the benefit of introducing external syntactic knowledge to named entity recognition has already been demonstrated in traditional methods. The usual way to add syntactic knowledge to a sequence-labeling deep learning method is to map the automatically acquired syntactic knowledge into a syntactic knowledge vector in a high-dimensional continuous space at the input embedding layer and to directly concatenate that vector with the word vector. However, direct concatenation does not account for the different contributions of different pieces of knowledge to a word's named entity tag, and knowledge that contributes little, or that is inaccurate because it was acquired by automatic tools, may mislead the model into predicting the wrong named entity tag. Such inaccurate knowledge thus negatively impacts the named entity recognition system.
In order to model the weight of syntactic knowledge according to its contribution to named entity recognition, and thereby effectively integrate high-contribution syntactic knowledge into a sequence-labeling-based deep learning framework, the invention innovates between the context information encoding layer and the decoding output layer by providing a module based on a key-value memory neural network. More specifically, for each word in the input word sequence, the key-value memory module extracts the context features and syntactic knowledge related to that word from the automatically acquired context features and syntactic knowledge, weights the syntactic knowledge by mapping the context features into key vectors, mapping the corresponding syntactic knowledge into value vectors, and converting between keys and values, and introduces the weighted syntactic knowledge into a general sequence-labeling-based deep learning named entity recognition system. Context features can thus be effectively utilized within a deep learning framework to weight the corresponding syntactic knowledge, further improving the performance of the named entity recognition system.
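A minimal sketch of such a key-value memory module (PyTorch; names and dimensions are assumed for illustration) is given below. It implements the inner-product weighting of value vectors by key vectors that equations (1) and (2) describe later in this document, with the weights normalized by a softmax.

```python
import torch
import torch.nn as nn

class KeyValueMemory(nn.Module):
    """Weights syntactic-knowledge (value) vectors by context-feature (key) vectors."""
    def __init__(self, num_keys: int, num_values: int, dim: int):
        super().__init__()
        self.key_embed = nn.Embedding(num_keys, dim)      # key embedding E_k
        self.value_embed = nn.Embedding(num_values, dim)  # value embedding E_v

    def forward(self, h, key_ids, value_ids):
        # h: (batch, len, dim) context-encoded word vectors from the encoder
        # key_ids / value_ids: (batch, len, m) feature / knowledge ids per word
        k = self.key_embed(key_ids)                    # (batch, len, m, dim)
        v = self.value_embed(value_ids)                # (batch, len, m, dim)
        scores = torch.einsum('bld,blmd->blm', h, k)   # inner products h_i . e^k_{i,j}
        p = torch.softmax(scores, dim=-1)              # weights p_{i,j}, cf. eq. (1)
        a = torch.einsum('blm,blmd->bld', p, v)        # weighted sum a_i, cf. eq. (2)
        return torch.cat([h, a], dim=-1)               # concatenated vector for decoding
```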
FIG. 2 illustrates a method flow diagram that specifically shows a named entity recognition process according to one embodiment of the invention.
Typically, the invention is implemented by a computer device. When a general-purpose computer device is configured with the program modules implementing the present invention, it becomes a specialized named entity recognition device rather than a general-purpose computer or processor. Those skilled in the art will appreciate, however, that this is meant only to illustrate that the present invention may be applied to any general-purpose computing device, which becomes a specialized named entity recognition device once the invention is applied to it.
As shown in fig. 2, in step S210, the named entity recognition device acquires the context features of each word and their corresponding syntactic knowledge from the input word sequence; in step S220, for each word, the device maps each context feature and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively; in step S230, the device determines a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector; in step S240, the device performs named entity prediction on the vector obtained by concatenating the word vector of each word in the input word sequence with the weighted sum vector, so as to obtain a corresponding recognition result, where the recognition result indicates the named entity tag of each word.
Referring collectively to fig. 2 and 3, wherein fig. 3 illustrates a framework diagram of a named entity recognition model in accordance with an example of the present invention.
Specifically, in step S210, the named entity recognition device obtains the context feature of each word and its corresponding syntactic knowledge according to the input word sequence.
Here, the input word sequence is denoted $\mathcal{X} = x_1 x_2 \cdots x_l$, where $x_i$ denotes a word in the input word sequence and $l$ denotes the length of the input word sequence.
According to one example of the present invention, for an input word sequence $\mathcal{X}$, the named entity recognition device obtains the context features of the whole input word sequence and their corresponding syntactic knowledge through an automatic tool. The context features and their associated syntactic knowledge are two lists of the same length, denoted $\mathcal{K}$ and $\mathcal{V}$ respectively: the context feature at a certain position $t$ of $\mathcal{K}$ has its corresponding knowledge at the same position $t$ of $\mathcal{V}$.
The automatic tool can be any existing Chinese analysis tool, such as the CoreNLP toolkit released by Stanford University (https://stanfordnlp.github.io/CoreNLP/index.html), which includes multiple language analysis functions such as Chinese part-of-speech tagging, constituency parsing, and dependency parsing. Taking the input word sequence "Zhang San / suffers from / congenital / heart disease" as an example, applying the tool yields various information about the word sequence, such as part-of-speech tags, constituency structure, and dependency structure. For example, the part-of-speech information is "Zhang San_NR / suffers from_VV / congenital_NN / heart disease_NN", and the constituency information is (S (NP Zhang San) (VP (suffers from) (NP (congenital) (heart disease)))), where S is the root node of the syntax tree and denotes the sentence, NP and VP are intermediate nodes denoting a noun phrase and a verb phrase respectively, and the Chinese words are the leaf nodes. For a word in the input word sequence, for example "congenital", the "context features" are the word information of its context, and the "syntactic knowledge" is the part-of-speech information, constituency information, and so on. Note that although a variety of syntactic knowledge is available through the tool, the named entity recognition model of the present invention attends to and utilizes only one kind of syntactic knowledge, e.g., only part-of-speech information. Specifically, if the context features are defined as one word before and after the current word and the syntactic knowledge of interest is part-of-speech knowledge, the context features of the word "congenital" are ["suffers from", "congenital", "heart disease"] and the corresponding syntactic knowledge is ["suffers from_VV", "congenital_NN", "heart disease_NN"]. Since each context feature (e.g., "heart disease") has a knowledge instance (e.g., "heart disease_NN") corresponding to it, context features and syntactic knowledge occur in pairs.
In addition to the above automatic tools, methods of obtaining context features and syntactic knowledge applicable to the present invention include, but are not limited to, manual annotation, dictionary lookup, knowledge bases, and the like.
For each word $x_i$ in the input word sequence, the named entity recognition device extracts from $\mathcal{K}$ and $\mathcal{V}$ the context features $K_i$ related to that word and their corresponding syntactic knowledge $V_i$. For example, the device extracts the context features and corresponding syntactic knowledge within a window of 2 words before and after $x_i$, denoted $K_i = \{k_{i,1}, \dots, k_{i,m_i}\}$ and $V_i = \{v_{i,1}, \dots, v_{i,m_i}\}$ respectively. Accordingly, one context feature of the word $x_i$ is labeled as a key $k_{i,j}$ (where $j = 1, 2, \dots, m_i$), and the syntactic knowledge corresponding to the context feature $k_{i,j}$ is labeled as a value $v_{i,j}$.
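As an illustration of this extraction step, the sketch below builds the key and value lists for each word from a word sequence and its part-of-speech tags (assumed to come from an external tagger such as the CoreNLP toolkit mentioned above), using a window of 2 words on each side; the function name is hypothetical.

```python
def extract_key_values(words, pos_tags, window=2):
    """For each word x_i, collect the context features (keys k_{i,j}) and their
    syntactic knowledge (values v_{i,j}) within `window` words on each side."""
    keys, values = [], []
    for i in range(len(words)):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        keys.append(words[lo:hi])                               # K_i
        values.append([f"{w}_{t}" for w, t in
                       zip(words[lo:hi], pos_tags[lo:hi])])     # V_i
    return keys, values

# For example, with an externally tagged sequence:
#   words    = ["Zhang San", "suffers from", "congenital", "heart disease"]
#   pos_tags = ["NR", "VV", "NN", "NN"]
# the keys of "congenital" include "suffers from" and "heart disease", and the
# corresponding values include "suffers from_VV" and "heart disease_NN".
```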
In step S220, the named entity recognition device maps each of its contextual characteristics and corresponding syntactic knowledge to a key vector and a corresponding value vector, respectively, for each word in the sequence of input words.
Here, the mapping of the key vector and the mapping of the value vector are performed by corresponding embedding functions, respectively. The function of both embedding functions is to convert each instance into a vector representing the instance.
This is similar to the input embedding layer 301 converting each word in the input word sequence into a word vector through an embedding function.
For example, the input word sequence $x = x_1 x_2 \cdots x_l$ is input to the input embedding layer 301, where each word $x_i$ is converted into an input word vector $e^x_i = E_x(x_i)$ via the word embedding function $E_x$. Likewise, for each key-value pair of a word $x_i$ (i.e., a context feature $k_{i,j}$ and its corresponding syntactic knowledge $v_{i,j}$), the named entity recognition device maps them via the key embedding function $E_k$ and the value embedding function $E_v$ into a feature embedding vector (key vector) $e^k_{i,j} = E_k(k_{i,j})$ and a knowledge embedding vector (value vector) $e^v_{i,j} = E_v(v_{i,j})$, respectively. Thus, for each word $x_i$, the named entity recognition device obtains all its key vectors $\{e^k_{i,1}, \dots, e^k_{i,m_i}\}$ and all its value vectors $\{e^v_{i,1}, \dots, e^v_{i,m_i}\}$.
In step S230, the named entity recognition device determines a weighted sum vector of all value vectors of each word in the input word sequence, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector.
Each word $x_i$ ($i = 1, 2, \dots, l$) in the input word sequence is converted into a word vector $e^x_i$ via the input embedding layer 301, and after the context information encoding layer 302 a word vector containing context information, $h_i = \mathrm{Encoder}(x_i)$, is obtained.

The above conversion and encoding of word vectors only needs to be completed before the weights of the value vectors are determined in step S230; it may be performed in parallel with steps S210 and S220, or before or after them.

Subsequently, in step S230, the word vector $h_i$ of the word $x_i$, together with all its key vectors $\{e^k_{i,j}\}$ and all its value vectors $\{e^v_{i,j}\}$, is input to the key-value memory neural network layer 303.
Specifically, in the key-value memory neural network layer 303, the weight of each value vector $e^v_{i,j}$ of each word $x_i$ in the input word sequence may be determined by combining the context-informed word vector $h_i$ of that word with the key vector $e^k_{i,j}$ corresponding to that value vector.

According to one example of the invention, the weight of each value vector $e^v_{i,j}$ of each word $x_i$ is determined from the inner product of the word vector $h_i$ of the word and the key vector $e^k_{i,j}$ corresponding to that value vector.

For example, the weight $p_{i,j}$ of each value vector $e^v_{i,j}$ of each word $x_i$ is determined by the proportion of the inner product of $h_i$ and the key vector $e^k_{i,j}$ within the sum of the inner products of $h_i$ and all the key vectors, as follows:

$$p_{i,j} = \frac{\exp(h_i \cdot e^k_{i,j})}{\sum_{j'=1}^{m_i} \exp(h_i \cdot e^k_{i,j'})} \qquad (1)$$

where $h_i \cdot e^k_{i,j}$ is the inner product of the word vector $h_i$, obtained after the word $x_i$ passes through the input embedding layer and the context information encoding layer, and the key vector $e^k_{i,j}$ corresponding to the value vector $e^v_{i,j}$. Here, for the $i$-th word $x_i$, the context features (keys) $k_{i,j}$ are used to compute the weights $p_{i,j}$ assigned to the corresponding knowledge (values) $v_{i,j}$.
Then, the weighted sum of the syntactic knowledge vectors is computed according to the weights $p_{i,j}$, as follows:

$$a_i = \sum_{j=1}^{m_i} p_{i,j}\, e^v_{i,j} \qquad (2)$$

This completes the weighting, by each context feature vector $e^k_{i,j}$, of its corresponding syntactic knowledge vector $e^v_{i,j}$; it may equally be understood as each context feature $k_{i,j}$ weighting its corresponding syntactic knowledge $v_{i,j}$.
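Equations (1) and (2) can be written out for a single word $x_i$ in a few tensor operations. The sketch below uses PyTorch with random illustrative values and reads the normalization in equation (1) as a softmax over the inner products:

```python
import torch

m, dim = 4, 8                  # m_i knowledge instances, vector dimension
h_i = torch.randn(dim)         # context-encoded word vector h_i
e_k = torch.randn(m, dim)      # key vectors e^k_{i,j}
e_v = torch.randn(m, dim)      # value vectors e^v_{i,j}

p = torch.softmax(e_k @ h_i, dim=0)   # eq. (1): weights from inner products
a_i = p @ e_v                         # eq. (2): weighted sum of value vectors
o_i = torch.cat([h_i, a_i])           # concatenation passed on to the decoder
```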
Alternatively, the mapping of key vectors and value vectors in step S220 may also be implemented by the key-value memory neural network layer 303; that is, the key embedding function $E_k$ and the value embedding function $E_v$ are integrated into the key-value memory neural network layer 303, which maps each word $x_i$'s context features $k_{i,j}$ and their corresponding syntactic knowledge $v_{i,j}$ into feature embedding vectors (key vectors) $e^k_{i,j}$ and knowledge embedding vectors (value vectors) $e^v_{i,j}$, respectively.

Further, alternatively, the acquisition of each word $x_i$'s context features $k_{i,j}$ and corresponding syntactic knowledge $v_{i,j}$ in step S210 may also be implemented by the key-value memory neural network layer 303; that is, the key-value memory neural network layer 303 obtains from outside the context features $\mathcal{K}$ of the whole input word sequence and the corresponding syntactic knowledge $\mathcal{V}$, and then extracts for each word $x_i$ therein its context features $K_i$ and their corresponding syntactic knowledge $V_i$.
In step S240, the named entity recognition device performs named entity prediction after concatenating the word vector corresponding to each word in the input word sequence and the weighted sum vector of the value vector of the word, so as to obtain a corresponding recognition result, where the recognition result indicates a named entity tag of each word in the word sequence.
Here, the weighted sum vector $a_i$ output by the key-value memory neural network layer 303 and the word vector $h_i$ output by the context information encoding layer 302 are concatenated and input to the decoding output layer 304 to obtain the output predicted tag $\hat{y}_i$.
By introducing the key-value memory neural network module, the context features are mapped into key vectors, the syntactic knowledge corresponding to the context features is mapped into value vectors, and the conversion between keys and values weights the syntactic knowledge, so that the knowledge that contributes more to named entity recognition plays a larger role in predicting the named entity tag.
The process of training a named entity recognition model according to one embodiment of the present invention is further described below.
Herein, the named entity recognition model used by the present invention for recognizing named entities in an input word sequence is obtained by introducing a key-value memory neural network layer into the existing named entity recognition model. The named entity recognition model used for the model training described below is thus as shown in fig. 3 and includes an input embedding layer 301, a context information encoding layer 302, a key-value memory neural network layer 303, and a decoding output layer 304.
Referring to fig. 3 and 5, in the first round of training:
In step S510, the named entity recognition device acquires the context features of each word and their corresponding syntactic knowledge from the input word sequence; in step S520, for each word, the device maps each context feature of the word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively; in step S530, the device determines a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector; in step S540, the device performs named entity prediction on the vector obtained by concatenating the word vector of each word in the input word sequence with the weighted sum vector, so as to obtain a corresponding recognition result indicating the named entity tag of each word; in step S550, the device compares the recognition result with the actual named entity tags to update the relevant parameters of the named entity recognition model.
Subsequently, based on the updated parameters of the named entity recognition model, the above steps S510-S550 are repeatedly performed until the objective function of the named entity recognition model converges.
Specifically, in step S510, the named entity recognition device obtains the context feature of each word and its corresponding syntactic knowledge according to the input word sequence.
Here, the input word sequence is denoted $\mathcal{X} = x_1 x_2 \cdots x_l$, where $x_i$ denotes a word in the input word sequence and $l$ denotes the length of the input word sequence.
According to one example of the present invention, for an input word sequence $\mathcal{X}$, the named entity recognition device obtains through an automatic tool the context features $\mathcal{K}$ of the whole input word sequence and their corresponding syntactic knowledge $\mathcal{V}$. For each word $x_i$ in the input word sequence, the device extracts from $\mathcal{K}$ and $\mathcal{V}$ the context features $K_i$ related to that word and their corresponding syntactic knowledge $V_i$. Accordingly, one context feature of the word $x_i$ is labeled as a key $k_{i,j}$ (where $j = 1, 2, \dots, m_i$), and the syntactic knowledge corresponding to the context feature $k_{i,j}$ is labeled as a value $v_{i,j}$.
In step S520, the named entity recognition device maps each of its contextual characteristics and corresponding syntactic knowledge to a key vector and a corresponding value vector, respectively, for each word in the sequence of input words.
Here, the mapping of the key vector and the mapping of the value vector are performed by corresponding embedding functions, respectively. This is similar to the input embedding layer 301 converting each word in the input word sequence into a word vector through an embedding function.
For example, the input word sequence $x = x_1 x_2 \cdots x_l$ is input to the input embedding layer 301, where each word $x_i$ is converted into an input word vector $e^x_i = E_x(x_i)$ via the word embedding function $E_x$. Likewise, for each word $x_i$'s context feature $k_{i,j}$ and its corresponding syntactic knowledge $v_{i,j}$, the named entity recognition device maps them via the key embedding function $E_k$ and the value embedding function $E_v$ into a feature embedding vector (key vector) $e^k_{i,j}$ and a knowledge embedding vector (value vector) $e^v_{i,j}$, respectively. Thus, for each word $x_i$, the named entity recognition device obtains all its key vectors $\{e^k_{i,1}, \dots, e^k_{i,m_i}\}$ and all its value vectors $\{e^v_{i,1}, \dots, e^v_{i,m_i}\}$.
In step S530, the named entity recognition device determines a weighted sum vector of all value vectors of each word in the input word sequence, wherein each value vector is weighted according to the word vector of the word and a key vector corresponding to the value vector.
Each word $x_i$ ($i = 1, 2, \dots, l$) in the input word sequence is converted into a word vector $e^x_i$ via the input embedding layer 301, and after the context information encoding layer 302 a word vector containing context information, $h_i = \mathrm{Encoder}(x_i)$, is obtained.

Then, in step S530, the word vector $h_i$ of the word $x_i$, together with all its key vectors $\{e^k_{i,j}\}$ and all its value vectors $\{e^v_{i,j}\}$, is input to the key-value memory neural network layer 303.
Specifically, in the key-value memory neural network layer 303, the weight of each value vector $e^v_{i,j}$ of each word $x_i$ in the input word sequence may be determined by combining the context-informed word vector $h_i$ of that word with the key vector $e^k_{i,j}$ corresponding to that value vector.

According to one example of the invention, the weight of each value vector $e^v_{i,j}$ of each word $x_i$ is determined from the inner product of the word vector $h_i$ of the word and the corresponding key vector $e^k_{i,j}$.

For example, the weight $p_{i,j}$ of each value vector of each word $x_i$ is determined by the proportion of the inner product of $h_i$ and the key vector $e^k_{i,j}$ within the sum of the inner products of $h_i$ and all the key vectors, as specified in equation (1) above.
Then, the weighted sum of all syntactic knowledge vectors is computed according to the weights $p_{i,j}$, as in equation (2) above, thereby completing the weighting of the syntactic knowledge by the context features.
In step S540, the named entity recognition device performs named entity prediction after concatenating the word vector corresponding to each word in the input word sequence and the weighted sum vector of the value vector of the word, so as to obtain a corresponding recognition result, where the recognition result indicates a named entity tag of each word in the word sequence.
Here, the weighted sum vector $a_i$ output by the key-value memory neural network layer 303 and the word vector $h_i$ output by the context information encoding layer 302 are concatenated and input to the decoding output layer 304 to obtain the output predicted tag $\hat{y}_i$.
In step S550, the named entity recognition device compares the recognition result output in step S540 with the actual named entity tag to update the relevant parameters of the named entity recognition model.
According to an example of the present invention, the named entity recognition device computes an objective function, which may be, for example, the cross-entropy function, and adjusts the relevant parameters of the named entity recognition model according to the result, for example by using a back-propagation algorithm to update all parameters of the input embedding layer 301, the context information encoding layer 302, the key-value memory neural network layer 303, and the decoding output layer 304, including the key vectors $e^k_{i,j}$, the value vectors $e^v_{i,j}$, the word vectors $e^x_i$ of the input embedding layer, and so on. That is, all parameters other than the words $x_i$, the context features $k_{i,j}$, and the syntactic knowledge $v_{i,j}$ themselves are updated. When the computed value of the objective function converges, training ends.
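One training round under these assumptions can be sketched as follows (cross-entropy objective, back-propagation through all four layers via a standard optimizer; the `model` signature bundling the four layers and the key-value inputs is hypothetical):

```python
import torch.nn.functional as F

def train_step(model, optimizer, word_ids, key_ids, value_ids, gold_tags):
    """One round of S510-S550: predict, compare with gold tags, update parameters.
    All parameters (E_x, E_k, E_v and the four layers) receive gradients."""
    logits = model(word_ids, key_ids, value_ids)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           gold_tags.reshape(-1))
    optimizer.zero_grad()
    loss.backward()              # back-propagation through all four layers
    optimizer.step()
    return loss.item()           # training stops once this value converges
```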
When the computed value of the objective function has not yet converged, the named entity recognition device updates the relevant parameters of the named entity recognition model and enters the next round of training. From the second round of training onward, there is no need to re-acquire the context features of each word and their corresponding syntactic knowledge for the input word sequence; that is, only steps S520 to S550 above are repeated, until the computed value of the objective function converges.
Fig. 6 shows a schematic diagram of an apparatus according to an embodiment of the invention, in which a named entity recognition apparatus is specifically shown.
As shown in fig. 6, the named entity recognition apparatus 60 comprises an obtaining module 61, a mapping module 62, a weighting module 63, and a prediction module 64. Referring to fig. 3, the named entity recognition apparatus 60 is further coupled with a named entity recognition model, which includes an input embedding layer 301, a context information encoding layer 302, a key-value memory neural network layer 303, and a decoding output layer 304.
The obtaining module 61 acquires the context features of each word and their corresponding syntactic knowledge from the input word sequence; for each word, the mapping module 62 maps each context feature of the word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively; the weighting module 63 determines a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector; and the prediction module 64 performs named entity prediction on the vector obtained by concatenating the word vector of each word in the input word sequence with the weighted sum vector, so as to obtain a corresponding recognition result indicating the named entity tag of each word.
Specifically, the obtaining module 61 obtains the context feature of each word and the corresponding syntactic knowledge thereof according to the input word sequence.
Here, the input word sequence is denoted $\mathcal{X} = x_1 x_2 \cdots x_l$, where $x_i$ denotes a word in the input word sequence and $l$ denotes the length of the input word sequence.
According to one example of the present invention, for an input word sequence $\mathcal{X}$, the obtaining module 61 obtains through an automatic tool the context features $\mathcal{K}$ of the whole input word sequence and their corresponding syntactic knowledge $\mathcal{V}$. For each word $x_i$ in the input word sequence, the obtaining module 61 extracts from $\mathcal{K}$ and $\mathcal{V}$ the context features $K_i$ related to that word and their corresponding syntactic knowledge $V_i$. Accordingly, one context feature of the word $x_i$ is labeled as a key $k_{i,j}$ (where $j = 1, 2, \dots, m_i$), and the syntactic knowledge corresponding to the context feature $k_{i,j}$ is labeled as a value $v_{i,j}$.
Mapping module 62 maps each context feature of each word in the input word sequence and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively.

Here, the mapping of key vectors and of value vectors is performed by corresponding embedding functions, similar to the way the input embedding layer 301 converts each word in the input word sequence into a word vector through an embedding function.

For example, the input word sequence $x = x_1 x_2 \cdots x_l$ is input to the input embedding layer 301, where each word $x_i$ is converted into an input word vector $e^x_i = E_x(x_i)$ via the word embedding function $E_x$. Likewise, for each word $x_i$'s context feature $k_{i,j}$ and its corresponding syntactic knowledge $v_{i,j}$, mapping module 62 maps them via the key embedding function $E_k$ and the value embedding function $E_v$ into a feature embedding vector (key vector) $e^k_{i,j}$ and a knowledge embedding vector (value vector) $e^v_{i,j}$, respectively. Thus, for each word $x_i$, mapping module 62 obtains all its key vectors $\{e^k_{i,1}, \dots, e^k_{i,m_i}\}$ and all its value vectors $\{e^v_{i,1}, \dots, e^v_{i,m_i}\}$.
The weighting module 63 determines a weighted sum vector of all value vectors of each word in the input word sequence, wherein the weight of each value vector is determined by combining the context-informed word vector of the word with the key vector corresponding to that value vector.

Each word $x_i$ ($i = 1, 2, \dots, l$) in the input word sequence is converted into a word vector $e^x_i$ via the input embedding layer 301, and after the context information encoding layer 302 a word vector containing context information, $h_i = \mathrm{Encoder}(x_i)$, is obtained.

Then, the word vector $h_i$ of the word $x_i$, together with all its key vectors $\{e^k_{i,j}\}$ and all its value vectors $\{e^v_{i,j}\}$, is input to the key-value memory neural network layer 303.

Specifically, in the key-value memory neural network layer 303, the weight of each value vector $e^v_{i,j}$ of each word $x_i$ in the input word sequence may be determined by combining the context-informed word vector $h_i$ of that word with the key vector $e^k_{i,j}$ corresponding to that value vector.
According to one example of the invention, the weight of each value vector $e^v_{i,j}$ of each word $x_i$ is determined from the inner product of the word vector $h_i$ of the word and the corresponding key vector $e^k_{i,j}$.

For example, the weight $p_{i,j}$ of each value vector of each word $x_i$ is determined by the proportion of the inner product of $h_i$ and the key vector $e^k_{i,j}$ within the sum of the inner products of $h_i$ and all the key vectors, as specified in equation (1) above. Then, the weighted sum of all syntactic knowledge vectors is computed according to the weights $p_{i,j}$, as in equation (2) above, thereby completing the weighting of the syntactic knowledge by the context features.
The prediction module 64 performs named entity prediction after concatenating the word vector corresponding to each word in the input word sequence with the weighted sum vector of the value vectors of the words to obtain a corresponding recognition result, where the recognition result indicates the named entity tag of each word in the word sequence.
Here, the weighted sum vector a_i output by the key-value memory neural network layer 303 is concatenated with the word vector h_i output by the context information encoding layer 302, and the concatenation is input to the decoding output layer 304 to obtain the output prediction label ŷ_i.
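As a sketch of this final step, a single linear layer below stands in for the decoding output layer 304; the patent leaves the decoder open, so a softmax or CRF decoder would serve equally well (the label-set size is assumed):

    num_labels = 9  # e.g. BIO labels over four entity types (assumed)
    decoder = nn.Linear(2 * dim, num_labels)

    logits = decoder(torch.cat([h_i, a_i], dim=-1))  # decode [h_i ; a_i]
    y_hat = logits.argmax(-1)  # predicted named entity label for x_i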
According to an example of the present invention, when the named entity recognition model is being trained, the named entity recognition apparatus 60 further comprises a comparison module (not shown in FIG. 6).
In the first round of training, after the obtaining module 61, the mapping module 62, the weighting module 63 and the prediction module 64 have performed their operations in sequence, the comparison module compares the recognition result output by the prediction module 64 with the actual named entity label to update the relevant parameters of the named entity recognition model. The next round of training then begins, i.e., the mapping module 62, the weighting module 63 and the prediction module 64 are triggered again to perform their corresponding operations; this cycle repeats until the objective function of the named entity recognition model converges.
According to an example of the present invention, the comparison module computes an objective function, for example a cross-entropy loss, and adjusts the relevant parameters of the named entity recognition model according to the result, for example by using a back-propagation algorithm to update all parameters of the input embedding layer 301, the context information encoding layer 302, the key-value memory neural network layer 303 and the decoding output layer 304, including the key vectors e^k_{i,j}, the value vectors e^v_{i,j}, the word vectors e^x_i of the input embedding layer, and so on. That is, all parameters are updated except the words x_i, the context features k_{i,j} and the syntactic knowledge v_{i,j} themselves. When the value of the objective function converges, training ends.
When the objective function has not yet converged, the comparison module updates the relevant parameters of the named entity recognition model and then re-triggers the mapping module 62 to begin the next round of training of the named entity recognition model.
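A minimal training loop matching this description might look as follows, again continuing the sketch above: the cross-entropy objective is back-propagated through every layer, and training stops once the loss no longer changes appreciably (the gold label id and the convergence threshold are illustrative):

    params = (list(E_x.parameters()) + list(encoder.parameters())
              + list(E_k.parameters()) + list(E_v.parameters())
              + list(decoder.parameters()))
    optimizer = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()  # the cross-entropy objective

    gold = torch.tensor([2])  # actual named entity label id (illustrative)
    prev = float("inf")
    for step in range(1000):  # rounds of training
        h, _ = encoder(E_x(x_ids))  # re-run the forward pass each round
        a = weighted_knowledge(h[0, 0], E_k(k_ids), E_v(v_ids))
        logits = decoder(torch.cat([h[0, 0], a], dim=-1)).unsqueeze(0)
        loss = loss_fn(logits, gold)
        optimizer.zero_grad()
        loss.backward()   # back propagation through all layers
        optimizer.step()  # updates E_x, encoder, E_k, E_v and decoder
        if abs(prev - loss.item()) < 1e-6:  # objective converged: training ends
            break
        prev = loss.item()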
It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example, as an Application Specific Integrated Circuit (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Also, the software programs (including associated data structures) of the present invention can be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Further, some of the steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, at least a portion of the present invention may be implemented as a computer program product, such as computer program instructions, which, when executed by a computing device, may invoke or provide methods and/or aspects in accordance with the present invention through operation of the computing device. Program instructions which invoke/provide the methods of the present invention may be stored on fixed or removable recording media and/or transmitted via a data stream over a broadcast or other signal-bearing medium, and/or stored in a working memory of a computing device operating in accordance with the program instructions.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (15)

1. A named entity recognition method, wherein the method comprises the steps of:
acquiring context features of each word and their corresponding syntactic knowledge according to the input word sequence;
for each word, mapping each context feature and corresponding syntactic knowledge of the word into a key vector and a corresponding value vector respectively;
determining a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector containing the context information of the word and the corresponding key vector of the value vector;
and performing named entity prediction on the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector, so as to obtain a corresponding recognition result, wherein the recognition result indicates the named entity label of each word.
2. The method of claim 1, wherein the weight of each value vector of each word is further determined by an inner product of the word vector of the word and the key vector corresponding to the value vector.
3. The method according to claim 1 or 2, wherein the step of obtaining specifically comprises:
acquiring context features matching the input word sequence and their corresponding syntactic knowledge according to the input word sequence;
and acquiring the context features of each word and the corresponding syntactic knowledge from the matched context features and corresponding syntactic knowledge.
4. The method according to any of claims 1 to 3, wherein the method is performed by a named entity recognition model,
wherein, when the named entity recognition model is being trained, the method further comprises:
comparing the recognition result with an actual named entity label to update relevant parameters of the named entity recognition model;
and, after the mapping, weighting and predicting steps are executed again, comparing the newly obtained recognition result with the actual named entity label and updating the relevant parameters of the named entity recognition model accordingly, these steps being executed cyclically until the objective function of the named entity recognition model converges.
5. A named entity recognition method, wherein a named entity recognition model comprises an input embedding layer, a context information encoding layer, a key-value memory neural network layer and a decoding output layer;
wherein, the method comprises the following steps:
acquiring context features of each word and their corresponding syntactic knowledge according to the input word sequence;
inputting the context features of each word and their corresponding syntactic knowledge to the key-value memory neural network layer, so as to map each context feature of each word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively;
inputting the word vector output for each word in the input word sequence via the input embedding layer and the context information encoding layer into the key-value memory neural network layer, so as to obtain a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector;
and inputting the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector into the decoding output layer, so as to obtain a corresponding recognition result, wherein the recognition result indicates the named entity label of each word.
6. The method of claim 5, wherein, when the named entity recognition model is being trained, the method further comprises:
comparing the recognition result with an actual named entity label to update relevant parameters of the named entity recognition model;
and, after the mapping, weighting and predicting steps are executed again, comparing the newly obtained recognition result with the actual named entity label and updating the relevant parameters of the named entity recognition model accordingly, these steps being executed cyclically until the objective function of the named entity recognition model converges.
7. A named entity recognition apparatus, wherein the apparatus comprises:
an acquisition module for acquiring the context features of each word and their corresponding syntactic knowledge according to an input word sequence;
a mapping module for mapping each context feature and corresponding syntactic knowledge of each word into a key vector and a corresponding value vector, respectively;
a weighting module for determining a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to a word vector of the word containing context information and a key vector corresponding to the value vector;
and a prediction module for performing named entity prediction on the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector, so as to obtain a corresponding recognition result, wherein the recognition result indicates the named entity label of each word.
8. The apparatus of claim 7, wherein the weight of each value vector of each word is further determined according to an inner product of the word vector of the word and a key vector corresponding to the value vector.
9. The apparatus according to claim 7 or 8, wherein the obtaining module is specifically configured to:
acquiring context features matching the input word sequence and their corresponding syntactic knowledge according to the input word sequence;
and acquiring the context features of each word and the corresponding syntactic knowledge from the matched context features and corresponding syntactic knowledge.
10. The apparatus according to any of claims 7 to 9, wherein the apparatus is coupled with a named entity recognition model,
wherein, when the named entity recognition model is being trained, the apparatus further comprises:
a comparison module for comparing the recognition result with an actual named entity label so as to update the relevant parameters of the named entity recognition model, and for triggering the mapping module, the weighting module and the prediction module again to execute their corresponding operations, these operations being executed cyclically until the objective function of the named entity recognition model converges.
11. A named entity recognition device, wherein the named entity recognition device is coupled with a named entity recognition model, and the named entity recognition model comprises an input embedding layer, a context information encoding layer, a key-value memory neural network layer and a decoding output layer;
wherein, the device includes:
an acquisition module for acquiring the context features of each word and their corresponding syntactic knowledge according to an input word sequence;
a mapping module, configured to input the context features of each word and their corresponding syntactic knowledge to the key-value memory neural network layer, so as to map each context feature of each word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively;
a weighting module, configured to input the word vector output for each word in the input word sequence via the input embedding layer and the context information encoding layer to the key-value memory neural network layer, so as to obtain a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector;
and a prediction module, configured to input the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector into the decoding output layer, so as to obtain a corresponding recognition result, wherein the recognition result indicates the named entity label of each word.
12. The apparatus of claim 11, wherein the apparatus further comprises:
a comparison module for comparing the recognition result with an actual named entity label so as to update the relevant parameters of the named entity recognition model, and for triggering the mapping module, the weighting module and the prediction module again to execute their corresponding operations, these operations being executed cyclically until the objective function of the named entity recognition model converges.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 6 when executing the computer program.
14. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any of claims 1 to 6.
15. A computer program product implementing the method of any one of claims 1 to 6 when executed by a computer device.
CN202010054650.XA 2020-01-17 2020-01-17 Method and device for named entity recognition Pending CN111291565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010054650.XA CN111291565A (en) 2020-01-17 2020-01-17 Method and device for named entity recognition

Publications (1)

Publication Number Publication Date
CN111291565A true CN111291565A (en) 2020-06-16

Family

ID=71021220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010054650.XA Pending CN111291565A (en) 2020-01-17 2020-01-17 Method and device for named entity recognition

Country Status (1)

Country Link
CN (1) CN111291565A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval
CN102654866A (en) * 2011-03-02 2012-09-05 北京百度网讯科技有限公司 Method and device for establishing example sentence index and method and device for indexing example sentences
CN110008480A (en) * 2018-12-05 2019-07-12 中国科学院自动化研究所 Small data vocabulary dendrography learning method and system and relevant device based on prototype memory

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738007A (en) * 2020-07-03 2020-10-02 北京邮电大学 A Data Augmentation Algorithm for Chinese Named Entity Recognition Based on Sequence Generative Adversarial Networks
CN116724305A (en) * 2021-01-20 2023-09-08 甲骨文国际公司 Integration of context labels with named entity recognition models
CN116724305B (en) * 2021-01-20 2024-07-19 甲骨文国际公司 Integration of context labels with named entity recognition models
WO2023226292A1 (en) * 2022-05-27 2023-11-30 苏州思萃人工智能研究所有限公司 Method for extracting relation from text, relation extraction model, and medium
WO2024021343A1 (en) * 2022-07-29 2024-02-01 苏州思萃人工智能研究所有限公司 Natural language processing method, computer device, readable storage medium, and program product

Similar Documents

Publication Publication Date Title
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN109033068B (en) Method and device for reading and understanding based on attention mechanism and electronic equipment
US11544474B2 (en) Generation of text from structured data
CN112836514B (en) Nested entity identification method, apparatus, electronic device and storage medium
CN108932342A (en) A kind of method of semantic matches, the learning method of model and server
CN111291565A (en) Method and device for named entity recognition
US20230244704A1 (en) Sequenced data processing method and device, and text processing method and device
CN112149386B (en) Event extraction method, storage medium and server
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
JP7121819B2 (en) Image processing method and apparatus, electronic device, computer-readable storage medium, and computer program
CN110334186A (en) Data query method, apparatus, computer equipment and computer readable storage medium
CN113704416A (en) Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
JP2022145623A (en) METHOD AND APPARATUS FOR PROVIDING HINT INFORMATION AND COMPUTER PROGRAM
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN111651985A (en) Method and device for Chinese word segmentation
JP5441937B2 (en) Language model learning device, language model learning method, language analysis device, and program
CN111241843B (en) Semantic relation inference system and method based on composite neural network
CN114298048B (en) Named entity recognition method and device
WO2019163752A1 (en) Morpheme analysis learning device, morpheme analysis device, method, and program
CN113343692B (en) Search intention recognition method, model training method, device, medium and equipment
CN114758330A (en) Text recognition method and device, electronic equipment and storage medium
CN111339287B (en) Abstract generation method and device
CN112380861A (en) Model training method and device and intention identification method and device
CN117828024A (en) Plug-in retrieval method, device, storage medium and equipment
CN115248846B (en) Text recognition method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20200616