CN111291565A - Method and device for named entity recognition - Google Patents
Method and device for named entity recognition
- Publication number: CN111291565A (application CN202010054650.XA)
- Authority: CN (China)
- Prior art keywords: word, vector, named entity, value, input
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/044 Recurrent networks, e.g. Hopfield networks (G: Physics; G06: Computing, calculating or counting; G06N: Computing arrangements based on specific computational models; G06N3/00: Computing arrangements based on biological models; G06N3/02: Neural networks; G06N3/04: Architecture, e.g. interconnection topology)
- G06N3/084 Backpropagation, e.g. using gradient descent (same hierarchy, under G06N3/08: Learning methods)
Abstract
The invention aims to provide a named entity recognition method and a named entity recognition apparatus. Context features of each word and their corresponding syntactic knowledge are acquired from the input word sequence; for each word, each context feature and its corresponding syntactic knowledge are mapped into a key vector and a corresponding value vector, respectively; a weighted sum vector of all value vectors of each word is determined; and named entity prediction is performed on the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector, so as to obtain a corresponding recognition result, the recognition result indicating the named entity tag of each word. Compared with the prior art, the invention introduces weighted syntactic knowledge into a general sequence-labeling-based deep learning named entity recognition system. Context features can thereby be used effectively within a deep learning framework to weight the corresponding syntactic knowledge, further improving the performance of the named entity recognition system.
Description
Technical Field
The invention relates to the technical field of Natural Language Processing (NLP), and in particular to Named Entity Recognition (NER) technology.
Background
Named entities refer to person names, organization names, place names, and all other entities identified by name. For example, "Zhang San" is a person name and "Beijing" is a place name. In scientific text, common named entities also include disease names (e.g., "congenital heart disease"), technical terms (e.g., "simple harmonic vibration"), and so on.
Named entity recognition is the natural language processing task of recognizing the named entities in an input word sequence. For example, for the input word sequence "Zhang San / suffers from / congenital / heart disease" (words separated by "/"), the task is to identify the named entities in it, namely the person name "Zhang San" and the disease name "congenital heart disease".
A named entity tagger assigns a tag to each word in the input word sequence to represent the result of named entity recognition. The mainstream named entity tagging scheme is as follows: "B-X" ("X" denotes a named entity category, such as the disease category "Disease") means the word is the first word of a named entity; "I-X" means the word is an intermediate (non-first, non-last) word of a named entity; "E-X" means the word is the last word of a named entity; and "O" means the word is not part of any named entity. For example, the named entity tags for the words in the sequence "Zhang San / suffers from / congenital / heart disease" are, in order, "B-Person", "O", "B-Disease", "E-Disease".
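As a concrete illustration of this scheme, the sketch below converts entity spans into the tags described above (a minimal sketch; the function name and the span convention are assumptions for illustration, not part of the patent):

```python
def spans_to_tags(n_words, entities):
    """Convert entity spans [(start, end, label), ...], end-exclusive,
    into one tag per word; words outside any entity get "O"."""
    tags = ["O"] * n_words
    for start, end, label in entities:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{label}"
        if end - start > 1:
            tags[end - 1] = f"E-{label}"
    return tags

# "Zhang San / suffers from / congenital / heart disease"
print(spans_to_tags(4, [(0, 1, "Person"), (2, 4, "Disease")]))
# ['B-Person', 'O', 'B-Disease', 'E-Disease']
```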
Techniques for named entity recognition can be divided into feature-based traditional methods and deep learning methods.
The feature-based method extracts features from the input word sequence using manually designed and selected features, and judges the named entity tag of the current word based on those features. Common features include the current word, the preceding word, the succeeding word, and the like. However, the effectiveness of this method depends heavily on the quality of the manually designed and extracted features, and designing a high-quality feature extraction method is very difficult.
In addition, because scientific texts are written in formal language with normative expressions and long sentences, traditional methods have also tried to improve the performance of named entity recognition systems using external syntactic knowledge acquired by automatic tools. Noun-phrase syntactic knowledge often implies that a named entity may exist within the phrase; for example, "congenital heart disease" is a noun phrase and is itself a named entity. However, traditional methods treat the external syntactic knowledge as a correct (gold) reference when training the model, so erroneous knowledge produced by the performance limitations of the external automatic tools negatively affects these feature-based methods.
In recent years, deep learning methods have gradually been applied to named entity recognition. They automatically extract text features suited to the specific task, avoiding the large cost of manual feature design and extraction, and their recognition performance far exceeds that of purely traditional methods.
Disclosure of Invention
The invention aims to provide a named entity recognition method, a named entity recognition apparatus, a computer-readable storage medium, and a computer program product.
According to an aspect of the present invention, there is provided a named entity recognition method, wherein the method comprises the steps of:
acquiring context characteristics of each word and corresponding syntactic knowledge thereof according to the input word sequence;
for each word, mapping each context feature and corresponding syntactic knowledge of the word into a key vector and a corresponding value vector respectively;
determining a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and a key vector corresponding to the value vector;
and performing named entity prediction on the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector, so as to obtain a corresponding recognition result, the recognition result indicating the named entity tag of each word.
According to an aspect of the present invention, there is also provided a named entity recognition method using a named entity recognition model, wherein the named entity recognition model includes an input embedding layer, a context information encoding layer, a key-value memory neural network layer, and a decoding output layer;
wherein, the method comprises the following steps:
acquiring context characteristics of each word and corresponding syntactic knowledge thereof according to the input word sequence;
inputting the context feature of each word and the corresponding syntactic knowledge thereof to the key-value memory neural network layer so as to map each context feature of each word and the corresponding syntactic knowledge thereof into a key vector and a corresponding value vector respectively;
inputting, for each word in the input word sequence, the word vector output via the input embedding layer and the context information encoding layer into the key-value memory neural network layer, so as to obtain a weighted sum vector of all value vectors of the word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector;
and inputting the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector into the decoding output layer, so as to obtain a corresponding recognition result, the recognition result indicating the named entity tag of each word.
According to an aspect of the present invention, there is also provided a named entity recognition apparatus, wherein the apparatus includes:
the acquisition module is used for acquiring the context characteristics of each word and the corresponding syntactic knowledge thereof according to the input word sequence;
a mapping module for mapping each context feature and corresponding syntactic knowledge of each word into a key vector and a corresponding value vector, respectively;
a weighting module for determining a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and a key vector corresponding to the value vector;
and a prediction module for performing named entity prediction on the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector, so as to obtain a corresponding recognition result, the recognition result indicating the named entity tag of each word.
According to an aspect of the present invention, there is also provided a named entity recognition apparatus, wherein the named entity recognition apparatus is coupled with a named entity recognition model, and the named entity recognition model includes an input embedding layer, a context information encoding layer, a key-value memory neural network layer, and a decoding output layer;
wherein, the device includes:
the acquisition module is used for acquiring the context characteristics of each word and the corresponding syntactic knowledge thereof according to the input word sequence;
a mapping module, configured to input the context features of each word and their corresponding syntactic knowledge to the key-value memory neural network layer, so as to map each context feature of each word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively;
a weighting module, configured to input, for each word in the input word sequence, the word vector output via the input embedding layer and the context information encoding layer to the key-value memory neural network layer, so as to obtain a weighted sum vector of all value vectors of the word, where each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector;
and a prediction module, configured to input the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector into the decoding output layer, so as to obtain a corresponding recognition result, the recognition result indicating the named entity tag of each word.
According to an aspect of the present invention, there is also provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the named entity recognition method according to an aspect of the present invention when executing the computer program.
According to an aspect of the invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a named entity recognition method according to an aspect of the invention.
According to an aspect of the invention, there is also provided a computer program product which, when executed by a computing device, implements a named entity recognition method according to an aspect of the invention.
Compared with the prior art, the invention obtains the context features and syntactic knowledge related to each word in the input word sequence, weights the syntactic knowledge by mapping the context features into key vectors, mapping the corresponding syntactic knowledge into value vectors, and converting between keys and values, and introduces the weighted syntactic knowledge into a general sequence-labeling-based deep learning named entity recognition system. Context features can thereby be used effectively within a deep learning framework to weight the corresponding syntactic knowledge, further improving the performance of the named entity recognition system.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 illustrates a framework diagram of an existing named entity recognition model;
FIG. 2 illustrates a flow diagram of a named entity recognition method according to one embodiment of the invention;
FIG. 3 illustrates a framework diagram of a named entity recognition model according to one example of the invention;
FIG. 4 illustrates a block diagram of a key-value memory neural network layer according to an example of the present invention;
FIG. 5 illustrates a flow diagram for training a named entity recognition model, according to one embodiment of the invention;
FIG. 6 shows a schematic diagram of a named entity recognition apparatus according to another embodiment of the present invention.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments of the present invention are described as an apparatus represented by a block diagram and a process or method represented by a flow diagram. Although a flowchart depicts a sequence of process steps in the present invention, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process of the present invention may be terminated when its operations are performed, but may include additional steps not shown in the flowchart. The processes of the present invention may correspond to methods, functions, procedures, subroutines, and the like.
The methods illustrated by the flow diagrams and apparatus illustrated by the block diagrams discussed below may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine or computer readable medium such as storage medium. The processor(s) may perform the necessary tasks.
Similarly, it will be further appreciated that any flow charts, flow diagrams, state transition diagrams, and the like represent various processes which may be substantially described as program code stored in computer readable media and so executed by a computing device or processor, whether or not such computing device or processor is explicitly shown.
As used herein, the term "storage medium" may refer to one or more devices for storing data, including Read Only Memory (ROM), Random Access Memory (RAM), magnetic RAM, core memory, magnetic disk storage media, optical storage media, flash memory devices, and/or other machine-readable media for storing information. The term "computer-readable medium" can include, but is not limited to, portable or fixed storage devices, optical storage devices, and various other media capable of storing and/or containing instructions and/or data.
A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program descriptions. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, information passing, token passing, network transmission, etc.
The term "computer device" in this context refers to an electronic device that can perform predetermined processes such as numerical calculation and/or logic calculation by executing predetermined programs or instructions, and may at least include a processor and a memory, wherein the predetermined processes are performed by the processor executing program instructions prestored in the memory, or performed by hardware such as ASIC, FPGA, DSP, or implemented by a combination of the two.
The "computer device" is typically embodied in the form of a general-purpose computer device, and its components may include, but are not limited to: one or more processors or processing units, system memory. The system memory may include computer readable media in the form of volatile memory, such as Random Access Memory (RAM) and/or cache memory. The "computer device" may further include other removable/non-removable, volatile/nonvolatile computer-readable storage media. The memory may include at least one computer program product having a set (e.g., at least one) of program modules that are configured to perform the functions and/or methods of embodiments of the present invention. The processor executes various functional applications and data processing by executing programs stored in the memory.
For example, a computer program for executing the functions and processes of the present invention is stored in the memory, and the NER scheme of the present invention is implemented when the processor executes the corresponding computer program.
Typically, such computer devices include user equipment and network devices. User equipment includes, but is not limited to, personal computers (PCs), notebook computers, and mobile terminals, where mobile terminals include, but are not limited to, smart phones, tablet computers, and the like; network devices include, but are not limited to, a single network server, a server group consisting of multiple network servers, or a cloud of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a virtual supercomputer consisting of a collection of loosely coupled computers. The computer device can operate alone to implement the invention, or can access a network and implement the invention through interoperation with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
It should be noted that the user devices, network devices, networks, etc. are merely examples, and other existing or future computing devices or networks may be suitable for the present invention, and are included in the scope of the present invention and are incorporated by reference herein.
Specific structural and functional details disclosed herein are merely representative and are provided for purposes of describing example embodiments of the present invention. The present invention may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element may be termed a second element, and, similarly, a second element may be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The present invention is described in further detail below with reference to the attached drawing figures.
Referring to FIG. 1, FIG. 1 illustrates a basic framework of a named entity recognition model based on deep learning. The named entity recognition model comprises 3 modules: an input embedding layer 101, a context information encoding layer 102, and a decoding output layer 103.
The purpose of the input embedding layer 101 is to map each word in the input word sequence into a word vector in a high-dimensional continuous space that represents the characteristics of the word. The word vectors are typically derived from language models pre-trained on large-scale unlabeled corpora. The input embedding layer 101 is implemented as a fixed word-vector mapping (fixed embedding): using a word vector library acquired by external methods, it converts each word of the input word sequence into the corresponding word vector in the library.
The purpose of the context information encoding layer 102 is to extract the context information of each word based on the word vectors, calculating the influence of the other words' vectors on each word. The input of this layer is the output of the input embedding layer (i.e., the word vector of each word in a sentence), and the output is a context-encoded word vector for each word, different from the input vector. There are three main ways to implement this layer. The first is a Convolutional Neural Network (CNN); the second is a Recurrent Neural Network (RNN), typically a Long Short-Term Memory network (LSTM). The former is faster; the latter takes more context information into account. The third is the Transformer structure, which takes text directly as input (that is, no input embedding layer is required) and has the strongest ability to encode context information; named entity recognition systems based on it currently achieve the best results.
The decoding output layer 103 decodes each context-encoded word vector and outputs the predicted named entity tag. This layer is mainly implemented with Softmax.
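For orientation, here is a minimal numpy sketch of this three-layer pipeline. The toy window encoder stands in for the CNN/RNN/Transformer options just described, and all names, dimensions, and random weights are illustrative assumptions rather than the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d, n_tags = 100, 8, 5             # illustrative sizes

E = rng.normal(size=(vocab_size, d))          # input embedding layer (101): fixed lookup
W_enc = rng.normal(size=(3 * d, d)) * 0.1     # toy window encoder (102)
W_out = rng.normal(size=(d, n_tags)) * 0.1    # decoding output layer (103)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def encode(x):
    # crude context encoder: each word sees its immediate neighbors,
    # a window-CNN stand-in for the CNN/RNN/Transformer options above
    padded = np.pad(x, ((1, 1), (0, 0)))
    ctx = np.concatenate([padded[:-2], x, padded[2:]], axis=-1)   # (l, 3d)
    return np.tanh(ctx @ W_enc)                                   # (l, d)

def predict(word_ids):
    x = E[word_ids]                           # one word vector per input word
    h = encode(x)                             # context-encoded word vectors
    return softmax(h @ W_out).argmax(-1)      # one predicted tag id per word

print(predict(np.array([3, 17, 42, 7])))      # four tag ids (untrained weights)
```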
The general named entity identification process is as follows:
1. the input word sequence is input to the input embedding layer 101, and each word in the input word sequence is converted into an input word vector.
2. All word vectors corresponding to the converted word sequence are input to the context information coding layer 102, and the context information coding layer 102 outputs a context-coded word vector for each word in the word sequence.
3. The word vector output in the previous step is input to the decoding output layer 103, and the decoding output layer 103 outputs the predicted named entity tag.
4. Comparing the predicted tags with the manually annotated tags and calculating an objective function; the network parameters of the named entity recognition model are updated by optimizing the objective function.
5. Repeating the steps 1-4 until the expected effect is achieved.
Furthermore, external syntactic knowledge is also used in deep-learning-based models, since the amount of labeled text in the scientific field is often insufficient to support sufficient training of deep learning models, and the effectiveness of introducing external syntactic knowledge for named entity recognition has already been demonstrated in traditional methods. The usual way to add syntactic knowledge to a sequence-labeling deep learning method is to map the automatically acquired syntactic knowledge, at the input embedding layer, into a syntactic knowledge vector in a high-dimensional continuous space, and to concatenate this vector directly with the word vector. However, directly concatenating the syntactic knowledge vector with the word vector does not account for the differing contributions of different pieces of knowledge to the word's named entity tag; knowledge that contributes little, or inaccurate knowledge produced by automatic tools, may mislead the recognition model into predicting a wrong named entity tag. Such inaccurate knowledge therefore negatively impacts the named entity recognition system.
In order to model the weight of syntactic knowledge according to its contribution to named entity recognition, and thereby effectively integrate the high-contribution syntactic knowledge into a sequence-labeling-based deep learning framework, the invention innovates between the context information encoding layer and the decoding output layer by providing a module based on a key-value memory neural network. More specifically, for each word in the input word sequence, the key-value memory neural network module extracts the context features and syntactic knowledge related to that word from the context features and syntactic knowledge acquired by an automatic tool, weights the syntactic knowledge by mapping the context features into key vectors, mapping the corresponding syntactic knowledge into value vectors, and converting between keys and values, and introduces the weighted syntactic knowledge into a general sequence-labeling-based deep learning named entity recognition system. Context features can thereby be used effectively within a deep learning framework to weight the corresponding syntactic knowledge, further improving the performance of the named entity recognition system.
FIG. 2 is a flow diagram illustrating a named entity recognition process according to one embodiment of the invention.
Typically, the invention is implemented by a computer device. When a general-purpose computer device is configured with program modules implementing the present invention, it becomes a specialized named entity recognition device rather than a general-purpose computer or processor. However, those skilled in the art will appreciate that this only means that when the invention is applied to any general-purpose computing device, that device becomes a specific named entity recognition device.
As shown in FIG. 2, in step S210, the named entity recognition device obtains the context features of each word and their corresponding syntactic knowledge from the input word sequence; in step S220, for each word, the named entity recognition device maps each context feature of the word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively; in step S230, the named entity recognition device determines a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector; in step S240, the named entity recognition device performs named entity prediction on the vector obtained by concatenating the word vector of each word in the input word sequence with the weighted sum vector, so as to obtain a corresponding recognition result, where the recognition result indicates the named entity tag of each word.
Referring collectively to fig. 2 and 3, wherein fig. 3 illustrates a framework diagram of a named entity recognition model in accordance with an example of the present invention.
Specifically, in step S210, the named entity recognition device obtains the context feature of each word and its corresponding syntactic knowledge according to the input word sequence.
Here, the input word sequence is denoted x = x_1 x_2 … x_l, where x_i denotes a word in the input word sequence and l denotes the length of the input word sequence.
According to one example of the present invention, for the input word sequence x, the named entity recognition device obtains the context features of the whole input word sequence and the corresponding syntactic knowledge through an automatic tool. The context features and their associated syntactic knowledge are two lists of the same length, denoted K and V respectively, where the context feature at a position t in K has its corresponding knowledge at the same position t in V.
The automatic tool can be any existing Chinese automatic analysis tool, such as the Chinese processing toolkit released by Stanford University (Stanford CoreNLP, https://stanfordnlp.github.io/CoreNLP/index.html), which provides multiple language analysis functions such as Chinese part-of-speech tagging, sentence component analysis, and dependency syntactic analysis. Taking the input word sequence "Zhang San / has / congenital / heart disease" as an example, applying the Chinese processing tool to it yields various information about the word sequence, such as part-of-speech tags, sentence components, and dependency syntax. For example, the part-of-speech information is "Zhang San_NR / has_VV / congenital_NN / heart disease_NN", and the sentence component information is (S (NP Zhang San) (VP (has) (NP (congenital heart disease)))), where S is the root node of the syntax tree and denotes a sentence, NP and VP are intermediate nodes denoting a noun phrase and a verb phrase respectively, and the Chinese words are leaf nodes. For a word in the input word sequence, for example "congenital", "context features" refers to the word information of the word's context, and "syntactic knowledge" refers to part-of-speech information, sentence component information, and the like. It should be noted that although a variety of syntactic knowledge is available through the Chinese processing tool, the named entity recognition model of the invention focuses on and utilizes only one kind of syntactic knowledge, e.g., only part-of-speech information. Specifically, if the context features are defined as the word itself plus one word before and after it, and the syntactic knowledge of interest is part-of-speech knowledge, then the context features of the word "congenital" are ["congenital", "suffers from", "heart disease"], and the corresponding syntactic knowledge is ["congenital_NN", "suffers from_VV", "heart disease_NN"]. Since each context feature (e.g., "heart disease") has a knowledge instance (e.g., "heart disease_NN") corresponding to it, context features and syntactic knowledge occur in pairs.
In addition to the above-described automatic tools, the methods of obtaining context features and syntactic knowledge applicable to the present invention include, but are not limited to, manual annotation, querying dictionaries, knowledge bases, and the like.
For each word x_i in the input word sequence, the named entity recognition device extracts from K and V the context features K_i related to that word and their corresponding syntactic knowledge V_i. For example, the named entity recognition device extracts the context features and corresponding syntactic knowledge within a range of two words before and after x_i, denoted K_i and V_i respectively. Accordingly, a context feature of the word x_i may be labeled as a key k_{i,j} (where j = 1, 2, …, m_i), and the syntactic knowledge corresponding to the context feature k_{i,j} may be labeled as a value v_{i,j}.
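A minimal sketch of this extraction step, assuming part-of-speech tags as the single kind of syntactic knowledge and a symmetric word window (the function and data below are illustrative, not the patent's tooling):

```python
def extract_keys_values(words, pos_tags, i, window=2):
    """For word i, return the paired lists (K_i, V_i) of context
    features k_{i,j} and their syntactic knowledge values v_{i,j}."""
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    keys = words[lo:hi]                                    # context features
    values = [f"{w}_{p}" for w, p in zip(keys, pos_tags[lo:hi])]
    return keys, values

# "Zhang San / has / congenital / heart disease", POS from an automatic tool
words = ["ZhangSan", "has", "congenital", "heart disease"]
pos = ["NR", "VV", "NN", "NN"]
print(extract_keys_values(words, pos, i=2, window=1))
# (['has', 'congenital', 'heart disease'],
#  ['has_VV', 'congenital_NN', 'heart disease_NN'])
```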
In step S220, the named entity recognition device maps each of its contextual characteristics and corresponding syntactic knowledge to a key vector and a corresponding value vector, respectively, for each word in the sequence of input words.
Here, the mapping of the key vector and the mapping of the value vector are performed by corresponding embedding functions, respectively. The function of both embedding functions is to convert each instance into a vector representing the instance.
This is similar to the input embedding layer 301 converting each word in the input word sequence into a word vector through an embedding function.
For example, the input word sequence x = x_1 x_2 … x_l is input to the input embedding layer 301, and each word x_i in the input word sequence is converted into an input word vector E_x(x_i) via the word embedding function E_x.
Likewise, for each word x_i in the input word sequence, each key-value pair (i.e., a context feature k_{i,j} and its corresponding syntactic knowledge v_{i,j}) can be mapped by the named entity recognition device through the key embedding function E_k and the value embedding function E_v into a feature embedding vector (key vector) E_k(k_{i,j}) and a knowledge embedding vector (value vector) E_v(v_{i,j}), respectively.
Thus, for each word x_i, the named entity recognition device obtains all of its key vectors and all of its value vectors.
In step S230, the named entity recognition device determines a weighted sum vector of all value vectors of each word in the input word sequence, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector.
Each word x_i (i = 1, 2, …, l) in the input word sequence is converted into a word vector via the input embedding layer 301; after the context information encoding layer 302, a word vector containing context information is obtained: h_i = Encoder(x_i).
The above steps of converting and encoding the word vector need only be performed before determining the weight of the value vector in step S230, and may be performed in parallel with steps S210 and S220, or before or after steps S210 and S220.
Subsequently, in step S230, the word vector h_i of the word x_i, together with all of its key vectors and all of its value vectors, is input to the key-value memory neural network layer 303.
Specifically, in the key-value memory neural network layer 303, the weight of each value vector of each word x_i in the input word sequence may be determined from the context-encoded word vector h_i of that word and the key vector corresponding to that value vector.
According to one example of the invention, the weight of each value vector of each word x_i is determined according to the inner product of the word vector h_i of the word and the key vector corresponding to that value vector.
For example, the weight p_{i,j} of each value vector of each word x_i is determined by the inner product of the word vector h_i and the key vector k_{i,j}, normalized over the inner products of h_i with all key vectors of the word, as follows (with a slight abuse of notation, k_{i,j} and v_{i,j} below denote the key vector E_k(k_{i,j}) and the value vector E_v(v_{i,j})):

    p_{i,j} = exp(h_i · k_{i,j}) / Σ_{j'=1…m_i} exp(h_i · k_{i,j'})    (1)

where h_i · k_{i,j} is the inner product of the word vector h_i, obtained for the word x_i after the input embedding layer and the context information encoding layer, and the key vector k_{i,j} corresponding to the value vector v_{i,j}. Here, for the i-th word x_i, the context feature (key) k_{i,j} is used to calculate the weight p_{i,j} assigned to its corresponding knowledge (value) v_{i,j}.
Then, according to the weights p_{i,j}, the weighted sum of the syntactic knowledge vectors is computed as follows:

    a_i = Σ_{j=1…m_i} p_{i,j} v_{i,j}    (2)

This completes the weighting of each syntactic knowledge vector v_{i,j} by its corresponding context feature vector k_{i,j}; equivalently, each context feature k_{i,j} weights its corresponding syntactic knowledge v_{i,j}.
Alternatively, the mapping of the key vectors and value vectors in step S220 is also implemented by the key-value memory neural network layer 303; that is, the key embedding function E_k and the value embedding function E_v are integrated into the key-value memory neural network layer 303, which maps each word x_i's context features k_{i,j} and their corresponding syntactic knowledge v_{i,j} into feature embedding vectors (key vectors) and knowledge embedding vectors (value vectors), respectively.
Further, alternatively, the acquisition in step S210 of the context features k_{i,j} of each word x_i and their corresponding syntactic knowledge v_{i,j} can also be implemented by the key-value memory neural network layer 303; that is, the key-value memory neural network layer 303 obtains the context features K and the corresponding syntactic knowledge V of the whole input word sequence from the outside, and then extracts, for each word x_i therein, the context features K_i related to that word and their corresponding syntactic knowledge V_i.
In step S240, the named entity recognition device concatenates the word vector corresponding to each word in the input word sequence with the weighted sum vector of the word's value vectors, and performs named entity prediction on the resulting vector to obtain a corresponding recognition result, where the recognition result indicates the named entity tag of each word in the word sequence.
Here, the weighted sum vector a_i output by the key-value memory neural network layer 303 and the word vector h_i output by the context information encoding layer 302 are concatenated and input to the decoding output layer 304 to obtain the output predicted tag.
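Continuing in the same illustrative style, the concatenation and decoding step might look as follows (the weight shapes and names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_tags = 8, 5
h_i = rng.normal(size=d)                    # from the context encoder (302)
a_i = rng.normal(size=d)                    # from the key-value memory layer (303)
W_dec = rng.normal(size=(2 * d, n_tags))    # decoding output layer (304), assumed shape

z_i = np.concatenate([h_i, a_i])            # concatenated vector, length 2d
scores = z_i @ W_dec
scores -= scores.max()
probs = np.exp(scores) / np.exp(scores).sum()   # softmax over tag scores
print(int(probs.argmax()))                  # predicted named entity tag id
```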
By introducing the key-value memory neural network module, the context features are mapped into key vectors, the syntactic knowledge corresponding to the context features is mapped into value vectors, and converting between keys and values weights the syntactic knowledge according to its contribution.
The process of training a named entity recognition model according to one embodiment of the present invention is further described below.
Here, the named entity recognition model used by the present invention to recognize the named entities of an input word sequence is obtained by introducing a key-value memory neural network layer into an existing named entity recognition model. Thus, the named entity recognition model used in the model training described below is shown in FIG. 3 and includes an input embedding layer 301, a context information encoding layer 302, a key-value memory neural network layer 303, and a decoding output layer 304.
Referring to FIGS. 3 and 5, in the first round of training:
in step S510, the named entity recognition device obtains context features of each word and syntax knowledge corresponding to the context features according to the input word sequence; in step S520, the named entity recognition device maps each context feature and corresponding syntactic knowledge of each word into a key vector and a corresponding value vector, respectively, for each word; in step S530, the named entity recognition device determines a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and a key vector corresponding to the value vector; in step S540, the named entity recognition device performs named entity prediction on the word vector of each word in the input word sequence and the vector obtained by concatenating the weighted sum vector to obtain a corresponding recognition result, where the recognition result indicates a named entity tag of each word; in step S550, the named entity recognition device compares the recognition result with the actual named entity tag to update the relevant parameters of the named entity recognition model.
Subsequently, based on the updated parameters of the named entity recognition model, the above steps S510-S550 are repeatedly performed until the objective function of the named entity recognition model converges.
Specifically, in step S510, the named entity recognition device obtains the context feature of each word and its corresponding syntactic knowledge according to the input word sequence.
Here, the input word sequence is denoted x = x_1 x_2 … x_l, where x_i denotes a word in the input word sequence and l denotes the length of the input word sequence.
According to one example of the present invention, for the input word sequence x, the named entity recognition device obtains through an automatic tool the context features K of the whole input word sequence and their corresponding syntactic knowledge V. For each word x_i in the input word sequence, the named entity recognition device extracts from K and V the context features K_i related to that word and their corresponding syntactic knowledge V_i. Accordingly, a context feature of the word x_i may be labeled as a key k_{i,j} (where j = 1, 2, …, m_i), and the syntactic knowledge corresponding to the context feature k_{i,j} may be labeled as a value v_{i,j}.
In step S520, the named entity recognition device maps each of its contextual characteristics and corresponding syntactic knowledge to a key vector and a corresponding value vector, respectively, for each word in the sequence of input words.
Here, the mapping of the key vector and the mapping of the value vector are performed by corresponding embedding functions, respectively. This is similar to the input embedding layer 301 converting each word in the input word sequence into a word vector through an embedding function.
For example, the input word sequence x = x_1 x_2 … x_l is input to the input embedding layer 301, and each word x_i in the input word sequence is converted into an input word vector E_x(x_i) via the word embedding function E_x.
Likewise, for each word x_i in the input word sequence, a context feature k_{i,j} and its corresponding syntactic knowledge v_{i,j} can be mapped by the named entity recognition device through the key embedding function E_k and the value embedding function E_v into a feature embedding vector (key vector) and a knowledge embedding vector (value vector), respectively.
Thus, for each word x_i, the named entity recognition device obtains all of its key vectors and all of its value vectors.
In step S530, the named entity recognition device determines a weighted sum vector of all value vectors of each word in the input word sequence, wherein each value vector is weighted according to the word vector of the word and a key vector corresponding to the value vector.
Each word x_i (i = 1, 2, …, l) in the input word sequence is converted into a word vector via the input embedding layer 301; after the context information encoding layer 302, a word vector containing context information is obtained: h_i = Encoder(x_i).
Then, in step S530, the word vector h_i of the word x_i, together with all of its key vectors and all of its value vectors, is input to the key-value memory neural network layer 303.
Specifically, in the key-value memory neural network layer 303, the weight of each value vector of each word x_i in the input word sequence may be determined from the context-encoded word vector h_i of that word and the key vector corresponding to that value vector.
According to one example of the invention, the weight of each value vector of each word x_i is determined according to the inner product of the word vector h_i of the word and the key vector corresponding to that value vector.
For example, the weight p_{i,j} of each value vector of each word x_i is determined by the inner product of the word vector h_i and the key vector k_{i,j}, normalized over the inner products of h_i with all key vectors of the word, as specified in equation (1) above.
Then, according to the weights p_{i,j}, the weighted sum of all syntactic knowledge vectors is computed as in equation (2) above.
This completes the weighting of the syntactic knowledge by the context features.
In step S540, the named entity recognition device concatenates the word vector corresponding to each word in the input word sequence with the weighted sum vector of the word's value vectors, and performs named entity prediction on the resulting vector to obtain a corresponding recognition result, where the recognition result indicates the named entity tag of each word in the word sequence.
Here, the weighted sum vector a_i output by the key-value memory neural network layer 303 and the word vector h_i output by the context information encoding layer 302 are concatenated and input to the decoding output layer 304 to obtain the output predicted tag.
In step S550, the named entity recognition device compares the recognition result output in step S540 with the actual named entity tag to update the relevant parameters of the named entity recognition model.
According to an example of the present invention, the named entity recognition device calculates an objective function, which may be, for example, a cross-entropy loss, and adjusts the relevant parameters of the named entity recognition model according to the result, for example by using the back-propagation algorithm to update all parameters of the input embedding layer 301, the context information encoding layer 302, the key-value memory neural network layer 303, and the decoding output layer 304, including the key vectors, the value vectors, the word vectors of the input embedding layer, and the like. That is, all parameters are updated except the words x_i, the context features k_{i,j}, and the syntactic knowledge v_{i,j} themselves. When the calculation result of the objective function converges, the training ends.
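As a schematic of this update, the sketch below runs gradient descent with a cross-entropy loss on the decoding weights only; a real implementation would backpropagate through all four layers as described, and all names and sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tags = 8, 5
W_dec = rng.normal(size=(2 * d, n_tags)) * 0.1

def train_step(z, gold_tag, lr=0.1):
    """One SGD step with cross-entropy loss on the decoding layer.
    z: (2d,) concatenated [h_i; a_i]; gold_tag: correct tag id."""
    global W_dec
    scores = z @ W_dec
    scores -= scores.max()
    p = np.exp(scores); p /= p.sum()        # predicted tag distribution
    loss = -np.log(p[gold_tag])             # cross-entropy objective
    grad_scores = p.copy()
    grad_scores[gold_tag] -= 1.0            # d(loss)/d(scores) for softmax + CE
    W_dec -= lr * np.outer(z, grad_scores)  # backpropagate into W_dec
    return loss

z = rng.normal(size=2 * d)
for _ in range(5):
    print(round(train_step(z, gold_tag=2), 4))   # loss shrinks toward zero
```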
When the calculation result of the objective function has not yet converged, the named entity recognition device updates the relevant parameters of the named entity recognition model and enters the next round of training. From the second round onward, it is not necessary to re-acquire the context features and corresponding syntactic knowledge of each word for the input word sequence; that is, only the above steps S520 to S550 are repeated, until the calculation result of the objective function converges.
FIG. 6 shows a schematic diagram of a named entity recognition apparatus according to an embodiment of the invention.
As shown in fig. 6, the named entity recognition arrangement 60 comprises an obtaining module 61, a mapping module 62, a weighting module 63 and a prediction module 64. Referring to fig. 3, the named entity recognition apparatus 60 is further coupled with a named entity recognition model, which includes an input embedding layer 301, a context information encoding layer 302, a key-value memory neural network layer 303, and a decoding output layer 304.
The obtaining module 61 obtains the context features of each word and their corresponding syntactic knowledge from the input word sequence; for each word, the mapping module 62 maps each context feature of the word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively; the weighting module 63 determines a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector; the prediction module 64 performs named entity prediction on the vector obtained by concatenating the word vector of each word in the input word sequence with the weighted sum vector, so as to obtain a corresponding recognition result, where the recognition result indicates the named entity tag of each word.
Specifically, the obtaining module 61 obtains the context feature of each word and the corresponding syntactic knowledge thereof according to the input word sequence.
Here, the input word sequence is denoted x = x_1 x_2 … x_l, where x_i denotes a word in the input word sequence and l denotes the length of the input word sequence.
According to one example of the present invention, for the input word sequence x, the obtaining module 61 obtains through an automatic tool the context features K of the whole input word sequence and their corresponding syntactic knowledge V. For each word x_i in the input word sequence, the obtaining module 61 extracts from K and V the context features K_i related to that word and their corresponding syntactic knowledge V_i. Accordingly, a context feature of the word x_i may be labeled as a key k_{i,j} (where j = 1, 2, …, m_i), and the syntactic knowledge corresponding to the context feature k_{i,j} may be labeled as a value v_{i,j}.
Mapping module 62 maps each contextual feature and corresponding syntactic knowledge of each word in the sequence of input words to a key vector and a corresponding value vector, respectively.
Here, the mapping of the key vector and the mapping of the value vector are performed by corresponding embedding functions, respectively. This is similar to the input embedding layer 301 converting each word in the input word sequence into a word vector through an embedding function.
For example, the input word sequence x = x_1 x_2 … x_l is input to the input embedding layer 301, and each word x_i in the input word sequence is converted into an input word vector E_x(x_i) via the word embedding function E_x.
Likewise, for each word x_i in the input word sequence, a context feature k_{i,j} and its corresponding syntactic knowledge v_{i,j} can be mapped by the mapping module 62 through the key embedding function E_k and the value embedding function E_v into a feature embedding vector (key vector) and a knowledge embedding vector (value vector), respectively.
The weighting module 63 determines a weighted sum vector of all value vectors of each word in the sequence of input words, wherein the weight of each value vector is determined in combination with the word vector of the word containing context information and the key vector corresponding to the value vector.
Each word x_i (i = 1, 2, …, l) in the input word sequence is converted into a word vector via the input embedding layer 301; after the context information encoding layer 302, a word vector containing context information is obtained: h_i = Encoder(x_i).
Then, the word vector h_i of the word x_i, together with all of its key vectors and all of its value vectors, is input to the key-value memory neural network layer 303.
Specifically, in the key-value memory neural network layer 303, the weight of each value vector of each word x_i in the input word sequence may be determined from the context-encoded word vector h_i of that word and the key vector corresponding to that value vector.
According to one example of the invention, the weight of each value vector of each word x_i is determined according to the inner product of the word vector h_i of the word and the key vector corresponding to that value vector.
For example, the weight p_{i,j} of each value vector of each word x_i is determined by the inner product of the word vector h_i and the key vector k_{i,j}, normalized over the inner products of h_i with all key vectors of the word, as specified in equation (1) above.
Then, according to the weights p_{i,j}, the weighted sum of all syntactic knowledge vectors is computed as in equation (2) above.
This completes the weighting of the syntactic knowledge by the context features.
The prediction module 64 concatenates the word vector corresponding to each word in the input word sequence with the weighted sum vector of the word's value vectors, and performs named entity prediction on the resulting vector to obtain a corresponding recognition result, where the recognition result indicates the named entity tag of each word in the word sequence.
Here, the weighted sum vector a_i output by the key-value memory neural network layer 303 and the word vector h_i output by the context information encoding layer 302 are concatenated and input to the decoding output layer 304 to obtain the output predicted tag.
According to an example of the present invention, the named entity recognition apparatus 60 further comprises a comparison module (not shown in FIG. 6) that is used while the named entity recognition model is being trained.
In the first round of training, after the obtaining module 61, the mapping module 62, the weighting module 63, and the prediction module 64 have performed their corresponding operations in sequence, the comparison module compares the recognition result output by the prediction module 64 with the actual named entity tags to update the relevant parameters of the named entity recognition model. The next round of training then begins; that is, the mapping module 62, the weighting module 63, and the prediction module 64 are triggered again to perform their corresponding operations. These operations are performed in a loop until the objective function of the named entity recognition model converges.
According to an example of the present invention, the comparison module calculates an objective function, which may be, for example, a cross-entropy loss, and adjusts the relevant parameters of the named entity recognition model according to the result, for example by using the back-propagation algorithm to update all parameters of the input embedding layer 301, the context information encoding layer 302, the key-value memory neural network layer 303, and the decoding output layer 304, including the key vectors, the value vectors, the word vectors of the input embedding layer, and the like. That is, all parameters are updated except the words x_i, the context features k_{i,j}, and the syntactic knowledge v_{i,j} themselves. When the calculation result of the objective function converges, the training ends.
When the calculation result of the objective function has not yet converged, the comparison module first updates the relevant parameters of the named entity recognition model and then re-triggers the mapping module 62, so that the named entity recognition model enters the next round of training.
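The training loop described above can be sketched as follows. This is an illustrative PyTorch-style sketch, not the patent's implementation: `model` is assumed to wrap layers 301 through 304 and emit per-word tag logits, and the data loader, learning rate and convergence tolerance are placeholders.

```python
import torch
import torch.nn as nn

def train(model, loader, lr=1e-3, tol=1e-4, max_epochs=50):
    loss_fn = nn.CrossEntropyLoss()              # the "cross entropy" objective
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev = float("inf")
    for epoch in range(max_epochs):
        total = 0.0
        for words, gold_tags in loader:          # gold_tags: actual entity labels
            logits = model(words)                # (seq_len, num_tags)
            loss = loss_fn(logits, gold_tags)
            opt.zero_grad()
            loss.backward()                      # back-propagation updates keys,
            opt.step()                           # values, embeddings, etc.
            total += loss.item()
        if abs(prev - total) < tol:              # objective has converged
            break
        prev = total
    return model
```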
It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example, as an Application Specific Integrated Circuit (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Also, the software programs (including associated data structures) of the present invention can be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Further, some of the steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, at least a portion of the present invention may be implemented as a computer program product, such as computer program instructions, which, when executed by a computing device, may invoke or provide methods and/or aspects in accordance with the present invention through operation of the computing device. Program instructions which invoke/provide the methods of the present invention may be stored on fixed or removable recording media and/or transmitted via a data stream over a broadcast or other signal-bearing medium, and/or stored in a working memory of a computing device operating in accordance with the program instructions.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Claims (15)
1. A named entity recognition method, wherein the method comprises the steps of:
acquiring context features of each word and the corresponding syntactic knowledge thereof according to the input word sequence;
for each word, mapping each context feature and corresponding syntactic knowledge of the word into a key vector and a corresponding value vector respectively;
determining a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to the word vector containing the context information of the word and the corresponding key vector of the value vector;
and performing named entity prediction on the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector, so as to obtain a corresponding recognition result, wherein the recognition result indicates the named entity label of each word.
2. The method of claim 1, wherein the weight of each value vector of each word is further determined by an inner product of the word vector of the word and the key vector corresponding to the value vector.
3. The method according to claim 1 or 2, wherein the step of obtaining specifically comprises:
acquiring context features matched with the input word sequence and the corresponding syntactic knowledge thereof according to the input word sequence;
and acquiring the context features of each word and the corresponding syntactic knowledge thereof from the matched context features and the corresponding syntactic knowledge.
4. The method according to any of claims 1 to 3, wherein the method is performed by a named entity recognition model,
wherein, when the named entity recognition model is in the training process, the method further comprises:
comparing the recognition result with an actual named entity tag to update relevant parameters of the named entity recognition model;
and executing the mapping step, the weighting step and the prediction step again, comparing the resulting recognition result with the actual named entity label, updating the relevant parameters of the named entity recognition model accordingly, and repeating these steps until the objective function of the named entity recognition model converges.
5. A named entity recognition method, wherein a named entity recognition model comprises an input embedding layer, a context information encoding layer, a key-value memory neural network layer and a decoding output layer;
wherein the method comprises the following steps:
acquiring context features of each word and the corresponding syntactic knowledge thereof according to the input word sequence;
inputting the context feature of each word and the corresponding syntactic knowledge thereof to the key-value memory neural network layer so as to map each context feature of each word and the corresponding syntactic knowledge thereof into a key vector and a corresponding value vector respectively;
inputting, for each word in the input word sequence, the word vector output via the input embedding layer and the context information encoding layer into the key-value memory neural network layer, so as to obtain a weighted sum vector of all value vectors of the word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector;
and inputting the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector into the decoding output layer to obtain a corresponding recognition result, wherein the recognition result indicates the named entity label of each word.
6. The method of claim 5, wherein when the named entity recognition model is in a training process, the method further comprises:
comparing the recognition result with an actual named entity tag to update relevant parameters of the named entity recognition model;
and executing the mapping step, the weighting step and the prediction step again, comparing the resulting recognition result with the actual named entity label, updating the relevant parameters of the named entity recognition model accordingly, and repeating these steps until the objective function of the named entity recognition model converges.
7. A named entity recognition apparatus, wherein the apparatus comprises:
an acquisition module for acquiring the context features of each word and the corresponding syntactic knowledge thereof according to the input word sequence;
a mapping module for mapping each context feature and corresponding syntactic knowledge of each word into a key vector and a corresponding value vector, respectively;
a weighting module for determining a weighted sum vector of all value vectors of each word, wherein each value vector is weighted according to a word vector of the word containing context information and a key vector corresponding to the value vector;
and a prediction module for performing named entity prediction on the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector, so as to obtain a corresponding recognition result, the recognition result indicating the named entity label of each word.
8. The apparatus of claim 7, wherein the weight of each value vector of each word is further determined according to an inner product of the word vector of the word and a key vector corresponding to the value vector.
9. The apparatus according to claim 7 or 8, wherein the acquisition module is specifically configured to:
acquire context features matched with the input word sequence and the corresponding syntactic knowledge thereof according to the input word sequence;
and acquire the context features of each word and the corresponding syntactic knowledge thereof from the matched context features and the corresponding syntactic knowledge.
10. The apparatus according to any of claims 7 to 9, wherein the apparatus is coupled with a named entity recognition model,
when the named entity recognition model is in a training process, the device further comprises:
and a comparison module for comparing the recognition result with an actual named entity label to update the relevant parameters of the named entity recognition model, triggering the mapping module, the weighting module and the prediction module again to perform their corresponding operations, and repeating these operations until the objective function of the named entity recognition model converges.
11. A named entity recognition device, wherein the named entity recognition device is coupled with a named entity recognition model, and the named entity recognition model comprises an input embedding layer, a context information encoding layer, a key-value memory neural network layer and a decoding output layer;
wherein the device comprises:
an acquisition module for acquiring the context features of each word and the corresponding syntactic knowledge thereof according to the input word sequence;
a mapping module for inputting the context features of each word and their corresponding syntactic knowledge to the key-value memory neural network layer, so as to map each context feature of each word and its corresponding syntactic knowledge into a key vector and a corresponding value vector, respectively;
a weighting module for inputting, for each word in the input word sequence, the word vector output via the input embedding layer and the context information encoding layer into the key-value memory neural network layer, so as to obtain a weighted sum vector of all value vectors of the word, wherein each value vector is weighted according to the word vector of the word and the key vector corresponding to the value vector;
and a prediction module for inputting the vector formed by concatenating the word vector of each word in the input word sequence with its weighted sum vector into the decoding output layer, so as to obtain a corresponding recognition result, the recognition result indicating the named entity label of each word.
12. The apparatus of claim 11, wherein the apparatus further comprises:
and a comparison module for comparing the recognition result with an actual named entity label to update the relevant parameters of the named entity recognition model, triggering the mapping module, the weighting module and the prediction module again to perform their corresponding operations, and repeating these operations until the objective function of the named entity recognition model converges.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 6 when executing the computer program.
14. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any of claims 1 to 6.
15. A computer program product implementing the method of any one of claims 1 to 6 when executed by a computer device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010054650.XA CN111291565A (en) | 2020-01-17 | 2020-01-17 | Method and device for named entity recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010054650.XA CN111291565A (en) | 2020-01-17 | 2020-01-17 | Method and device for named entity recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111291565A true CN111291565A (en) | 2020-06-16 |
Family
ID=71021220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010054650.XA Pending CN111291565A (en) | 2020-01-17 | 2020-01-17 | Method and device for named entity recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111291565A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101510221A (en) * | 2009-02-17 | 2009-08-19 | 北京大学 | Enquiry statement analytical method and system for information retrieval |
CN102654866A (en) * | 2011-03-02 | 2012-09-05 | 北京百度网讯科技有限公司 | Method and device for establishing example sentence index and method and device for indexing example sentences |
CN110008480A (en) * | 2018-12-05 | 2019-07-12 | 中国科学院自动化研究所 | Small data vocabulary dendrography learning method and system and relevant device based on prototype memory |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738007A (en) * | 2020-07-03 | 2020-10-02 | 北京邮电大学 | A Data Augmentation Algorithm for Chinese Named Entity Recognition Based on Sequence Generative Adversarial Networks |
CN116724305A (en) * | 2021-01-20 | 2023-09-08 | 甲骨文国际公司 | Integration of context labels with named entity recognition models |
CN116724305B (en) * | 2021-01-20 | 2024-07-19 | 甲骨文国际公司 | Integration of context labels with named entity recognition models |
WO2023226292A1 (en) * | 2022-05-27 | 2023-11-30 | 苏州思萃人工智能研究所有限公司 | Method for extracting relation from text, relation extraction model, and medium |
WO2024021343A1 (en) * | 2022-07-29 | 2024-02-01 | 苏州思萃人工智能研究所有限公司 | Natural language processing method, computer device, readable storage medium, and program product |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363790B (en) | Method, device, equipment and storage medium for evaluating comments | |
CN109033068B (en) | Method and device for reading and understanding based on attention mechanism and electronic equipment | |
US11544474B2 (en) | Generation of text from structured data | |
CN112836514B (en) | Nested entity identification method, apparatus, electronic device and storage medium | |
CN108932342A (en) | A kind of method of semantic matches, the learning method of model and server | |
CN111291565A (en) | Method and device for named entity recognition | |
US20230244704A1 (en) | Sequenced data processing method and device, and text processing method and device | |
CN112149386B (en) | Event extraction method, storage medium and server | |
CN112052684A (en) | Named entity identification method, device, equipment and storage medium for power metering | |
JP7121819B2 (en) | Image processing method and apparatus, electronic device, computer-readable storage medium, and computer program | |
CN110334186A (en) | Data query method, apparatus, computer equipment and computer readable storage medium | |
CN113704416A (en) | Word sense disambiguation method and device, electronic equipment and computer-readable storage medium | |
JP2022145623A (en) | METHOD AND APPARATUS FOR PROVIDING HINT INFORMATION AND COMPUTER PROGRAM | |
CN110968725B (en) | Image content description information generation method, electronic device and storage medium | |
CN111651985A (en) | Method and device for Chinese word segmentation | |
JP5441937B2 (en) | Language model learning device, language model learning method, language analysis device, and program | |
CN111241843B (en) | Semantic relation inference system and method based on composite neural network | |
CN114298048B (en) | Named entity recognition method and device | |
WO2019163752A1 (en) | Morpheme analysis learning device, morpheme analysis device, method, and program | |
CN113343692B (en) | Search intention recognition method, model training method, device, medium and equipment | |
CN114758330A (en) | Text recognition method and device, electronic equipment and storage medium | |
CN111339287B (en) | Abstract generation method and device | |
CN112380861A (en) | Model training method and device and intention identification method and device | |
CN117828024A (en) | Plug-in retrieval method, device, storage medium and equipment | |
CN115248846B (en) | Text recognition method, device and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200616 ||