CN115906855A - Word information fused Chinese address named entity recognition method and device

Info

Publication number
CN115906855A
Authority
CN
China
Prior art keywords
vector
label
word
character
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211690568.1A
Other languages
Chinese (zh)
Inventor
汪陈笑
鲍迪恩
蒋炜
邓静
陈盼盼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Bangrui Technology Co ltd
Zhejiang Bangsheng Technology Co ltd
Original Assignee
Hangzhou Bangrui Technology Co ltd
Zhejiang Bangsheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Bangrui Technology Co ltd, Zhejiang Bangsheng Technology Co ltd filed Critical Hangzhou Bangrui Technology Co ltd
Priority to CN202211690568.1A
Publication of CN115906855A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for Chinese address named entity recognition that fuse word information. The method comprises three main parts: construction of a vocabulary information generation network, of a label distribution learning network, and of a character label learning network. The invention acquires vocabulary information and integrates it into the text representation, representing vocabulary through n-gram fragments to overcome the character-level model's lack of sufficient context information. Because the fused word information is grounded in the original data, the model obtains the specific words faster and its accuracy is improved.

Description

Word information fused Chinese address named entity recognition method and device
Technical Field
The invention relates to the field of recognition of named entities of Chinese addresses, in particular to a method and a device for recognizing named entities of Chinese addresses with fused word information.
Background
With the rapid development of informatization, fields highly dependent on addresses, such as food delivery, postal services, and financial risk control, are also being digitalized. Chinese address named entity recognition refers to identifying the various address-related entities in text; since subsequent work builds on these entities, recognition quality strongly affects downstream task execution. The Chinese domain is especially difficult because, unlike English, there are no explicit separators such as spaces, so a single character carries little distinctive semantic information. A Chinese named entity recognition task must first segment a sentence into correct words, which is very difficult without human prior knowledge. For example, in the sentence "她说的确实在理", both "的确" and "确实" are valid words in isolation from the word segmentation point of view, but based on human prior knowledge the sentence should be segmented in context as "她/说的/确实/在理". Named entity recognition must additionally classify the recognized words according to their context and attributes.
At present, the methods for integrating additional information into character vectors for Chinese named entity recognition fall into three main types: first, searching a word list for words that end with the current character and feeding all matched words into the model as extra information alongside the characters; second, searching the word list for vocabulary vectors containing the character, merging them by some rule, fusing the resulting word vector into the character vector, and feeding that into the model; third, aggregating the label probabilities of the current character over all data and fusing the probability vector into the character vector before input.
The first method must search words for every character, so the number of word units added to each data item varies, which prevents batch training and slows training. The second method searches the word list, but the retrieved words do not necessarily match the vocabulary actually present in the text, so erroneous vocabulary noise is very likely introduced. The third method simply appends label information to the character information and lacks the most critical vocabulary information.
In summary, existing Chinese named entity recognition methods cannot simultaneously satisfy the following requirements:
1) Quickly acquiring vocabulary information, and fusing the vocabulary, character, and label probability information into the input character representation.
2) Providing the model with more information to improve its named entity recognition accuracy while preserving training and online prediction efficiency, so as to balance model accuracy against model speed.
Disclosure of Invention
In the Chinese address field, starting from the text data, the invention effectively introduces vocabulary information into the model and integrates character information with vocabulary information for named entity recognition; n-gram fragment representations selected by character lexeme information serve as the generation source of the vocabulary information, providing the model with sufficient information for the named entity recognition task.
The purpose of the invention is realized by the following technical scheme: in a first aspect, the invention provides a method for Chinese address named entity recognition with fused word information, which comprises the following steps:
(1) Obtain the n-gram fragment vector of the Chinese address, expressed as $X = (x_1, x_2, \ldots, x_n)$, and obtain the corresponding real vocabulary fragment $Y = (y_1, y_2, \ldots, y_m)$, where n is the number of characters in the n-gram fragment and m is the number of characters in the real vocabulary fragment;
(2) Construct a vocabulary information generation network adopting the structure of a two-tower model; the network specifically operates as follows:
(2.1) input the n-gram fragments and the real vocabulary fragments into the vocabulary information generation network, and acquire random character vector codes through an Embedding layer;
(2.2) pass the character vector codes through the ELMO layer and the Dense layer to learn the character vector representation;
(2.3) pass the character vector representation through an average pooling layer (mean pooling) to characterize the text fragment as a word vector;
(2.4) in a classification learner, splice the word vector of the n-gram fragment and the word vector of the real vocabulary fragment, then further splice the difference and the element-wise product of the two word vectors to obtain relational features between the words; after a fully connected layer, map the vector dimension into a two-dimensional space and judge the similarity between the two vectors;
(3) Construct a vocabulary information acquisition network, comprising a label distribution learning network and a character label learning network;
the label distribution learning network obtains the character vector representation of the n-gram fragment in the same way as the vocabulary information generation network, extracts the text feature codes, uses a fully connected layer as the decoder to obtain the probability distribution $P_{label}$ of the labels corresponding to the vocabulary as the state matrix of a conditional random field, and performs label inference through the conditional random field;
the character label learning network specifically operates as follows:
(3.1) select the character vector $E_C$ output by the label distribution learning network through the Embedding layer as part of the embedding layer output;
(3.2) according to the positions of the current character in the n-grams and the lexeme label type q, obtain from the vocabulary information generation network the word vector set before the last Dense layer, $E_\tau = \{E_q\}$, where $E_q$ is the word vector of lexeme label type q;
(3.3) according to the label probability distribution $P_{label}$ obtained by the label distribution learning network, learn the probability $P_{pos}$ that each character label belongs to each lexeme label;
(3.4) from the word vector set $E_\tau$ obtained in step (3.2) and the lexeme labeling probability $P_{pos}$ obtained in step (3.3), obtain the vocabulary information $E_W$ in the embedding layer via the tensor product $E_W = P_{pos} \otimes E_\tau$;
(3.5) combine the character vector $E_C$ and the embedding-layer vocabulary information $E_W$ and input them into the WP-LSTM model; then use the Dense layer and a conditional random field as the decoder and label inference layer, outputting $Z = (z_1, z_2, \ldots, z_n)$ as the predicted labels, finally learning the character relations in Chinese address named entity recognition and realizing Chinese address named entity recognition.
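As an illustration of step (1), the following is a minimal sketch of enumerating n-gram fragments from a Chinese address string; the function name, the fragment-length bound, and the sample address are assumptions for illustration, not details fixed by the invention:

```python
# Minimal sketch of step (1): enumerate candidate n-gram fragments of a
# Chinese address. n_max and the sample address are illustrative assumptions.
def ngram_fragments(text, n_max=4):
    """Return all contiguous character n-grams of length 2..n_max."""
    fragments = []
    for n in range(2, n_max + 1):
        for i in range(len(text) - n + 1):
            fragments.append(text[i:i + n])
    return fragments

# A fragment such as "浙江省" that coincides with a real vocabulary fragment
# would form a positive (X, Y) training pair for the two-tower network.
print(ngram_fragments("浙江省杭州市", n_max=3))
```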
Further, the ELMO is a network structure composed of two bidirectional LSTMs (BiLSTM); the final ELMO layer vector is expressed as:

$$ELMo_i^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} h_{i,j}$$

where $h_{i,0}$ is the character vector of the i-th position, $\gamma^{task}$ is the coefficient related to the pre-training task, L is the number of layers, $s_j^{task}$ is the normalized weight coefficient of layer j, and $h_{i,j} = [\overrightarrow{h}_{i,j}; \overleftarrow{h}_{i,j}]$ is the output vector of the j-th BiLSTM layer, with $\overrightarrow{h}_{i,j}$ containing the preceding information and $\overleftarrow{h}_{i,j}$ containing the following information.
Further, during training, the sum of the forward and backward language-model losses of the ELMO is the training target, i.e. the following loss is optimized:

$$loss = \sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1}; \theta_x, \overrightarrow{\theta}_{LSTM}, \theta_s) + \log p(t_k \mid t_{k+1}, \ldots, t_N; \theta_x, \overleftarrow{\theta}_{LSTM}, \theta_s) \Big)$$

where $\theta_x$ denotes the character input vector, $\overrightarrow{\theta}_{LSTM}$ denotes the forward LSTM parameters, $\overleftarrow{\theta}_{LSTM}$ denotes the backward LSTM parameters, $\theta_s$ denotes the softmax layer, p denotes the probability, and $t_k$ denotes the text at position k.
Further, a text fragment is characterized as a word vector specifically as:

$$E_{vector} = mean(sum(H_{vector}))$$

where $H_{vector}$ denotes the output vector of the previous layer and vector is X or Y; that is, X and Y are encoded by this formula into word vector features $E_X$ and $E_Y$ of the same dimension.
Further, the splicing operation of step (2.4) is specifically:

$$E = [E_X, E_Y, E_X - E_Y, E_X \odot E_Y]$$

where E is the spliced vector.
Further, the conditional random field learns the transition probabilities among labels from the label probability distribution $P_{label}$ and the real labels $P_{gold}$, and infers the probabilities of all labels by the following equation:

$$p(y \mid s) = \frac{\exp\big(\sum_i (O_{i, y_i} + T_{y_{i-1}, y_i})\big)}{\sum_{\tilde{y} \in Y_s} \exp\big(\sum_i (O_{i, \tilde{y}_i} + T_{\tilde{y}_{i-1}, \tilde{y}_i})\big)}$$

where $Y_s$ denotes all possible label sequences of the text input, $O = W_o H + b_o$ denotes the current character label probabilities, with $W_o$ and $b_o$ a parameter matrix and a parameter vector respectively and H the output of the previous layer; $T_{y_{i-1}, y_i}$ denotes the transition probability for the label pair $(y_{i-1}, y_i)$; and $p(y \mid s)$ denotes the probability of the label sequence y given the input s. The goal of the conditional random field is to obtain the label sequence $y^*$ with the greatest score given the text input.
Further, in step (3.2), for a lexeme label appearing multiple times, one word vector is selected with equal probability from all the corresponding positions as the word vector of that lexeme label.
In a second aspect, the invention provides a word information fused Chinese address named entity recognition device, comprising a memory and one or more processors, wherein the memory stores executable code and the processors, when executing the executable code, implement the word information fused Chinese address named entity recognition method.
In a third aspect, the invention provides a computer-readable storage medium on which a program is stored; when executed by a processor, the program implements the method for Chinese address named entity recognition with fused word information.
The beneficial effects of the invention are as follows:
1. The vocabulary information carries little noise and is fast to retrieve: in the vocabulary information generation stage, each n-gram fragment containing specific characters is input into the vocabulary information generation network together with its vocabulary, so that the n-gram fragment and the corresponding real vocabulary obtain similar word vector representations and their association can be learned. Through this network, the model can obtain the vocabulary vector corresponding to each character in the training data in batch, and the obtained vocabulary vectors are based on the real vocabulary in the text, without relying on lookups in an additional large external word list.
2. The coding information is rich: in the vocabulary information acquisition stage, vocabulary vectors are merged based on the lexeme distribution probabilities and the corresponding n-gram fragments. During training, the model first predicts classification soft labels and character vectors through the label distribution learning network, and trains the classification accuracy against the real labels; the vocabulary information acquisition network then derives the lexeme distribution probabilities of the characters from the soft labels, obtains vocabulary vectors from the vocabulary information generation network based on those probabilities, and predicts the final classification after combining the character vectors. The network not only fuses the vocabulary vectors of the characters but also learns the weights of different vocabulary vectors through the distribution of the characters' lexeme labels, finally obtaining the vocabulary information and providing richer word segmentation and labeling knowledge for the Chinese named entity recognition task.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a method for identifying a named entity of a Chinese address with fused word information according to the present invention;
FIG. 2 is a schematic diagram of a lexical information generation network;
FIG. 3 is a schematic diagram of a vocabulary information acquisition network;
FIG. 4 is a schematic view of the WP-LSTM model structure;
fig. 5 is a structural diagram of a recognition apparatus for a named entity of chinese address with fused word information according to the present invention.
Detailed Description
The following description will explain embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention provides a method for fusing character and vocabulary information in Chinese text to assist accurate named entity recognition. The invention is mainly applicable to technical fields such as financial risk control and user marketing.
The invention provides a Chinese address named entity recognition method with word information fusion, divided into two parts: the first part is a vocabulary information generation network for generating vocabulary information, and the second part is a vocabulary information acquisition network for acquiring vocabulary information; the relationship between the two parts is shown in FIG. 1.
in the vocabulary information generation network, the training target is to learn the similarity between n-gram fragments and real vocabulary vectors, and a structure of a double-tower model is adopted. The specific structure is shown in fig. 2:
in this network the input consists of two parts, the n-gram fragment vector is denoted x = (x) 1 ,x 2 ,...,x n ) Where n is the number of characters in the segment, the real vocabulary vector is denoted as Y = (Y) 1 ,y 2 ,...,y m ) Where m is the number of characters in the real vocabulary. The learning steps are as follows:
1) And acquiring the random character vector code through an Embedding layer.
2) Learn the character vector representation through an ELMO layer and a Dense layer. The ELMO is a network structure composed of two bidirectional LSTMs (BiLSTM). The final ELMO layer vector is expressed as:

$$ELMo_i^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} h_{i,j}$$

where $h_{i,0}$ is the character vector of the i-th position, $\gamma^{task}$ is the coefficient related to the pre-training task, L is the number of layers, $s_j^{task}$ is the normalized weight coefficient of layer j, and $h_{i,j} = [\overrightarrow{h}_{i,j}; \overleftarrow{h}_{i,j}]$ is the output vector of the j-th BiLSTM layer, with $\overrightarrow{h}_{i,j}$ containing the preceding information and $\overleftarrow{h}_{i,j}$ containing the following information.
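For concreteness, a minimal PyTorch sketch of this weighted layer combination is given below; it assumes the character embedding layer and the L BiLSTM layer outputs have been stacked into one tensor, and all names are illustrative:

```python
import torch
import torch.nn as nn

class ELMoCombine(nn.Module):
    """Task-specific weighted sum of layer outputs, as in the ELMO formula.

    A sketch under the assumption that layer_outputs stacks the character
    embedding layer (j = 0) and the L BiLSTM layers (j = 1..L).
    """
    def __init__(self, num_layers):
        super().__init__()
        self.s = nn.Parameter(torch.zeros(num_layers + 1))  # pre-softmax s_j
        self.gamma = nn.Parameter(torch.ones(1))            # task coefficient

    def forward(self, layer_outputs):  # shape (L + 1, batch, seq, dim)
        weights = torch.softmax(self.s, dim=0)              # normalized s_j
        combined = (weights.view(-1, 1, 1, 1) * layer_outputs).sum(dim=0)
        return self.gamma * combined
```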
During training, the sum of the forward and backward language-model losses of the ELMO is the training target, i.e. the following loss is optimized:

$$loss = \sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1}; \theta_x, \overrightarrow{\theta}_{LSTM}, \theta_s) + \log p(t_k \mid t_{k+1}, \ldots, t_N; \theta_x, \overleftarrow{\theta}_{LSTM}, \theta_s) \Big)$$

where $\theta_x$ denotes the character input vector, $\overrightarrow{\theta}_{LSTM}$ denotes the forward LSTM parameters, $\overleftarrow{\theta}_{LSTM}$ denotes the backward LSTM parameters, $\theta_s$ denotes the softmax layer, p denotes the probability, and $t_k$ denotes the text at position k.
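Equivalently, this objective can be minimized as the sum of forward and backward cross-entropy losses; the sketch below assumes logits of shape (batch, seq, vocab) whose position k predicts the next (respectively previous) character, which is one illustrative reading of the formula above:

```python
import torch.nn.functional as F

def bilm_loss(fwd_logits, bwd_logits, char_ids):
    # Forward LM: position k predicts character k + 1 from the left context.
    fwd = F.cross_entropy(fwd_logits[:, :-1].flatten(0, 1),
                          char_ids[:, 1:].flatten())
    # Backward LM: position k predicts character k - 1 from the right context.
    bwd = F.cross_entropy(bwd_logits[:, 1:].flatten(0, 1),
                          char_ids[:, :-1].flatten())
    return fwd + bwd  # minimizing this maximizes the bidirectional likelihood
```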
3) After an average pooling layer (mean pooling), a text fragment can be characterized as a word vector, i.e.:

$$E_{vector} = mean(sum(H_{vector}))$$

where $H_{vector}$ denotes the output vector of the previous layer and vector is X or Y; that is, X and Y are encoded by this formula into word vector features $E_X$ and $E_Y$ of the same dimension.
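A one-function sketch of this pooling step, assuming the previous layer's output for one fragment is a (seq_len, dim) tensor and padding positions may be masked out:

```python
import torch

def pool_word_vector(h, mask=None):
    """Average-pool character vectors (seq_len, dim) into one word vector,
    read here as the E_vector = mean(sum(H_vector)) step; the optional
    boolean mask drops padding rows before averaging."""
    if mask is not None:
        h = h[mask]
    return h.mean(dim=0)
```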
4) In the classification learner, on the basis of splicing the two word vectors of the n-gram fragment and the real vocabulary fragment, splice the difference and the element-wise product of the two word vectors to obtain relational features between the words, i.e.:

$$E = [E_X, E_Y, E_X - E_Y, E_X \odot E_Y]$$

where E is the spliced vector.
The final vector constructed in this way contains both direct and indirect features between the words and is used to judge the similarity between the two vectors.
Finally, after a fully connected layer maps the vector dimension into a two-dimensional space, the parameters are updated based on the two-class cross-entropy loss. The trained classification learner can then accurately judge the degree of similarity of two vocabulary fragments, and the word vector results taken from the encoder can likewise be considered similar. Based on the vocabulary information generation network, the word vector of an n-gram can behave similarly to the real vocabulary vector in subsequent tasks, without searching a word list for the real vocabulary.
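The classification learner can be sketched as below, assuming encoded word vectors $E_X$ and $E_Y$ of equal dimension; the module name is illustrative and the two-class output is trained with cross-entropy:

```python
import torch
import torch.nn as nn

class SimilarityHead(nn.Module):
    """Two-tower classification learner: splice [E_X, E_Y, E_X - E_Y,
    E_X ⊙ E_Y] and map it into a two-dimensional (similar / dissimilar)
    space through a fully connected layer."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(4 * dim, 2)

    def forward(self, e_x, e_y):
        feats = torch.cat([e_x, e_y, e_x - e_y, e_x * e_y], dim=-1)
        return self.fc(feats)  # logits for the two-class cross-entropy loss
```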
The vocabulary information acquisition network mainly learns how to merge the generated vocabulary information; it comprises (a) a label distribution learning network and (b) a character label learning network, with the structure shown in FIG. 3.
In part (a) of the vocabulary information acquisition network, the label distribution learning network, the learning steps are as follows:
1) After acquiring the character vector codes, pass them through the ELMO layer and the Dense layer, using the BiLSTM as the text feature encoder to extract the text feature codes.
2) Using a fully connected layer as the decoder, obtain the label probability distribution $P_{label}$, which serves as the state matrix of a conditional random field; perform label inference through the conditional random field and learn the label transition probabilities in the transition matrix. The conditional random field learns the transition probabilities among labels from the label probability distribution $P_{label}$ and the real labels $P_{gold}$, and infers the probabilities of all labels by the following equation:

$$p(y \mid s) = \frac{\exp\big(\sum_i (O_{i, y_i} + T_{y_{i-1}, y_i})\big)}{\sum_{\tilde{y} \in Y_C} \exp\big(\sum_i (O_{i, \tilde{y}_i} + T_{\tilde{y}_{i-1}, \tilde{y}_i})\big)}$$

where $Y_C$ denotes all possible label sequences of the text input C, $O = W_o H + b_o$ denotes the current character label probabilities, with $W_o$ and $b_o$ a parameter matrix and a parameter vector respectively and H the output of the previous layer; $T_{y_{i-1}, y_i}$ denotes the transition probability for the label pair $(y_{i-1}, y_i)$; and $p(y \mid s)$ denotes the probability of the label sequence y given the input s. The goal of the conditional random field is to obtain the label sequence $y^*$ with the greatest score given the text input C.
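The numerator of this inference formula scores one label sequence from the emission matrix O and the transition matrix T; a minimal sketch of that score follows, with the tensor shapes as assumptions:

```python
import torch

def crf_sequence_score(emissions, transitions, labels):
    """Unnormalized score of one label sequence: the sum of O[i, y_i] and
    T[y_{i-1}, y_i]; p(y|s) is exp(score) normalized over all sequences.

    emissions: (seq_len, num_labels), the matrix O = W_o H + b_o
    transitions: (num_labels, num_labels) transition matrix T
    labels: (seq_len,) integer label ids
    """
    score = emissions[0, labels[0]]
    for i in range(1, emissions.size(0)):
        score = score + transitions[labels[i - 1], labels[i]] \
                      + emissions[i, labels[i]]
    return score
```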
In part (b) of the vocabulary information acquisition network, the character label learning network, the learning steps are as follows:
1) Select the character vector $E_C$ output by the label distribution learning network through the Embedding layer as part of the embedding layer output.
2) According to the positions of the current character in the n-grams and the lexeme label type q, obtain from the vocabulary information generation network the word vector set before the last Dense layer, $E_\tau = \{E_q\}$, where $E_q$ is the word vector of lexeme label type q and q is the category of the lexeme label; for a lexeme label appearing multiple times, one word vector is selected with equal probability from all the corresponding positions as the word vector of that lexeme label.
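A sketch of assembling $E_\tau$ is given below; the BMES lexeme label set and the dictionary layout are assumptions used only for illustration:

```python
import random

def build_word_vector_set(vectors_by_label):
    """Build E_tau = {E_q}: one word vector per lexeme label type q.

    vectors_by_label maps a lexeme label (assumed here to be one of
    'B', 'M', 'E', 'S') to all candidate word vectors taken before the
    last Dense layer; when a label occurs at several positions, one
    candidate is chosen uniformly at random, as in step 2).
    """
    return {q: random.choice(candidates)
            for q, candidates in vectors_by_label.items()}
```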
3) According to the label probability distribution $P_{label}$ obtained by the label distribution learning network, learn the probability $P_{pos}$ that each character label belongs to each lexeme label.
4) From the word vector set $E_\tau$ obtained in step 2) and the lexeme labeling probability $P_{pos}$ obtained in step 3), obtain the vocabulary information $E_W$ in the embedding layer via the tensor product $E_W = P_{pos} \otimes E_\tau$.
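One plausible reading of this tensor product, with $E_\tau$ stacked into a matrix and $P_{pos}$ a probability vector over lexeme labels, is the probability-weighted combination sketched below; the exact fusion form is an assumption:

```python
import torch

def vocabulary_information(e_tau, p_pos):
    """Fuse the word vector set with the lexeme label probabilities.

    e_tau: (num_labels, dim) stacked word vectors E_tau
    p_pos: (num_labels,) lexeme label probabilities P_pos
    Returns E_W as the P_pos-weighted sum of the word vectors.
    """
    return torch.einsum('q,qd->d', p_pos, e_tau)
```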
5) Combine the character vector $E_C$ and the embedding-layer vocabulary information $E_W$, i.e. $E = [E_C, E_W]$, as the input of the WP-LSTM. The WP-LSTM model structure is shown in FIG. 4, where <pad> is the character padding symbol, filled with a fixed randomly initialized vector; when the acquired position of an n-gram fragment exceeds the text length, <pad> is used. Characters pointing to the c part in the figure, such as "浙" (Zhe), "江" (Jiang) and "省" (province), are encoded as $E_C$; fragments pointing to the w part are encoded as $E_W$; the two are finally fused together.
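The input fusion of the WP-LSTM can be sketched as follows; the bidirectionality and hidden size are assumptions, not details fixed by FIG. 4:

```python
import torch
import torch.nn as nn

class WPLSTM(nn.Module):
    """Sketch of the WP-LSTM input stage: concatenate character vectors
    E_C with vocabulary information E_W and encode the sequence with an
    LSTM."""
    def __init__(self, char_dim, word_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTM(char_dim + word_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, e_c, e_w):           # both (batch, seq, dim)
        e = torch.cat([e_c, e_w], dim=-1)  # E = [E_C, E_W]
        out, _ = self.lstm(e)
        return out  # decoded downstream by the Dense layer and CRF
```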
6) Using the Dense layer and a conditional random field as the decoder and label inference layer, output $Z = (z_1, z_2, \ldots, z_n)$ as the predicted labels, finally learning the character relations in Chinese address named entity recognition and realizing Chinese address named entity recognition.
Corresponding to the embodiment of the method for identifying the named entity of the Chinese address fused with the word information, the invention also provides an embodiment of the device for identifying the named entity of the Chinese address fused with the word information.
Referring to fig. 5, the apparatus for identifying a named entity with a chinese address fused with word information according to an embodiment of the present invention includes a memory and one or more processors, where the memory stores executable codes, and the processors execute the executable codes to implement the method for identifying a named entity with a chinese address fused with word information according to the above embodiment.
The embodiment of the word information fused Chinese address named entity recognition device can be applied to any equipment with data processing capability, such as a computer. The device embodiments may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, as a logical device it is formed by the processor of the equipment reading the corresponding computer program instructions from non-volatile memory into memory and running them. In terms of hardware, FIG. 5 shows a hardware structure diagram of the equipment with data processing capability on which the word information fused Chinese address named entity recognition device is located; in addition to the processor, memory, network interface, and non-volatile memory shown in FIG. 5, the equipment may generally include other hardware according to its actual function, which is not repeated here.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
The embodiment of the invention also provides a computer-readable storage medium, on which a program is stored, and when the program is executed by a processor, the method for identifying a named entity of a chinese address based on word information fusion in the above embodiment is implemented.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any data processing capability device described in any of the foregoing embodiments. The computer readable storage medium may also be any external storage device of a device with data processing capabilities, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any data processing capable device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing-capable device, and may also be used for temporarily storing data that has been output or is to be output.
The above-described embodiments are intended to illustrate rather than to limit the invention, and any modifications and variations of the present invention are within the spirit of the invention and the scope of the appended claims.

Claims (9)

1. A method for Chinese address named entity recognition with fused word information, characterized by comprising the following steps:
(1) Obtain the n-gram fragment vector of the Chinese address, expressed as $X = (x_1, x_2, \ldots, x_n)$, and obtain the corresponding real vocabulary fragment $Y = (y_1, y_2, \ldots, y_m)$, where n is the number of characters in the n-gram fragment and m is the number of characters in the real vocabulary fragment;
(2) Construct a vocabulary information generation network adopting the structure of a two-tower model; the network specifically operates as follows:
(2.1) input the n-gram fragments and the real vocabulary fragments into the vocabulary information generation network, and acquire random character vector codes through an Embedding layer;
(2.2) pass the character vector codes through the ELMO layer and the Dense layer to learn the character vector representation;
(2.3) pass the character vector representation through an average pooling layer (mean pooling) to characterize the text fragment as a word vector;
(2.4) in a classification learner, splice the word vector of the n-gram fragment and the word vector of the real vocabulary fragment, then further splice the difference and the element-wise product of the two word vectors to obtain relational features between the words; after a fully connected layer, map the vector dimension into a two-dimensional space and judge the similarity between the two vectors;
(3) Construct a vocabulary information acquisition network, comprising a label distribution learning network and a character label learning network;
the label distribution learning network obtains the character vector representation of the n-gram fragment in the same way as the vocabulary information generation network, extracts the text feature codes, uses a fully connected layer as the decoder to obtain the probability distribution $P_{label}$ of the labels corresponding to the vocabulary as the state matrix of a conditional random field, and performs label inference through the conditional random field;
the character label learning network specifically operates as follows:
(3.1) select the character vector $E_C$ output by the label distribution learning network through the Embedding layer as part of the embedding layer output;
(3.2) according to the positions of the current character in the n-grams and the lexeme label type q, obtain from the vocabulary information generation network the word vector set before the last Dense layer, $E_\tau = \{E_q\}$, where $E_q$ is the word vector of lexeme label type q;
(3.3) according to the label probability distribution $P_{label}$ obtained by the label distribution learning network, learn the probability $P_{pos}$ that each character label belongs to each lexeme label;
(3.4) from the word vector set $E_\tau$ obtained in step (3.2) and the lexeme labeling probability $P_{pos}$ obtained in step (3.3), obtain the vocabulary information $E_W$ in the embedding layer via the tensor product $E_W = P_{pos} \otimes E_\tau$;
(3.5) combine the character vector $E_C$ and the embedding-layer vocabulary information $E_W$ and input them into the WP-LSTM model; then use the Dense layer and a conditional random field as the decoder and label inference layer, outputting $Z = (z_1, z_2, \ldots, z_n)$ as the predicted labels, finally learning the character relations in Chinese address named entity recognition and realizing Chinese address named entity recognition.
2. The method of claim 1, wherein the ELMO is a network structure composed of two bidirectional LSTMs (BiLSTM); the final ELMO layer vector is expressed as:

$$ELMo_i^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} h_{i,j}$$

where $h_{i,0}$ is the character vector of the i-th position, $\gamma^{task}$ is the coefficient related to the pre-training task, L is the number of layers, $s_j^{task}$ is the normalized weight coefficient of layer j, and $h_{i,j} = [\overrightarrow{h}_{i,j}; \overleftarrow{h}_{i,j}]$ is the output vector of the j-th BiLSTM layer, with $\overrightarrow{h}_{i,j}$ containing the preceding information and $\overleftarrow{h}_{i,j}$ containing the following information.
3. The method of claim 2, wherein during training the sum of the forward and backward language-model losses of the ELMO is the training target, that is, the following loss is optimized:

$$loss = \sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1}; \theta_x, \overrightarrow{\theta}_{LSTM}, \theta_s) + \log p(t_k \mid t_{k+1}, \ldots, t_N; \theta_x, \overleftarrow{\theta}_{LSTM}, \theta_s) \Big)$$

where $\theta_x$ denotes the character input vector, $\overrightarrow{\theta}_{LSTM}$ denotes the forward LSTM parameters, $\overleftarrow{\theta}_{LSTM}$ denotes the backward LSTM parameters, $\theta_s$ denotes the softmax layer, p denotes the probability, and $t_k$ denotes the text at position k.
4. The method for Chinese address named entity recognition with fused word information according to claim 1, wherein the text fragment is characterized as a word vector specifically as:

$$E_{vector} = mean(sum(H_{vector}))$$

where $H_{vector}$ denotes the output vector of the previous layer and vector is X or Y; that is, X and Y are encoded by this formula into word vector features $E_X$ and $E_Y$ of the same dimension.
5. The method for Chinese address named entity recognition with fused word information according to claim 1, wherein the splicing operation in step (2.4) is specifically:

$$E = [E_X, E_Y, E_X - E_Y, E_X \odot E_Y]$$

where E is the spliced vector.
6. The method as claimed in claim 1, wherein the conditional random field learns the transition probabilities among labels from the label probability distribution $P_{label}$ and the real labels $P_{gold}$, and infers the probabilities of all labels by the following equation:

$$p(y \mid s) = \frac{\exp\big(\sum_i (O_{i, y_i} + T_{y_{i-1}, y_i})\big)}{\sum_{\tilde{y} \in Y_s} \exp\big(\sum_i (O_{i, \tilde{y}_i} + T_{\tilde{y}_{i-1}, \tilde{y}_i})\big)}$$

where $Y_s$ denotes all possible label sequences of the text input, $O = W_o H + b_o$ denotes the current character label probabilities, with $W_o$ and $b_o$ a parameter matrix and a parameter vector respectively and H the output of the previous layer; $T_{y_{i-1}, y_i}$ denotes the transition probability for the label pair $(y_{i-1}, y_i)$; and $p(y \mid s)$ denotes the probability of the label sequence y given the input s. The goal of the conditional random field is to obtain the label sequence $y^*$ with the greatest score given the text input.
7. The method as claimed in claim 1, wherein in step (3.2), for a lexeme label appearing multiple times, one word vector is selected with equal probability from all the corresponding positions as the word vector of that lexeme label.
8. A word information fused Chinese address named entity recognition device, comprising a memory and one or more processors, wherein the memory stores executable code, and the processors, when executing the executable code, implement the method for Chinese address named entity recognition with fused word information according to any one of claims 1 to 7.
9. A computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, implements the method for Chinese address named entity recognition with fused word information according to any one of claims 1 to 7.
CN202211690568.1A 2022-12-27 2022-12-27 Word information fused Chinese address named entity recognition method and device Pending CN115906855A (en)

Priority Applications (1)

Application Number: CN202211690568.1A
Priority/Filing Date: 2022-12-27
Title: Word information fused Chinese address named entity recognition method and device (CN115906855A)


Publications (1)

Publication Number: CN115906855A
Publication Date: 2023-04-04

Family

ID=86496938

Family Applications (1)

Application Number: CN202211690568.1A (status: Pending)
Title: Word information fused Chinese address named entity recognition method and device
Priority/Filing Date: 2022-12-27

Country Status (1)

Country Link
CN (1) CN115906855A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117852974A (en) * 2024-03-04 2024-04-09 禾辰纵横信息技术有限公司 Online evaluation score assessment method based on artificial intelligence


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination