CN115374784A - Chinese named entity recognition method based on multi-mode information selective fusion - Google Patents
- Publication number
- CN115374784A
- Authority
- CN
- China
- Prior art keywords
- embedding
- pinyin
- information
- character
- lattice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to a Chinese named entity recognition method based on selective fusion of multi-modal information, and belongs to the technical field of natural language processing. The invention addresses the problem of how to effectively integrate two important kinds of information, namely character pronunciation and character glyph, into named entity recognition. By adding the pinyin and radical sequences of characters, which carry semantic information, to the input of named entity recognition, the method endows the vectors with stronger semantic information. By adopting selective fusion, the weights of the pronunciation and glyph features can be dynamically controlled, effectively improving named entity recognition performance. The invention can provide effective support for natural language processing tasks such as machine translation, question answering systems, and reading comprehension.
Description
Technical Field
The invention relates to a Chinese named entity recognition method, in particular to one that selectively fuses multi-modal information such as vocabulary, pronunciation, and glyph information, and belongs to the technical field of natural language processing.
Background
Named Entity Recognition (NER) technology aims to detect the boundaries of entity mentions in a given text sequence and to determine the category to which each mention belongs. The technology is the basis of many downstream natural language processing tasks such as machine translation, question answering systems, and reading comprehension. The research difficulty of Chinese NER lies mainly in the semantic analysis of Chinese text, which manifests itself in the difficulty of fusing vocabulary information and of introducing the semantic information of Chinese characters.
Unlike English text, Chinese has no spaces between words to serve as boundaries, and the semantics that a single character can represent are limited. Therefore, how to integrate vocabulary information into the model has become a research hotspot for many Chinese natural language processing tasks.
To make full use of the semantically richer vocabulary information in the Chinese named entity recognition task, researchers have designed neural network models based on resources such as automatic word segmenters, multi-character representations, and external dictionaries. The pipeline-style segmentation-then-recognition framework is limited by the correctness of Chinese word segmentation and performs poorly in low-resource domains. Multi-character representations contain co-occurrence information among characters and supplement context information to some extent, but most multi-character combinations do not have actual word senses and cannot model vocabulary information well.
To avoid the error propagation of word segmenters and the insufficient semantics of multi-character representations, combining an external dictionary with a character-granularity NER model has become the mainstream way of fusing word information. Lattice-LSTM injects the information of all potential matching words in the input text into the single-character representations, achieving performance improvements on Chinese NER datasets in multiple domains and spurring research enthusiasm for fusing lexicon information. In particular, PLTE and FLAT each design a word-information fusion scheme based on the Transformer framework. PLTE builds on the Transformer encoder and can model all characters and matched lexicon words in parallel in batches. In addition, it adds position-relation representations and introduces a porous mechanism to enhance local modeling while maintaining the ability to capture long-range dependencies, endowing the word vectors with richer semantic information and improving performance on multiple datasets. FLAT splices all potential words matching the current input text onto the input text, expanding the original character sequence into a character-word sequence; that is, it flattens the lattice structure into spans. A Transformer-based self-attention mechanism with span-relative position encoding then lets each character interact directly with its potential matching words, improving the performance of the NER model. MECT builds on FLAT: exploiting the pictographic nature of Chinese characters, it also merges radicals carrying Chinese-character semantic information into the word vectors, and uses a Cross-Transformer network module to interactively fuse Lattice embedding and Radical embedding to enhance semantic information.
Despite these various efforts to incorporate vocabulary and enhance semantic information, the pronunciation and glyph of Chinese characters are two very important kinds of information that carry important syntactic and semantic content in language understanding tasks. To date, no method fully integrates these two kinds of information into NER.
Disclosure of Invention
Aiming at the technical problem that current NER work often ignores the two important kinds of information in Chinese characters, pronunciation and glyph, and at the question of how to effectively integrate them into an NER method, the invention creatively provides a Chinese named entity recognition method with selective fusion of multi-modal information. The method can better and dynamically fuse the semantic information among characters, pronunciations, and glyphs.
The innovation points of the invention are as follows: the pinyin and radical sequences of characters, which carry semantic information, are added to the input of NER (Named Entity Recognition). The pronunciations (Pinyin) and glyphs (Radical) of Chinese characters are encoded with a CNN (Convolutional Neural Network), interactive embedding is performed with a Cross-Transformer network, and selective fusion is then adopted to dynamically generate the fusion weights of the different modalities' information.
First, a dictionary matching method is applied to the original text to obtain the semantic (Lattice) information of words, and relative position encodings converted from head and tail positions are used to adapt to word boundary information.
Then, for each character of the text, the pinyin and radical information of the Chinese character is extracted using a CNN network. After the pronunciation (Pinyin), glyph (Radical), and semantic (Lattice) information of the Chinese characters is obtained, a Cross-Transformer is used to enhance the semantic information between Lattice and Pinyin and between Lattice and Radical.
These vectors are then integrated using selective fusion.
Finally, the word part is masked, and the fused information is passed to a conditional random field to obtain the final label prediction probabilities, completing Chinese named entity recognition.
Advantageous effects
Compared with the prior art, the invention has the following advantages:
1. The method adds pronunciation and glyph information carrying important semantic information on the basis of integrating vocabulary information, endowing the vectors with stronger semantic information; the designed selective fusion module can dynamically integrate information from multiple modalities, effectively improving the effect of named entity recognition.
2. The invention uses selective fusion, which can dynamically control the weights of the pronunciation and glyph information and effectively improve named entity recognition performance, providing effective support for natural language tasks such as machine translation, question answering systems, and reading comprehension.
Drawings
FIG. 1 is the overall architecture diagram of the method of the present invention;
FIG. 2 is a structural diagram of the Lattice Embedding model in step 1 of the method and embodiment of the present invention;
FIG. 3 is a structural diagram of the Pinyin and Radical Embedding models in step 1 of the method and embodiment of the present invention;
FIG. 4 is a diagram of the Cross-Transformer network in step 2 of the method and embodiment of the present invention;
FIG. 5 shows the selective fusion layer in step 3 of the method and embodiment of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The purpose of the invention is realized by the following technical scheme:
a multi-modal information selective fusion Chinese named entity recognition method comprises the following steps:
step 1: and (4) multi-modal information input.
The semantic (Lattice), pronunciation (Pinyin), and glyph (Radical) information of the Chinese characters is obtained.
Step 1-1: obtain semantic information.
First, the semantic information of the Chinese characters is acquired using a dictionary matching method, and word boundary information is adapted using relative position encodings converted from head and tail positions.
Then, Lattice Embedding is obtained by initialization with pre-trained word vectors.
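As an illustration of the dictionary matching step, the sketch below enumerates the substrings of a sentence and keeps those found in a lexicon, recording the head and tail character positions that the method later converts into relative position encodings. The lexicon contents, example sentence, and `max_len` cutoff are hypothetical; the patent's actual dictionary and pre-trained word vectors are not specified here.

```python
# Hypothetical lexicon; a real system would load a large pre-trained one.
LEXICON = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}

def match_lattice(text, lexicon, max_len=4):
    """Return (word, head, tail) for every lexicon word found in text.

    head/tail are character indices; the method converts these
    head-to-tail positions into relative position encodings.
    """
    spans = []
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            word = text[i:j]
            if word in lexicon:
                spans.append((word, i, j - 1))
    return spans

spans = match_lattice("南京市长江大桥", LEXICON)
```

Each matched span would then be appended after the character sequence to form the flattened lattice input.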
Step 1-2: obtain pronunciation information.
The pronunciation of each Chinese character, including its initial, final, and tone, is obtained (for example, using the Python library pypinyin).
The pronunciation of each Chinese character is combined in the order of initial, final, and tone, and then input into a CNN convolutional network for Embedding representation according to Equation 1:

x_i = f(w · e_{i:i+h-1} + b)  (1)

where w and b are convolutional layer parameters, h is the convolution kernel width, x_i is the representation of the pinyin of the ith character of the text after the convolutional layer, e denotes the initialized representation of a character, and f is the activation function.
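A minimal numeric sketch of Equation 1, assuming ReLU as the activation f and random values for the learnable kernel w and bias b; the embedding dimension and kernel width are hypothetical, since the patent does not fix them:

```python
import numpy as np

rng = np.random.default_rng(0)

def pinyin_conv(e, w, b, h):
    """Eq. (1): x_i = f(w · e_{i:i+h-1} + b), here with ReLU as f.

    e: (seq_len, dim) initialized embeddings of the pinyin symbols
    w: (h, dim) convolution kernel of width h; b: scalar bias.
    """
    out = []
    for i in range(e.shape[0] - h + 1):
        window = e[i:i + h]                                  # e_{i:i+h-1}
        out.append(max(0.0, float(np.sum(w * window) + b)))  # ReLU activation
    return np.array(out)

# A character's pinyin split into initial, final, tone gives 3 symbols.
e = rng.normal(size=(3, 8))   # hypothetical embedding dim = 8
w = rng.normal(size=(2, 8))   # hypothetical kernel width h = 2
x = pinyin_conv(e, w, b=0.1, h=2)
```

A real implementation would use a framework convolution with multiple kernels and pool over positions to get one Pinyin Embedding per character.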
Step 1-3: and obtaining font information.
The structural components of Chinese characters are taken as their radicals: the structural components are first crawled and combined in order, and then encoded into a Radical Embedding representation using a CNN.
Specifically, the Chinese character structural components from the Xinhua dictionary can be used as the radicals of the characters.
Step 2: cross-transform-based interactive Embedding comprises Embedding vector initialization and Embedding interaction Embedding.
Step 2-1: the embed vector is initialized.
Step 1 yields the Lattice, Pinyin, and Radical Embeddings of the Chinese characters.
The inputs of the Cross-Transformer are obtained through a linear Transformer module:

Q_x = E_x W_x^Q,  K_x = E_x W_x^K,  V_x = E_x W_x^V,  x ∈ {L1, L2, P, R}  (2)

where E_{L1/L2}, E_P, and E_R are the Lattice, Pinyin, and Radical Embeddings obtained in step 1; I is an identity matrix, and each linear mapping matrix W is a learnable parameter. L1 and L2 denote Lattice1 and Lattice2, which are interactively computed with Pinyin and Radical respectively; P denotes Pinyin and R denotes Radical; T denotes matrix transpose; and Q, K, V denote the three Embedding inputs of the Cross-Transformer.
Step 2-2: interactive Embedding. After the inputs of the Cross-Transformer are obtained, the interactive embedding of Lattice with Pinyin and of Lattice with Radical is computed using the Cross-Transformer network, specifically:
Att_P(Q_L1, K_P, V_P) = Softmax(Q_L1 K_P^T) V_P  (3)
Att_L1(Q_P, K_L1, V_L1) = Softmax(Q_P K_L1^T) V_L1  (4)
Att_L2(Q_R, K_L2, V_L2) = Softmax(Q_R K_L2^T) V_L2  (5)
Att_R(Q_L2, K_R, V_R) = Softmax(Q_L2 K_R^T) V_R  (6)

where Att is the resulting Attention Embedding; L1 and L2 denote Lattice1 and Lattice2, which are interactively computed with Pinyin and Radical respectively; P denotes Pinyin and R denotes Radical; and Q, K, V denote the three Embedding inputs of the Cross-Transformer.
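Equations 3 to 6 share one computation: dot-product attention in which the query comes from one modality and the keys and values from another. The sketch below assumes the standard 1/sqrt(d) scaling inside the Softmax; all dimensions are hypothetical:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(q, k, v):
    """Att(Q, K, V) = Softmax(Q K^T / sqrt(d)) V, the form shared by Eqs. (3)-(6)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(1)
n, d = 5, 16                                 # hypothetical sequence length and dim
q_l1 = rng.normal(size=(n, d))               # Lattice1 queries
k_p = rng.normal(size=(n, d))                # Pinyin keys
v_p = rng.normal(size=(n, d))                # Pinyin values
att_p = cross_attention(q_l1, k_p, v_p)      # Eq. (3): Att_P

q_p = rng.normal(size=(n, d))                # Pinyin queries
k_l1 = rng.normal(size=(n, d))               # Lattice1 keys
v_l1 = rng.normal(size=(n, d))               # Lattice1 values
att_l1 = cross_attention(q_p, k_l1, v_l1)    # Eq. (4): Att_L1
```

Swapping which modality supplies Q is exactly the "exchange of Q values" described for the four Transformer layers in step 2-2 of the embodiment.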
Step 3: selective fusion, comprising the fusion of information at two granularities: character granularity and sentence granularity.
Step 3-1: character granularity weight selection.
Step 2 yields the Attention vector of each modality; the invention uses h_i to denote the Attention vector of the ith character. A selective gate unit is used to control how much information can flow into the hybrid Embedding representation; the gate values are computed by a fully connected layer and a sigmoid function, whose inputs are the Cross-Attention representations output by the Cross-Transformer. Denoting the gate values of the Pinyin, the two Lattice, and the Radical modalities as g_p, g_l1, g_l2, and g_r, the fused Embedding of the ith character is computed as follows:

g_p = σ(W_p h_i^P + b_p)  (7)
g_l1 = σ(W_l1 h_i^L1 + b_l1)  (8)
g_l2 = σ(W_l2 h_i^L2 + b_l2)  (9)
g_r = σ(W_r h_i^R + b_r)  (10)
h_i^mix = g_p ⊙ h_i^P + g_l1 ⊙ h_i^L1 + g_l2 ⊙ h_i^L2 + g_r ⊙ h_i^R  (11)

where W_p, W_l1, W_l2, W_r, b_p, b_l1, b_l2, b_r are all learnable parameters; σ is the sigmoid function; and h_i^P, h_i^L1, h_i^L2, h_i^R respectively denote the Attention vectors of the ith character in Pinyin, Lattice1, Lattice2, and Radical.
Step 3-2: fused Embedding representation.
After the gate values are obtained in step 3-1, the Attention Embeddings are weighted by their gate values and summed to obtain the fused Embedding representation h_i^mix.
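The gate-and-sum fusion of steps 3-1 and 3-2 can be sketched as follows, with random stand-ins for the learnable parameters and the four per-character Attention vectors (all names and sizes are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def selective_fusion(att_vectors, params):
    """Character-granularity selective fusion.

    For each modality m (Pinyin, Lattice1, Lattice2, Radical):
    g_m = sigmoid(W_m h_m + b_m); the fused embedding is the
    gate-weighted sum over modalities of g_m * h_m.
    """
    fused = np.zeros_like(att_vectors[0])
    for h, (W, b) in zip(att_vectors, params):
        g = sigmoid(W @ h + b)   # element-wise gate values in (0, 1)
        fused += g * h
    return fused

rng = np.random.default_rng(2)
d = 8                                                    # hypothetical dim
vectors = [rng.normal(size=d) for _ in range(4)]         # h_i per modality
params = [(rng.normal(size=(d, d)), rng.normal(size=d))  # (W_m, b_m) pairs
          for _ in range(4)]
fused = selective_fusion(vectors, params)
```

Because each gate is element-wise, the model can keep some dimensions of one modality while suppressing others, rather than choosing a single scalar weight per modality.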
Step 3-3: and sentence granularity learning.
A Transformer Layer is applied to fully learn the Lattice, Pinyin, and Radical information at the sentence level. The hybrid representations of all characters are packed as:

H_0 = [h_1^mix; h_2^mix; …; h_N^mix]  (12)

where H_0 denotes the hybrid representation of all characters and h_n^mix denotes the fused representation of the nth character.
The final hybrid representation is computed as follows:
H = Transformer(H_0)  (13)
where H denotes the hidden-layer output of the Transformer Layer.
Step 4: output the final label prediction probabilities.
After the fusion process of step 3 is finished, the word part is masked and the fused information is passed to a Conditional Random Field (CRF) to obtain the final label prediction probabilities.
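To illustrate step 4, the sketch below masks out the appended word positions so that only character positions reach decoding, then runs a plain Viterbi decode as a simplified stand-in for CRF inference. The real model learns the emission and transition scores; here they are random, and all sizes are hypothetical:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Plain Viterbi decode over tag scores, standing in for the CRF layer.

    emissions: (seq_len, num_tags) per-position tag scores
    transitions: (num_tags, num_tags) tag-to-tag transition scores
    """
    n, t = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, t), dtype=int)
    for i in range(1, n):
        # total[a, b] = best score ending at tag b via previous tag a
        total = score[:, None] + transitions + emissions[i]
        back[i] = total.argmax(axis=0)
        score = total.max(axis=0)
    tags = [int(score.argmax())]
    for i in range(n - 1, 0, -1):      # follow backpointers
        tags.append(int(back[i][tags[-1]]))
    return tags[::-1]

rng = np.random.default_rng(3)
n_char, n_word, num_tags = 7, 4, 5       # 7 characters + 4 appended words
H = rng.normal(size=(n_char + n_word, num_tags))
emissions = H[:n_char]                   # mask: keep only the character part
path = viterbi_decode(emissions, rng.normal(size=(num_tags, num_tags)))
```

The mask matters because the lattice words appended in step 1 exist only to enrich the character representations; labels are predicted for characters alone.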
Examples
As shown in fig. 1, the present invention comprises four modules. The first layer is the input layer, which converts the input text into Lattice, Pinyin, and Radical embeddings through pre-trained word vectors and an encoder. The second layer, a Cross-Transformer network, then interactively performs Attention computation on the Lattice, Pinyin, and Radical embeddings. The resulting Attention scores are passed through the selective fusion module to obtain a hybrid vector. Finally, the probabilities of the final labels are obtained through the output layer and a CRF (conditional random field).
The method specifically comprises the following steps:
step 1: convert the input text into Lattice, Pinyin, and Radical embedding representations;
step 1-1: and (5) Lattice embedding generation.
As shown in fig. 2, word lattice information is first obtained by dictionary matching; for example, the words "Nanjing", "Nanjing City", "Changjiang Bridge", and "bridge" can be obtained. Lattice embedding is then obtained using the relative position encodings converted from head and tail positions together with pre-trained word vectors.
Step 1-2: pinyin embedding is generated.
As shown in FIG. 3, the pypinyin library is used to obtain the initial, final, and tone sequence of each Chinese character, and this information is then extracted by CNN encoding to obtain the Pinyin embedding representation of the character.
Step 1-3: Radical embedding generation. Similar to step 1-2, as shown in fig. 3, the Chinese character structural components crawled in advance are used as glyph information to obtain the radical sequence, which is then encoded by a CNN to obtain the Radical embedding representation.
Step 2: cross-transform-based interactive Attention calculation;
The Cross-Transformer-based interactive Attention computation of the invention is shown in fig. 4. Cross-Attention is computed pairwise, between Lattice and Pinyin and between Lattice and Radical, through four Transformer encoder layers, with the main aim of obtaining the interactive Attention representations between Chinese characters and Pinyin and between Chinese characters and Radical.
step 2 comprises the following substeps:
step 2-1: embedding initialization; for the embeddings obtained from the input layer, the inputs of the Cross-Transformer are initialized by a linear Transformer module.
step 2-2: Attention computation; after the Cross-Transformer inputs are obtained, 4 Transformer layers are designed as shown in fig. 4, and the Q values of Lattice and Pinyin, and of Lattice and Radical, are exchanged for the Attention computation, yielding 4 Attention embeddings.
And step 3: and a selective fusion module.
The selective fusion module in the present invention is shown in fig. 5:
the selective fusion module mainly aims at each Attention embedding obtained through a Cross-Transformer network in the step 2 to perform dynamic fusion representation, firstly, the number of information in each Attention embedding is controlled through four gate values to flow into a mixed vector, after the mixed vector representation is obtained, the Attention calculation of sentence granularity is obtained through a Transformer Layer, and finally, a fusion embedding is obtained.
Step 3 comprises the following substeps:
step 3-1: and calculating the gate value of each modal information.
A selective gate unit is used to control how much information can flow into the hybrid embedding representation. If the current text is more colloquial, more information from the Pinyin Attention flows into the fused embedding; conversely, if the current text is more formal written language, more information from the Radical Attention flows into the fused embedding.
Step 3-2: fusing Embedding based on character granularity.
Step 3-1 yields the gate value of each modality's information; the gate values are then used as weights to compute the weighted sum of the Attention embeddings as the character-granularity fused embedding.
Step 3-3: sentence-granularity based learning.
To enable the invention to fully learn the Lattice, Pinyin, and Radical information at the sentence level, a Transformer Layer is subsequently applied for learning.
And 4, step 4: and (5) a model output layer.
After the fused embedding is obtained in step 3, the word part is masked (MASK), the result is passed to the Conditional Random Field (CRF) module, and the final label prediction probabilities are output.
The following tables give the experimental results of the proposed method on four public datasets. VisPhone is the method provided by the invention; the other models are current classical or state-of-the-art NER models. It can be seen that the proposed method achieves clear improvements over the current best models on all four datasets.
Table 1 Experimental results of the proposed method on the Weibo dataset
Table 2 Experimental results of the proposed method on the Resume dataset
Table 3 Experimental results of the proposed method on the OntoNotes dataset
Table 4 Experimental results of the proposed method on the MSRA dataset
This specification presents a specific embodiment for the purpose of illustrating the context and method of practicing the invention. The details introduced in the examples are not intended to limit the scope of the claims but to aid understanding of the method described herein. Those skilled in the art will understand that various modifications, changes, or substitutions of the steps of the preferred embodiment are possible without departing from the spirit and scope of the invention and its appended claims. Therefore, the invention should not be limited to the disclosure of the preferred embodiments and the accompanying drawings.
Claims (4)
1. A Chinese named entity recognition method with selective fusion of multi-modal information, characterized in that: first, a dictionary matching method is applied to the original text to obtain the semantic information of words, and relative position encodings converted from head and tail positions are used to adapt to word boundary information;
then, for each character of the text, the pinyin and radical information of the Chinese characters is extracted using a CNN network; after the pronunciation (Pinyin), glyph (Radical), and semantic (Lattice) information of the Chinese characters is obtained, a Cross-Transformer is used to enhance the semantic information between Lattice and Pinyin and between Lattice and Radical;
then, the vectors are integrated by selective fusion, the fused information comprising two granularities: character granularity and sentence granularity;
finally, the word part is masked, and the fused information is passed to a conditional random field to obtain the final label prediction probabilities, completing Chinese named entity recognition.
2. The method of claim 1, characterized in that:
the semantic (Lattice) information is acquired as follows: first, the semantic information of the Chinese characters is obtained using a dictionary matching method, and word boundary information is adapted using relative position encodings converted from head and tail positions; then, Lattice Embedding is obtained by initialization with pre-trained word vectors;
the Pinyin information, including the initials, finals, and tones of the Chinese characters, is acquired by combining the pinyin of each character in the order of initial, final, and tone, and inputting it into a CNN convolutional network for Embedding representation according to Equation 1:

x_i = f(w · e_{i:i+h-1} + b)  (1)

where w and b are convolutional layer parameters, h is the convolution kernel width, x_i is the representation of the pinyin of the ith character of the text after the convolutional layer, e denotes the initialized representation of a character, and f is the activation function;
the glyph information is acquired by taking the structural components of the Chinese characters as their radicals: the structural components are crawled, combined in order, and then encoded into a Radical Embedding representation using a CNN.
3. The method of claim 1, characterized in that the Cross-Transformer is used to enhance the semantic information between Lattice and Pinyin and between Lattice and Radical as follows:
Cross-Transformer-based interactive Embedding, comprising Embedding vector initialization and interactive Embedding;
step a: initialize the Embedding vectors;
from the Lattice, Pinyin, and Radical Embeddings of the Chinese characters, the inputs of the Cross-Transformer are obtained through a linear Transformer module:

Q_x = E_x W_x^Q,  K_x = E_x W_x^K,  V_x = E_x W_x^V,  x ∈ {L1, L2, P, R}  (2)

where E_{L1/L2}, E_P, and E_R are the Lattice, Pinyin, and Radical Embeddings of the Chinese characters; I is an identity matrix, and each linear mapping matrix W is a learnable parameter; L1 and L2 denote Lattice1 and Lattice2, which are interactively computed with Pinyin and Radical respectively; P denotes Pinyin and R denotes Radical; T denotes matrix transpose; and Q, K, V denote the three Embedding inputs of the Cross-Transformer;
step b: interactive Embedding; after the inputs of the Cross-Transformer are obtained, the interactive embedding of Lattice with Pinyin and of Lattice with Radical is computed using the Cross-Transformer network:

Att_P(Q_L1, K_P, V_P) = Softmax(Q_L1 K_P^T) V_P  (3)
Att_L1(Q_P, K_L1, V_L1) = Softmax(Q_P K_L1^T) V_L1  (4)
Att_L2(Q_R, K_L2, V_L2) = Softmax(Q_R K_L2^T) V_L2  (5)
Att_R(Q_L2, K_R, V_R) = Softmax(Q_L2 K_R^T) V_R  (6)

where Att is the finally obtained Attention Embedding; L1 and L2 denote Lattice1 and Lattice2, which are interactively computed with Pinyin and Radical respectively; P denotes Pinyin and R denotes Radical; and Q, K, V denote the three Embedding inputs of the Cross-Transformer.
4. The method for Chinese named entity recognition with multi-modal selective fusion of information as claimed in claim 1, wherein the selective fusion method is as follows:
step I: selecting character granularity weight;
h_i is used to denote the Attention vector of the ith character; a selective gate unit is used to control how much information can flow into the hybrid Embedding representation, with the gate values computed by a fully connected layer and a sigmoid function; the inputs comprise the Cross-Attention representations output by the Cross-Transformer; denoting the gate values of the Pinyin, the two Lattice, and the Radical modalities as g_p, g_l1, g_l2, and g_r, the fused Embedding of the ith character is computed as follows:

g_p = σ(W_p h_i^P + b_p)  (7)
g_l1 = σ(W_l1 h_i^L1 + b_l1)  (8)
g_l2 = σ(W_l2 h_i^L2 + b_l2)  (9)
g_r = σ(W_r h_i^R + b_r)  (10)
h_i^mix = g_p ⊙ h_i^P + g_l1 ⊙ h_i^L1 + g_l2 ⊙ h_i^L2 + g_r ⊙ h_i^R  (11)

where W_p, W_l1, W_l2, W_r, b_p, b_l1, b_l2, b_r are all learnable parameters; σ is the sigmoid function; and h_i^P, h_i^L1, h_i^L2, h_i^R respectively denote the Attention vectors of the ith character in Pinyin, Lattice1, Lattice2, and Radical;
step II: fused Embedding representation;
after the gate values are obtained in step I, the Attention Embeddings are weighted by their gate values and summed to obtain the fused Embedding representation h_i^mix;
Step III: sentence granularity learning;
a Transformer Layer is applied to fully learn the Lattice, Pinyin, and Radical information at the sentence level; the hybrid representations of all characters are packed as:

H_0 = [h_1^mix; h_2^mix; …; h_N^mix]  (12)

where H_0 denotes the hybrid representation of all characters and h_n^mix denotes the fused representation of the nth character;
the final hybrid Embedding representation is computed as follows:
H = Transformer(H_0)  (13)
where H denotes the hidden-layer output of the Transformer Layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210810750.XA CN115374784A (en) | 2022-07-11 | 2022-07-11 | Chinese named entity recognition method based on multi-mode information selective fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115374784A true CN115374784A (en) | 2022-11-22 |
Family
ID=84062103
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210810750.XA Pending CN115374784A (en) | 2022-07-11 | 2022-07-11 | Chinese named entity recognition method based on multi-mode information selective fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115374784A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117056859A (en) * | 2023-08-15 | 2023-11-14 | 丁杨 | Method for complementing missing characters in cultural relics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||