CN115906825A - Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention

Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention

Info

Publication number
CN115906825A
CN115906825A (application CN202211495234.9A)
Authority
CN
China
Prior art keywords
convolution
word
disambiguation
matrix
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211495234.9A
Other languages
Chinese (zh)
Inventor
张春祥
张育隆
杨玉建
高雪瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202211495234.9A priority Critical patent/CN115906825A/en
Publication of CN115906825A publication Critical patent/CN115906825A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to a Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention (MHDCNN-RA). The method performs word segmentation, part-of-speech tagging and semantic class tagging on Chinese sentences containing ambiguous words to obtain processed training and test corpora. The word sense disambiguation model is then trained on the training corpus to obtain an optimized MHDCNN-RA model. The test corpus is disambiguated on the optimized MHDCNN-RA model to obtain the weight of the ambiguous word under each semantic category; the semantic category with the maximum weight is the semantic category of the ambiguous word. The invention achieves good disambiguation of ambiguous words and judges their real meanings more accurately.

Description

Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention
The technical field is as follows:
the invention relates to a Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention, which is well suited to the technical field of natural language processing.
Background art:
in natural languages there are a large number of polysemous words; the problem word sense disambiguation must solve is to determine which of a word's multiple senses should be selected as its correct sense in a particular context. Word sense disambiguation plays an important role in machine translation, semantic recognition, information retrieval and other fields.
Previously, words were often disambiguated with common algorithms such as K-means, naive Bayes, association-rule-based classification and artificial neural networks. However, these conventional algorithms suffer from several drawbacks: the extracted disambiguation features are limited to local areas, and some features must be designed by hand, so the workload is large, the speed is low, and the classifier trains poorly. In recent years, deep learning algorithms have been widely applied to natural language processing. In the MHDCNN-RA model, a multi-channel convolutional neural network fully mines the context information and text similarity of the corpus, while mixed hole convolution effectively alleviates the gridding effect of ordinary hole convolution and fully captures multi-scale information. A deep convolutional network increases the expressive power of the model, a residual structure alleviates the gradient-vanishing problem of deep neural networks, and a multi-head self-attention mechanism mines the connections between the disambiguation features, yielding higher word sense disambiguation precision.
The invention content is as follows:
in order to solve the problem of lexical ambiguity in natural language processing, the invention discloses a Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention. In natural languages there are a large number of polysemous words; word sense disambiguation must determine which of a word's multiple senses should be selected as its correct sense in a particular context. Word sense disambiguation plays an important role in machine translation, semantic recognition, information retrieval and other fields.
Therefore, the invention provides the following technical scheme:
1. A Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention, wherein the ambiguous word m has C semantic categories s_1, s_2, …, s_C, characterized by comprising the following steps:
Step 1: perform word segmentation, part-of-speech tagging and semantic class tagging on the training corpus and test corpus of SemEval-2007 Task #5.
Step 2: for the disambiguation features extracted from the SemEval-2007 Task #5 corpus, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText.
Step 3: optimize the MHDCNN-RA model with the training data to obtain the optimized MHDCNN-RA model.
Step 4: input the test data into the optimized MHDCNN-RA model and calculate the weight of the ambiguous word m under each semantic category; the semantic category with the maximum weight is the semantic category of m.
2. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 1 the training corpus and test corpus of SemEval-2007 Task #5 are processed as follows:
Step 1-1: segment the Chinese sentences with a Chinese word segmentation tool;
Step 1-2: perform part-of-speech tagging on the words with a Chinese part-of-speech tagging tool;
Step 1-3: perform semantic tagging on the words with a Chinese semantic tagging tool according to the synonym forest;
Step 1-4: extract the word form, part of speech, semantic class and stroke count of the four word units adjacent to the ambiguous word m, and take the 4 top-ranked near-synonyms of the 2 words adjacent to m according to the synonym dictionary, together forming the disambiguation features.
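For illustration only, the following Python sketch assembles a disambiguation feature group of this shape; the helper functions pos_tag, sem_class, strokes and top_synonyms are hypothetical stand-ins for the Chinese tagging tools and the synonym-forest lookup, which the patent does not name concretely:

```python
from typing import List

# Hypothetical stand-ins for the segmentation/tagging tools and the
# synonym-forest lookup of steps 1-1 to 1-3 (not named by the patent).
def pos_tag(word: str) -> str: ...          # e.g. "n", "v", "r"
def sem_class(word: str) -> str: ...        # e.g. "Di09" from the synonym forest
def strokes(word: str) -> int: ...          # total stroke count of the word
def top_synonyms(word: str, k: int) -> List[str]: ...  # k most similar entries

def build_feature_group(tokens: List[str], idx: int) -> List[str]:
    """Disambiguation features for the ambiguous word tokens[idx] (step 1-4)."""
    around = [j for j in (idx - 2, idx - 1, idx + 1, idx + 2) if 0 <= j < len(tokens)]
    feats: List[str] = []
    for j in around:                         # word form, POS, semantic class
        feats += [tokens[j], pos_tag(tokens[j]), sem_class(tokens[j])]
    feats += [str(strokes(tokens[j])) for j in around]   # stroke counts
    for j in (idx - 1, idx + 1):             # 4 near-synonyms of the 2 adjacent words
        if 0 <= j < len(tokens):
            feats += top_synonyms(tokens[j], k=4)
    return feats
```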
3. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 2 a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText are built for the disambiguation features extracted from the SemEval-2007 Task #5 training corpus, the three word embedding matrices serving as training data, and likewise for the disambiguation features extracted from the SemEval-2007 Task #5 test corpus:
Step 2-1: obtain a word embedding matrix V_1 by random initialization for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-2: obtain a pre-trained word embedding matrix V_2 with Word2Vec for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-3: obtain a pre-trained word embedding matrix V_3 with FastText for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-4: use the three word embedding matrices obtained in steps 2-1, 2-2 and 2-3 as training data;
Step 2-5: obtain a word embedding matrix V_1 by random initialization for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-6: obtain a pre-trained word embedding matrix V_2 with Word2Vec for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-7: obtain a pre-trained word embedding matrix V_3 with FastText for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-8: use the three word embedding matrices obtained in steps 2-5, 2-6 and 2-7 as test data.
4. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 3 the MHDCNN-RA model is optimized with the training data to obtain the optimized MHDCNN-RA model, with the following specific steps:
Step 3-1: load the three word embedding matrices of the training data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 3-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
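As a minimal PyTorch sketch of this fusion step, under the assumption of a fixed feature-group length and illustrative kernel sizes (the patent fixes only the word vector dimension d = 256):

```python
import math
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Sketch of step 3-2: a 2-D convolution fuses the three channels into Z1,
    sinusoidal position codes P are added (Z2 = Z1 + P), and a 1-D convolution
    compresses the result into Z3. Kernel sizes and seq_len are assumptions."""
    def __init__(self, d: int = 256, seq_len: int = 20):
        super().__init__()
        self.conv2d = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # [V1,V2,V3] -> Z1
        self.conv1d = nn.Conv1d(d, d, kernel_size=1)             # Z2 -> Z3
        pe = torch.zeros(seq_len, d)
        pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d, 2, dtype=torch.float) * (-math.log(10000.0) / d))
        pe[:, 0::2] = torch.sin(pos * div)   # sine at dimensions 2i
        pe[:, 1::2] = torch.cos(pos * div)   # cosine at dimensions 2i+1
        self.register_buffer("pe", pe)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        z1 = self.conv2d(v).squeeze(1)           # v: (batch, 3, seq_len, d) -> Z1
        z2 = z1 + self.pe                        # Z2 = Z1 + P
        return self.conv1d(z2.transpose(1, 2))  # Z3: (batch, d, seq_len)
```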
Step 3-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
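A minimal PyTorch sketch of one such gated residual block and of the hybrid-dilation stack follows; the kernel size and the final dilation-1 blocks that bring the count to 12 are assumptions, since the patent states only that [1, 2, 4] is repeated three times and [1, 1] is used for fine adjustment:

```python
import torch
import torch.nn as nn

class Conv1DBlock(nn.Module):
    """Sketch of the one-dimensional convolution block: two Conv1d layers of the
    same form with unshared weights, one passed through a sigmoid gate, combined
    by element-wise multiplication, plus a residual connection; 'same' padding
    keeps the output length equal to the input length."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation
        self.conv_a = nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation)
        self.conv_b = nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.conv_a(z) * torch.sigmoid(self.conv_b(z)) + z  # gated + residual

# Hybrid dilation schedule: [1, 2, 4] repeated three times, then dilation-1
# blocks for fine-grained adjustment; the trailing [1, 1, 1] is an assumption
# made here so that the stack contains the 12 blocks the patent describes.
dilations = [1, 2, 4] * 3 + [1, 1, 1]
deep_conv = nn.Sequential(*[Conv1DBlock(256, 3, d) for d in dilations])
```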
Step 3-4: pass through a normalization layer to normalize Z_4;
Step 3-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
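A minimal PyTorch sketch of this step using the library's built-in multi-head attention, which holds W^Q, W^K, W^V and W^O internally; the head count h = 8 and the tensor sizes are assumptions, while d = 256 follows the patent:

```python
import torch
import torch.nn as nn

d, h = 256, 8                               # h = 8 heads is an assumption
mha = nn.MultiheadAttention(embed_dim=d, num_heads=h, batch_first=True)

z4 = torch.randn(2, 20, d)                  # normalized Z4: (batch, seq_len, d)
attended, attn_weights = mha(z4, z4, z4)    # self-attention: Q = K = V = Z4
print(attended.shape)                       # torch.Size([2, 20, 256])
```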
Step 3-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 3-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 3-8: use the cross-entropy loss function to calculate the error loss between the actual output and the expected output; the calculation process is as follows:
loss = -(1/n) · Σ_{k=1}^{n} log p(y_k)
where loss is the average error over the training data, n is the number of training samples, y_k is the label of the k-th training sample, and p(y_k) is the predicted probability of that label. Parameters are updated layer by layer by back-propagating the error loss; the parameter update process is as follows:
θ' = θ - a · ∂loss/∂θ
where θ is the parameter set, θ' is the updated parameter set, and a is the learning rate;
Step 3-9: iterate steps 3-1 to 3-8 until the set number of iterations is reached, obtaining the optimized MHDCNN-RA model.
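Putting steps 3-1 to 3-9 together, a minimal PyTorch training-loop sketch; the stand-in model, tensor shapes and hyperparameters are illustrative only, with cross-entropy loss as specified here and Adam updates as named in the beneficial effects below:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the MHDCNN-RA network and its three-channel input;
# none of these shapes or settings are fixed by the patent except d = 256.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 20 * 256, 2))
inputs = torch.randn(55, 3, 20, 256)        # 55 training sentences, 3 channels
labels = torch.randint(0, 2, (55,))         # C = 2 semantic categories

criterion = nn.CrossEntropyLoss()           # error between actual and expected output
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # a = learning rate

for epoch in range(30):                     # iterate until the set iteration count
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()                         # back-propagate the error layer by layer
    optimizer.step()                        # theta' = theta - a * d(loss)/d(theta)
```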
5. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 4 the test process, i.e. the semantic classification process, inputs the test data into the optimized MHDCNN-RA model and calculates the weight of the ambiguous word m under each semantic category; the semantic category with the largest weight is the semantic category of the ambiguous word. The specific process is as follows:
Step 4-1: load the three word embedding matrices of the test data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 4-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 4-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
Step 4-4: pass through a normalization layer to normalize Z_4;
Step 4-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
Step 4-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 4-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 4-8: output the semantic category with the maximum weight; the process is as follows:
s = argmax_{s_i} w(s_i | m), i = 1, 2, …, C
where s is the semantic category of the ambiguous word m.
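A minimal sketch of this decision rule; the weight values shown are the ones reported for the ambiguous word "unit" in the embodiment below:

```python
import torch

# w(s_i | m): weights from the adaptive average pooling layer, one per semantic
# category (values taken from the "unit" example in the embodiment).
weights = torch.tensor([1.6917, 7.5621])
senses = ["organization", "unit"]           # s_1, s_2

s = senses[int(torch.argmax(weights))]      # category with the maximum weight
print(s)                                    # -> "unit"
```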
Beneficial effects:
1. The invention relates to a Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention. Chinese sentences are segmented and part-of-speech tagged; the stroke counts and near-synonyms of the words adjacent to the ambiguous word on the left and right are extracted; and a randomly initialized word embedding matrix plus word embedding matrices pre-trained with Word2Vec and FastText are used, the three combining into a high-quality three-channel word embedding matrix.
2. The model used by the method is a neural network combining multi-channel mixed hole convolution with residual error and attention. Its main characteristics are that three word embedding matrices are combined into a three-channel input, fully mining the context information and text similarity of the corpus, and that position encoding adds position information to the feature matrix. The mixed hole convolution scheme overcomes the gridding effect caused by single hole convolution, achieving full-coverage scanning of the feature matrix and capturing multi-scale information; the deep neural network improves the expressive power of the model, and the residual structure effectively alleviates its gradient-vanishing problem. A multi-head self-attention mechanism mines the relations among the disambiguation features, improving word sense disambiguation precision and yielding a good classification effect.
3. The cross-entropy loss function used by the invention contains a softmax classifier; it not only handles multi-class classification but also embeds NLLLoss to compute the error loss.
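A short PyTorch check of this point with illustrative logits: CrossEntropyLoss yields the same value as LogSoftmax followed by NLLLoss:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.6917, 7.5621]])   # raw model output for one sample
target = torch.tensor([1])                  # true semantic category

ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)
print(torch.allclose(ce, nll))              # True: softmax and NLLLoss are embedded
```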
4. When the model is trained, parameters are updated with the Adam gradient descent method. After the error is computed, it is back-propagated along the original route: each layer's parameters are updated layer by layer, from the output layer backwards through each intermediate hidden layer. Forward and backward propagation are performed repeatedly to reduce the error and update the model parameters until the MHDCNN-RA model is trained. As the parameters are continuously updated by the back-propagated error, the disambiguation accuracy of the whole MHDCNN-RA model on the input data improves.
Description of the drawings:
FIG. 1 is a flow chart of word sense disambiguation of a Chinese sentence in an embodiment of the present invention;
FIG. 2 is a training process of a word sense disambiguation model based on MHDCNN-RA in an embodiment of the present invention.
FIG. 3 is a testing process of the MHDCNN-RA based word sense disambiguation model in an embodiment of the present invention.
The specific implementation mode is as follows:
in order to describe the technical scheme in the embodiment of the invention clearly and completely, the invention is described in further detail below with reference to the drawings, taking as an example the Chinese sentences containing the ambiguous word "unit" in the corpus of SemEval-2007 Task #5. There are 55 sentences in the training corpus and 19 sentences in the test corpus. The ambiguous word "unit" has two semantic categories, 0: organization and 1: unit.
The flow chart of the Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention of the embodiment of the invention is shown in FIG. 1. The training process of the MHDCNN-RA-based word sense disambiguation model is shown in FIG. 2. The testing process of the MHDCNN-RA-based word sense disambiguation model is shown in FIG. 3.
The extraction process of the disambiguation features in step 1 is as follows:
For the Chinese sentence "Starting from January 1, 1999, no department or unit may again distribute new houses in kind.", the feature extraction steps are as follows:
Step 1-1: segment the Chinese sentence with a Chinese word segmentation tool; the word segmentation result is as follows:
from / 1999 / 1 month / 1 day / starting / any / department / unit / must not / again / will / new house / carry out / physical / distribution
Step 1-2, performing part-of-speech tagging on the vocabulary by using a Chinese part-of-speech tagging tool, wherein the part-of-speech tagging result is as follows:
from/p 1999/t 1/t, f any/r department/n unit/n must/v re/d will/p Xin House/n proceed/v physical/n distribution/vn
Step 1-3, according to synonym forest, semantic labeling is carried out on words by using a Chinese semantic labeling tool:
from/p/Hi 39/t/Ca 18/t/Bd 02/t/Di 02/f/Kd 02 any/r/Eb 02 department/n/Di 09 unit/n/Di 09 can not/v/Gc 02/d/Ig 04/p/Ae 10 new house/n/Bn 03/v/Ig 03 substantial property/n/-1 distribution/vn/He 05
Step 1-4, extracting the word shapes, parts of speech, semantic classes and stroke numbers of the word shapes of the left and right adjacent word units of the unit, extracting 4 similar words with the similarity of the left and right adjacent words of the unit being close to the front according to the synonym forest, and combining the similar words into a group of disambiguation characteristics:
any r Eb02 department n Di09 cannot be operated by v Gc02 and d Ig04 13 13 15 6 institutions
Step 2, acquiring training data and test data of a unit:
Step 2-1: for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText, three word embedding matrices in total, as training data;
Step 2-2: for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText, three word embedding matrices in total, as test data. Taking the feature group below as an example:
any/r/Eb02 department/n/Di09 must not/v/Gc02 again/d/Ig04 13 13 15 6 institution
A word embedding matrix V_1 is obtained using random initialization (matrix values shown in the original figure).
A pre-trained word embedding matrix V_2 is obtained using Word2Vec (matrix values shown in the original figure).
A pre-trained word embedding matrix V_3 is obtained using FastText (matrix values shown in the original figure).
Step 3 uses the training data to optimize the MHDCNN-RA model:
Step 3-1: load the three word embedding matrices of the training data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 3-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 3-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
Step 3-4: pass through a normalization layer to normalize Z_4;
Step 3-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
Step 3-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 3-7: through an adaptive average pooling layer, output the weight w(s_i|unit) assigned to the ambiguous word "unit" under each semantic category s_i, i = 1, 2, with 0: s_1 = organization and 1: s_2 = unit;
Step 3-8: use the cross-entropy loss function to calculate the error loss_unit between the actual output and the expected output; the calculation process is as follows:
loss_unit = -(1/n) · Σ_{k=1}^{n} log p(y_k)
Parameters are updated layer by layer by back-propagating the error loss; the parameter update process is as follows:
θ'_unit = θ_unit - a · ∂loss_unit/∂θ_unit
where θ_unit denotes the parameter set, θ'_unit the updated parameter set, and a the learning rate;
Step 3-9: iterate steps 3-1 to 3-8 until the set number of iterations is reached, obtaining the optimized MHDCNN-RA model;
Step 4: perform semantic classification on the ambiguous word "unit":
Step 4-1: load the three word embedding matrices of the test data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 4-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 4-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
Step 4-4: pass through a normalization layer to normalize Z_4;
Step 4-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
Step 4-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 4-7: through an adaptive average pooling layer, output the weights assigned to the ambiguous word "unit" under semantic categories 0: s_1 = organization and 1: s_2 = unit: w = [w(s_1|unit), w(s_2|unit)] = [1.6917, 7.5621];
Step 4-8: output the semantic category with the maximum weight, as follows:
s = argmax_{s_i} w(s_i | unit) = s_2
where s_2 = unit is the semantic category corresponding to the ambiguous word "unit".
Through the optimized MHDCNN-RA model, word sense disambiguation is performed on the Chinese sentence containing the ambiguous word "unit" ("Starting from January 1, 1999, no department or unit may again distribute new houses in kind."), and the semantic category corresponding to "unit" is judged to be unit. Experimental verification shows that the accuracy on the test corpus of the ambiguous word "unit" reaches 84.21% with the optimized MHDCNN-RA model.
The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention of the embodiment of the invention can select accurate disambiguation features and uses a neural network combining multi-channel mixed hole convolution with residual error and attention to determine the semantic category of ambiguous words.
The foregoing is a detailed description of embodiments of the invention, taken in conjunction with the accompanying drawings, wherein the specific embodiments are merely provided to assist in understanding the method of the invention. For those skilled in the art, variations and modifications can be made within the scope of the embodiments and applications according to the concept of the present invention, and therefore the present invention should not be construed as being limited thereto.

Claims (5)

1. A Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention, wherein the ambiguous word m has C semantic categories s_1, s_2, …, s_C, characterized by comprising the following steps:
Step 1: perform word segmentation, part-of-speech tagging and semantic class tagging on the training corpus and test corpus of SemEval-2007 Task #5.
Step 2: for the disambiguation features extracted from the SemEval-2007 Task #5 corpus, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText.
Step 3: optimize the MHDCNN-RA model with the training data to obtain the optimized MHDCNN-RA model.
Step 4: input the test data into the optimized MHDCNN-RA model and calculate the weight of the ambiguous word m under each semantic category; the semantic category with the maximum weight is the semantic category of m.
2. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 1 the training corpus and test corpus of SemEval-2007 Task #5 are processed as follows:
Step 1-1: segment the Chinese sentences with a Chinese word segmentation tool;
Step 1-2: perform part-of-speech tagging on the words with a Chinese part-of-speech tagging tool;
Step 1-3: perform semantic tagging on the words with a Chinese semantic tagging tool according to the synonym forest;
Step 1-4: extract the word form, part of speech, semantic class and stroke count of the four word units adjacent to the ambiguous word m, and take the 4 top-ranked near-synonyms of the 2 words adjacent to m according to the synonym dictionary, together forming the disambiguation features.
3. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 2 a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText are used for the disambiguation features extracted from the corpus of SemEval-2007 Task #5:
Step 2-1: obtain a word embedding matrix V_1 by random initialization for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-2: obtain a pre-trained word embedding matrix V_2 with Word2Vec for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-3: obtain a pre-trained word embedding matrix V_3 with FastText for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-4: use the three word embedding matrices obtained in steps 2-1, 2-2 and 2-3 as training data;
Step 2-5: obtain a word embedding matrix V_1 by random initialization for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-6: obtain a pre-trained word embedding matrix V_2 with Word2Vec for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-7: obtain a pre-trained word embedding matrix V_3 with FastText for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-8: use the three word embedding matrices obtained in steps 2-5, 2-6 and 2-7 as test data.
4. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 3 the MHDCNN-RA model is optimized with the training data to obtain the optimized MHDCNN-RA model, with the following specific steps:
Step 3-1: load the three word embedding matrices of the training data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 3-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 3-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
Step 3-4: pass through a normalization layer to normalize Z_4;
Step 3-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
Step 3-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 3-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 3-8: use the cross-entropy loss function to calculate the error loss between the actual output and the expected output; the calculation process is as follows:
loss = -(1/n) · Σ_{k=1}^{n} log p(y_k)
where loss is the average error over the training data, n is the number of training samples, y_k is the label of the k-th training sample, and p(y_k) is the predicted probability of that label. Parameters are updated layer by layer by back-propagating the error loss; the parameter update process is as follows:
θ' = θ - a · ∂loss/∂θ
where θ is the parameter set, θ' is the updated parameter set, and a is the learning rate;
Step 3-9: iterate steps 3-1 to 3-8 until the set number of iterations is reached, obtaining the optimized MHDCNN-RA model.
5. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 4 the test process, i.e. the semantic classification process, inputs the test data into the optimized MHDCNN-RA model and calculates the weight of the ambiguous word m under each semantic category; the semantic category with the largest weight is the semantic category of the ambiguous word. The specific process is as follows:
Step 4-1: load the three word embedding matrices of the test data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 4-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 4-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
Step 4-4: pass through a normalization layer to normalize Z_4;
Step 4-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
Step 4-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 4-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 4-8: output the semantic category with the maximum weight; the process is as follows:
s = argmax_{s_i} w(s_i | m), i = 1, 2, …, C
where s is the semantic category of the ambiguous word m.
CN202211495234.9A 2022-11-26 2022-11-26 Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention Pending CN115906825A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211495234.9A CN115906825A (en) 2022-11-26 2022-11-26 Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211495234.9A CN115906825A (en) 2022-11-26 2022-11-26 Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention

Publications (1)

Publication Number Publication Date
CN115906825A true CN115906825A (en) 2023-04-04

Family

ID=86475879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211495234.9A Pending CN115906825A (en) 2022-11-26 2022-11-26 Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention

Country Status (1)

Country Link
CN (1) CN115906825A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118135244A (en) * 2024-05-10 2024-06-04 东北大学 Target detection method under complex overlapped background


Similar Documents

Publication Publication Date Title
CN108595632B (en) Hybrid neural network text classification method fusing abstract and main body characteristics
Zhang et al. Multiview convolutional neural networks for multidocument extractive summarization
CN112990296B (en) Image-text matching model compression and acceleration method and system based on orthogonal similarity distillation
Qiu et al. Learning word representation considering proximity and ambiguity
CN110600047A (en) Perceptual STARGAN-based many-to-many speaker conversion method
CN111027595B (en) Double-stage semantic word vector generation method
CN110826338B (en) Fine-grained semantic similarity recognition method for single-selection gate and inter-class measurement
CN108427729A (en) Large-scale picture retrieval method based on depth residual error network and Hash coding
CN110765755A (en) Semantic similarity feature extraction method based on double selection gates
CN108733647B (en) Word vector generation method based on Gaussian distribution
CN109002473A (en) A kind of sentiment analysis method based on term vector and part of speech
CN110222338B (en) Organization name entity identification method
CN109918507B (en) textCNN (text-based network communication network) improved text classification method
CN108446334A (en) Image retrieval method based on content for unsupervised countermeasure training
CN115422939B (en) Fine granularity commodity named entity identification method based on big data
CN111061873B (en) Multi-channel text classification method based on Attention mechanism
CN114254645A (en) Artificial intelligence auxiliary writing system
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN115906825A (en) Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention
CN117610579B (en) Semantic analysis method and system based on long-short-term memory network
CN116629264B (en) Relation extraction method based on multiple word embedding and multi-head self-attention mechanism
CN113065350A (en) Biomedical text word sense disambiguation method based on attention neural network
CN111008529A (en) Chinese relation extraction method based on neural network
CN115965027A (en) Text abstract automatic extraction method based on semantic matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination