CN115906825A - Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention

Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention

Info

Publication number
CN115906825A
CN115906825A (application CN202211495234.9A)
Authority
CN
China
Prior art keywords
convolution
word
disambiguation
matrix
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211495234.9A
Other languages
Chinese (zh)
Inventor
张春祥
张育隆
杨玉建
高雪瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202211495234.9A priority Critical patent/CN115906825A/en
Publication of CN115906825A publication Critical patent/CN115906825A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to a Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention (MHDCNN-RA). The method performs word segmentation, part-of-speech tagging and semantic class tagging on Chinese sentences containing ambiguous words to obtain processed training and test corpora. The word sense disambiguation model is then trained on the training corpus to obtain an optimized MHDCNN-RA model. The test corpus is disambiguated on the optimized MHDCNN-RA model to obtain the weight of the ambiguous word under each semantic category; the semantic category with the maximum weight is the semantic category of the ambiguous word. The invention achieves good disambiguation of ambiguous words and judges their real meanings more accurately.

Description

Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention
The technical field is as follows:
the invention relates to a Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention, which is well suited to the technical field of natural language processing.
Background art:
in natural languages there are a large number of polysemous words; the problem word sense disambiguation must solve is to determine which of a word's multiple senses should be selected as its correct sense in a particular context. Word sense disambiguation plays an important role in machine translation, semantic recognition, information retrieval and other fields.
Previously, words were often disambiguated with common algorithms such as K-means, naive Bayes, association-rule-based classification and artificial neural networks. However, these conventional algorithms suffer from several drawbacks: the extracted disambiguation features are limited to local areas, and some features must be designed by hand, so the workload is large, the speed is low, and the classifier trains poorly. In recent years, deep learning algorithms have been widely applied to natural language processing. In the MHDCNN-RA model, a multi-channel convolutional neural network fully mines the context information and text similarity of the corpus, while mixed hole convolution effectively alleviates the gridding effect of ordinary hole convolution and fully captures multi-scale information. A deep convolutional network increases the expressive power of the model, a residual structure alleviates the gradient-vanishing problem of deep neural networks, and a multi-head self-attention mechanism mines the connections between the disambiguation features, yielding higher word sense disambiguation precision.
The invention content is as follows:
in order to solve the problem of lexical ambiguity in natural language processing, the invention discloses a Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention. In natural languages there are a large number of polysemous words; word sense disambiguation must determine which of a word's multiple senses should be selected as its correct sense in a particular context. Word sense disambiguation plays an important role in machine translation, semantic recognition, information retrieval and other fields.
Therefore, the invention provides the following technical scheme:
1. A Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention, wherein the ambiguous word m has C semantic categories s_1, s_2, …, s_C, characterized by comprising the following steps:
Step 1: perform word segmentation, part-of-speech tagging and semantic class tagging on the training corpus and test corpus of SemEval-2007 Task #5.
Step 2: for the disambiguation features extracted from the SemEval-2007 Task #5 corpus, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText.
Step 3: optimize the MHDCNN-RA model with the training data to obtain the optimized MHDCNN-RA model.
Step 4: input the test data into the optimized MHDCNN-RA model and calculate the weight of the ambiguous word m under each semantic category; the semantic category with the maximum weight is the semantic category of m.
2. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 1 the training corpus and test corpus of SemEval-2007 Task #5 are processed as follows:
Step 1-1: segment the Chinese sentences with a Chinese word segmentation tool;
Step 1-2: perform part-of-speech tagging on the words with a Chinese part-of-speech tagging tool;
Step 1-3: perform semantic tagging on the words with a Chinese semantic tagging tool according to the synonym forest;
Step 1-4: extract the word form, part of speech, semantic class and stroke count of the four word units adjacent to the ambiguous word m, and take the 4 top-ranked near-synonyms of the 2 words adjacent to m according to the synonym dictionary, together forming the disambiguation features.
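For illustration only, the following Python sketch assembles a disambiguation feature group of this shape; the helper functions pos_tag, sem_class, strokes and top_synonyms are hypothetical stand-ins for the Chinese tagging tools and the synonym-forest lookup, which the patent does not name concretely:

```python
from typing import List

# Hypothetical stand-ins for the segmentation/tagging tools and the
# synonym-forest lookup of steps 1-1 to 1-3 (not named by the patent).
def pos_tag(word: str) -> str: ...          # e.g. "n", "v", "r"
def sem_class(word: str) -> str: ...        # e.g. "Di09" from the synonym forest
def strokes(word: str) -> int: ...          # total stroke count of the word
def top_synonyms(word: str, k: int) -> List[str]: ...  # k most similar entries

def build_feature_group(tokens: List[str], idx: int) -> List[str]:
    """Disambiguation features for the ambiguous word tokens[idx] (step 1-4)."""
    around = [j for j in (idx - 2, idx - 1, idx + 1, idx + 2) if 0 <= j < len(tokens)]
    feats: List[str] = []
    for j in around:                         # word form, POS, semantic class
        feats += [tokens[j], pos_tag(tokens[j]), sem_class(tokens[j])]
    feats += [str(strokes(tokens[j])) for j in around]   # stroke counts
    for j in (idx - 1, idx + 1):             # 4 near-synonyms of the 2 adjacent words
        if 0 <= j < len(tokens):
            feats += top_synonyms(tokens[j], k=4)
    return feats
```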
3. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 2 a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText are built for the disambiguation features extracted from the SemEval-2007 Task #5 training corpus, the three word embedding matrices serving as training data, and likewise for the disambiguation features extracted from the SemEval-2007 Task #5 test corpus:
Step 2-1: obtain a word embedding matrix V_1 by random initialization for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-2: obtain a pre-trained word embedding matrix V_2 with Word2Vec for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-3: obtain a pre-trained word embedding matrix V_3 with FastText for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-4: use the three word embedding matrices obtained in steps 2-1, 2-2 and 2-3 as training data;
Step 2-5: obtain a word embedding matrix V_1 by random initialization for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-6: obtain a pre-trained word embedding matrix V_2 with Word2Vec for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-7: obtain a pre-trained word embedding matrix V_3 with FastText for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-8: use the three word embedding matrices obtained in steps 2-5, 2-6 and 2-7 as test data.
4. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 3 the MHDCNN-RA model is optimized with the training data to obtain the optimized MHDCNN-RA model, with the following specific steps:
Step 3-1: load the three word embedding matrices of the training data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 3-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
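As a minimal PyTorch sketch of this fusion step, under the assumption of a fixed feature-group length and illustrative kernel sizes (the patent fixes only the word vector dimension d = 256):

```python
import math
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Sketch of step 3-2: a 2-D convolution fuses the three channels into Z1,
    sinusoidal position codes P are added (Z2 = Z1 + P), and a 1-D convolution
    compresses the result into Z3. Kernel sizes and seq_len are assumptions."""
    def __init__(self, d: int = 256, seq_len: int = 20):
        super().__init__()
        self.conv2d = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # [V1,V2,V3] -> Z1
        self.conv1d = nn.Conv1d(d, d, kernel_size=1)             # Z2 -> Z3
        pe = torch.zeros(seq_len, d)
        pos = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d, 2, dtype=torch.float) * (-math.log(10000.0) / d))
        pe[:, 0::2] = torch.sin(pos * div)   # sine at dimensions 2i
        pe[:, 1::2] = torch.cos(pos * div)   # cosine at dimensions 2i+1
        self.register_buffer("pe", pe)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        z1 = self.conv2d(v).squeeze(1)           # v: (batch, 3, seq_len, d) -> Z1
        z2 = z1 + self.pe                        # Z2 = Z1 + P
        return self.conv1d(z2.transpose(1, 2))  # Z3: (batch, d, seq_len)
```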
Step 3-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
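A minimal PyTorch sketch of one such gated residual block and of the hybrid-dilation stack follows; the kernel size and the final dilation-1 blocks that bring the count to 12 are assumptions, since the patent states only that [1, 2, 4] is repeated three times and [1, 1] is used for fine adjustment:

```python
import torch
import torch.nn as nn

class Conv1DBlock(nn.Module):
    """Sketch of the one-dimensional convolution block: two Conv1d layers of the
    same form with unshared weights, one passed through a sigmoid gate, combined
    by element-wise multiplication, plus a residual connection; 'same' padding
    keeps the output length equal to the input length."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation
        self.conv_a = nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation)
        self.conv_b = nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.conv_a(z) * torch.sigmoid(self.conv_b(z)) + z  # gated + residual

# Hybrid dilation schedule: [1, 2, 4] repeated three times, then dilation-1
# blocks for fine-grained adjustment; the trailing [1, 1, 1] is an assumption
# made here so that the stack contains the 12 blocks the patent describes.
dilations = [1, 2, 4] * 3 + [1, 1, 1]
deep_conv = nn.Sequential(*[Conv1DBlock(256, 3, d) for d in dilations])
```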
Step 3-4: pass through a normalization layer to normalize Z_4;
Step 3-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
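A minimal PyTorch sketch of this step using the library's built-in multi-head attention, which holds W^Q, W^K, W^V and W^O internally; the head count h = 8 and the tensor sizes are assumptions, while d = 256 follows the patent:

```python
import torch
import torch.nn as nn

d, h = 256, 8                               # h = 8 heads is an assumption
mha = nn.MultiheadAttention(embed_dim=d, num_heads=h, batch_first=True)

z4 = torch.randn(2, 20, d)                  # normalized Z4: (batch, seq_len, d)
attended, attn_weights = mha(z4, z4, z4)    # self-attention: Q = K = V = Z4
print(attended.shape)                       # torch.Size([2, 20, 256])
```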
Step 3-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 3-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 3-8: use the cross-entropy loss function to calculate the error loss between the actual output and the expected output; the calculation process is as follows:
loss = -(1/n) · Σ_{k=1}^{n} log p(y_k)
where loss is the average error over the training data, n is the number of training samples, y_k is the label of the k-th training sample, and p(y_k) is the predicted probability of that label. Parameters are updated layer by layer by back-propagating the error loss; the parameter update process is as follows:
θ' = θ - a · ∂loss/∂θ
where θ is the parameter set, θ' is the updated parameter set, and a is the learning rate;
Step 3-9: iterate steps 3-1 to 3-8 until the set number of iterations is reached, obtaining the optimized MHDCNN-RA model.
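Putting steps 3-1 to 3-9 together, a minimal PyTorch training-loop sketch; the stand-in model, tensor shapes and hyperparameters are illustrative only, with cross-entropy loss as specified here and Adam updates as named in the beneficial effects below:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the MHDCNN-RA network and its three-channel input;
# none of these shapes or settings are fixed by the patent except d = 256.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 20 * 256, 2))
inputs = torch.randn(55, 3, 20, 256)        # 55 training sentences, 3 channels
labels = torch.randint(0, 2, (55,))         # C = 2 semantic categories

criterion = nn.CrossEntropyLoss()           # error between actual and expected output
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # a = learning rate

for epoch in range(30):                     # iterate until the set iteration count
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()                         # back-propagate the error layer by layer
    optimizer.step()                        # theta' = theta - a * d(loss)/d(theta)
```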
5. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 4 the test process, i.e. the semantic classification process, inputs the test data into the optimized MHDCNN-RA model and calculates the weight of the ambiguous word m under each semantic category; the semantic category with the largest weight is the semantic category of the ambiguous word. The specific process is as follows:
Step 4-1: load the three word embedding matrices of the test data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 4-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 4-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
Step 4-4: pass through a normalization layer to normalize Z_4;
Step 4-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
Step 4-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 4-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 4-8: output the semantic category with the maximum weight; the process is as follows:
s = argmax_{s_i} w(s_i | m), i = 1, 2, …, C
where s is the semantic category of the ambiguous word m.
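A minimal sketch of this decision rule; the weight values shown are the ones reported for the ambiguous word "unit" in the embodiment below:

```python
import torch

# w(s_i | m): weights from the adaptive average pooling layer, one per semantic
# category (values taken from the "unit" example in the embodiment).
weights = torch.tensor([1.6917, 7.5621])
senses = ["organization", "unit"]           # s_1, s_2

s = senses[int(torch.argmax(weights))]      # category with the maximum weight
print(s)                                    # -> "unit"
```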
Beneficial effects:
1. The invention relates to a Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention. Chinese sentences are segmented and part-of-speech tagged; the stroke counts and near-synonyms of the words adjacent to the ambiguous word on the left and right are extracted; and a randomly initialized word embedding matrix plus word embedding matrices pre-trained with Word2Vec and FastText are used, the three combining into a high-quality three-channel word embedding matrix.
2. The model used by the method is a neural network combining multi-channel mixed hole convolution with residual error and attention. Its main characteristics are that three word embedding matrices are combined into a three-channel input, fully mining the context information and text similarity of the corpus, and that position encoding adds position information to the feature matrix. The mixed hole convolution scheme overcomes the gridding effect caused by single hole convolution, achieving full-coverage scanning of the feature matrix and capturing multi-scale information; the deep neural network improves the expressive power of the model, and the residual structure effectively alleviates its gradient-vanishing problem. A multi-head self-attention mechanism mines the relations among the disambiguation features, improving word sense disambiguation precision and yielding a good classification effect.
3. The cross-entropy loss function used by the invention contains a softmax classifier; it not only handles multi-class classification but also embeds NLLLoss to compute the error loss.
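A short PyTorch check of this point with illustrative logits: CrossEntropyLoss yields the same value as LogSoftmax followed by NLLLoss:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.6917, 7.5621]])   # raw model output for one sample
target = torch.tensor([1])                  # true semantic category

ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)
print(torch.allclose(ce, nll))              # True: softmax and NLLLoss are embedded
```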
4. When the model is trained, parameters are updated with the Adam gradient descent method. After the error is computed, it is back-propagated along the original route: each layer's parameters are updated layer by layer, from the output layer backwards through each intermediate hidden layer. Forward and backward propagation are performed repeatedly to reduce the error and update the model parameters until the MHDCNN-RA model is trained. As the parameters are continuously updated by the back-propagated error, the disambiguation accuracy of the whole MHDCNN-RA model on the input data improves.
Description of the drawings:
FIG. 1 is a flow chart of word sense disambiguation of a Chinese sentence in an embodiment of the present invention;
FIG. 2 is a training process of a word sense disambiguation model based on MHDCNN-RA in an embodiment of the present invention.
FIG. 3 is a testing process of the MHDCNN-RA based word sense disambiguation model in an embodiment of the present invention.
The specific implementation mode is as follows:
in order to describe the technical scheme in the embodiment of the invention clearly and completely, the invention is described in further detail below with reference to the drawings, taking as an example the Chinese sentences containing the ambiguous word "unit" in the corpus of SemEval-2007 Task #5. There are 55 sentences in the training corpus and 19 sentences in the test corpus. The ambiguous word "unit" has two semantic categories, 0: organization and 1: unit.
The flow chart of the Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention of the embodiment of the invention is shown in FIG. 1. The training process of the MHDCNN-RA-based word sense disambiguation model is shown in FIG. 2. The testing process of the MHDCNN-RA-based word sense disambiguation model is shown in FIG. 3.
The extraction process of the disambiguation features in step 1 is as follows:
For the Chinese sentence "Starting from January 1, 1999, no department or unit may again distribute new houses in kind.", the feature extraction steps are as follows:
Step 1-1: segment the Chinese sentence with a Chinese word segmentation tool; the word segmentation result is as follows:
from / 1999 / 1 month / 1 day / starting / any / department / unit / must not / again / will / new house / carry out / physical / distribution
Step 1-2, performing part-of-speech tagging on the vocabulary by using a Chinese part-of-speech tagging tool, wherein the part-of-speech tagging result is as follows:
from/p 1999/t 1/t, f any/r department/n unit/n must/v re/d will/p Xin House/n proceed/v physical/n distribution/vn
Step 1-3, according to synonym forest, semantic labeling is carried out on words by using a Chinese semantic labeling tool:
from/p/Hi 39/t/Ca 18/t/Bd 02/t/Di 02/f/Kd 02 any/r/Eb 02 department/n/Di 09 unit/n/Di 09 can not/v/Gc 02/d/Ig 04/p/Ae 10 new house/n/Bn 03/v/Ig 03 substantial property/n/-1 distribution/vn/He 05
Step 1-4, extracting the word shapes, parts of speech, semantic classes and stroke numbers of the word shapes of the left and right adjacent word units of the unit, extracting 4 similar words with the similarity of the left and right adjacent words of the unit being close to the front according to the synonym forest, and combining the similar words into a group of disambiguation characteristics:
any r Eb02 department n Di09 cannot be operated by v Gc02 and d Ig04 13 13 15 6 institutions
Step 2, acquiring training data and test data of a unit:
Step 2-1: for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText, three word embedding matrices in total, as training data;
Step 2-2: for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText, three word embedding matrices in total, as test data. Taking the feature group below as an example:
any/r/Eb02 department/n/Di09 must not/v/Gc02 again/d/Ig04 13 13 15 6 institution
A word embedding matrix V_1 is obtained using random initialization (matrix values shown in the original figure).
A pre-trained word embedding matrix V_2 is obtained using Word2Vec (matrix values shown in the original figure).
A pre-trained word embedding matrix V_3 is obtained using FastText (matrix values shown in the original figure).
Step 3 uses the training data to optimize the MHDCNN-RA model:
Step 3-1: load the three word embedding matrices of the training data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 3-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 3-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
Step 3-4: pass through a normalization layer to normalize Z_4;
Step 3-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
Step 3-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 3-7: through an adaptive average pooling layer, output the weight w(s_i|unit) assigned to the ambiguous word "unit" under each semantic category s_i, i = 1, 2, with 0: s_1 = organization and 1: s_2 = unit;
Step 3-8: use the cross-entropy loss function to calculate the error loss_unit between the actual output and the expected output; the calculation process is as follows:
loss_unit = -(1/n) · Σ_{k=1}^{n} log p(y_k)
Parameters are updated layer by layer by back-propagating the error loss; the parameter update process is as follows:
θ'_unit = θ_unit - a · ∂loss_unit/∂θ_unit
where θ_unit denotes the parameter set, θ'_unit the updated parameter set, and a the learning rate;
Step 3-9: iterate steps 3-1 to 3-8 until the set number of iterations is reached, obtaining the optimized MHDCNN-RA model;
Step 4: perform semantic classification on the ambiguous word "unit":
Step 4-1: load the three word embedding matrices of the test data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 4-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 4-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
Step 4-4: pass through a normalization layer to normalize Z_4;
Step 4-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
Step 4-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 4-7: through an adaptive average pooling layer, output the weights assigned to the ambiguous word "unit" under semantic categories 0: s_1 = organization and 1: s_2 = unit: w = [w(s_1|unit), w(s_2|unit)] = [1.6917, 7.5621];
Step 4-8: output the semantic category with the maximum weight, as follows:
s = argmax_{s_i} w(s_i | unit) = s_2
where s_2 = unit is the semantic category corresponding to the ambiguous word "unit".
Through the optimized MHDCNN-RA model, word sense disambiguation is performed on the Chinese sentence containing the ambiguous word "unit" ("Starting from January 1, 1999, no department or unit may again distribute new houses in kind."), and the semantic category corresponding to "unit" is judged to be unit. Experimental verification shows that the accuracy on the test corpus of the ambiguous word "unit" reaches 84.21% with the optimized MHDCNN-RA model.
The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention of the embodiment of the invention can select accurate disambiguation features and uses a neural network combining multi-channel mixed hole convolution with residual error and attention to determine the semantic category of ambiguous words.
The foregoing is a detailed description of embodiments of the invention, taken in conjunction with the accompanying drawings, wherein the specific embodiments are merely provided to assist in understanding the method of the invention. For those skilled in the art, variations and modifications can be made within the scope of the embodiments and applications according to the concept of the present invention, and therefore the present invention should not be construed as being limited thereto.

Claims (5)

1. A Chinese word sense disambiguation method combining multi-channel mixed hole convolution with residual error and attention, wherein the ambiguous word m has C semantic categories s_1, s_2, …, s_C, characterized by comprising the following steps:
Step 1: perform word segmentation, part-of-speech tagging and semantic class tagging on the training corpus and test corpus of SemEval-2007 Task #5.
Step 2: for the disambiguation features extracted from the SemEval-2007 Task #5 corpus, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText.
Step 3: optimize the MHDCNN-RA model with the training data to obtain the optimized MHDCNN-RA model.
Step 4: input the test data into the optimized MHDCNN-RA model and calculate the weight of the ambiguous word m under each semantic category; the semantic category with the maximum weight is the semantic category of m.
2. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 1 the training corpus and test corpus of SemEval-2007 Task #5 are processed as follows:
Step 1-1: segment the Chinese sentences with a Chinese word segmentation tool;
Step 1-2: perform part-of-speech tagging on the words with a Chinese part-of-speech tagging tool;
Step 1-3: perform semantic tagging on the words with a Chinese semantic tagging tool according to the synonym forest;
Step 1-4: extract the word form, part of speech, semantic class and stroke count of the four word units adjacent to the ambiguous word m, and take the 4 top-ranked near-synonyms of the 2 words adjacent to m according to the synonym dictionary, together forming the disambiguation features.
3. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 2 a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText are used for the disambiguation features extracted from the corpus of SemEval-2007 Task #5:
Step 2-1: obtain a word embedding matrix V_1 by random initialization for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-2: obtain a pre-trained word embedding matrix V_2 with Word2Vec for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-3: obtain a pre-trained word embedding matrix V_3 with FastText for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5;
Step 2-4: use the three word embedding matrices obtained in steps 2-1, 2-2 and 2-3 as training data;
Step 2-5: obtain a word embedding matrix V_1 by random initialization for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-6: obtain a pre-trained word embedding matrix V_2 with Word2Vec for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-7: obtain a pre-trained word embedding matrix V_3 with FastText for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5;
Step 2-8: use the three word embedding matrices obtained in steps 2-5, 2-6 and 2-7 as test data.
4. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 3 the MHDCNN-RA model is optimized with the training data to obtain the optimized MHDCNN-RA model, with the following specific steps:
Step 3-1: load the three word embedding matrices of the training data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 3-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 3-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
Step 3-4: pass through a normalization layer to normalize Z_4;
Step 3-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
Step 3-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 3-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 3-8: use the cross-entropy loss function to calculate the error loss between the actual output and the expected output; the calculation process is as follows:
loss = -(1/n) · Σ_{k=1}^{n} log p(y_k)
where loss is the average error over the training data, n is the number of training samples, y_k is the label of the k-th training sample, and p(y_k) is the predicted probability of that label. Parameters are updated layer by layer by back-propagating the error loss; the parameter update process is as follows:
θ' = θ - a · ∂loss/∂θ
where θ is the parameter set, θ' is the updated parameter set, and a is the learning rate;
Step 3-9: iterate steps 3-1 to 3-8 until the set number of iterations is reached, obtaining the optimized MHDCNN-RA model.
5. The Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention as claimed in claim 1, wherein in step 4 the test process, i.e. the semantic classification process, inputs the test data into the optimized MHDCNN-RA model and calculates the weight of the ambiguous word m under each semantic category; the semantic category with the largest weight is the semantic category of the ambiguous word. The specific process is as follows:
Step 4-1: load the three word embedding matrices of the test data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 4-2: pass through a feature fusion layer: first fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then apply sine encoding at the odd positions of Z_1 and cosine encoding at the even positions to obtain the output P; add the position encoding feature P to the original feature Z_1 to obtain a new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is as follows:
Z_1 = Conv2D(V_1, V_2, V_3)
P(pos, 2i) = sin(pos / 10000^(2i/d))
P(pos, 2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 denote the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 4-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (the same number and size of convolution kernels) but do not share weights; one applies a sigmoid activation function and the other applies no activation, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input dimension, and the input is also added back to form a residual structure, which avoids gradient vanishing and lets information propagate through multiple channels. The one-dimensional convolution block is computed as follows:
Conv1D_Block(Z) = (Conv1D_1(Z) ⊙ σ(Conv1D_2(Z))) + Z
where ⊙ denotes element-wise multiplication of corresponding elements and σ is the sigmoid function;
The stacking of the one-dimensional convolution blocks uses a hybrid hole convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, after which dilation rates of [1, 1] are used for fine-grained adjustment; passing through the 12 one-dimensional convolution blocks yields the output Z_4. The deep convolution layer process is as follows:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block is a one-dimensional convolution block;
Step 4-4: pass through a normalization layer to normalize Z_4;
Step 4-5: mine the relations among the disambiguation features through a multi-head self-attention layer; the multi-head self-attention is computed as follows:
head_i = Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Q = Z_4 · W^Q, K = Z_4 · W^K, V = Z_4 · W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O
where W^Q, W^K and W^V are parameter matrices;
Step 4-6: reduce the number of parameters while retaining the main features through a max pooling layer;
Step 4-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 4-8: output the semantic category with the maximum weight; the process is as follows:
s = argmax_{s_i} w(s_i | m), i = 1, 2, …, C
where s is the semantic category of the ambiguous word m.
CN202211495234.9A 2022-11-26 2022-11-26 Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention Pending CN115906825A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211495234.9A CN115906825A (en) 2022-11-26 2022-11-26 Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211495234.9A CN115906825A (en) 2022-11-26 2022-11-26 Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention

Publications (1)

Publication Number Publication Date
CN115906825A true CN115906825A (en) 2023-04-04

Family

ID=86475879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211495234.9A Pending CN115906825A (en) 2022-11-26 2022-11-26 Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention

Country Status (1)

Country Link
CN (1) CN115906825A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118135244A (en) * 2024-05-10 2024-06-04 东北大学 Target detection method under complex overlapped background


Similar Documents

Publication Publication Date Title
CN108595632B (en) Hybrid neural network text classification method fusing abstract and main body characteristics
Zhang et al. Multiview convolutional neural networks for multidocument extractive summarization
CN112990296B (en) Image-text matching model compression and acceleration method and system based on orthogonal similarity distillation
Qiu et al. Learning word representation considering proximity and ambiguity
CN110600047A (en) Perceptual STARGAN-based many-to-many speaker conversion method
CN111027595B (en) Double-stage semantic word vector generation method
CN110826338B (en) Fine-grained semantic similarity recognition method for single-selection gate and inter-class measurement
CN108427729A (en) Large-scale picture retrieval method based on depth residual error network and Hash coding
CN110765755A (en) Semantic similarity feature extraction method based on double selection gates
CN108733647B (en) Word vector generation method based on Gaussian distribution
CN109002473A (en) A kind of sentiment analysis method based on term vector and part of speech
CN110222338B (en) Organization name entity identification method
CN109918507B (en) textCNN (text-based network communication network) improved text classification method
CN108446334A (en) Image retrieval method based on content for unsupervised countermeasure training
CN115422939B (en) Fine granularity commodity named entity identification method based on big data
CN111061873B (en) Multi-channel text classification method based on Attention mechanism
CN114254645A (en) Artificial intelligence auxiliary writing system
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN115906825A (en) Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention
CN117610579B (en) Semantic analysis method and system based on long-short-term memory network
CN116629264B (en) Relation extraction method based on multiple word embedding and multi-head self-attention mechanism
CN113065350A (en) Biomedical text word sense disambiguation method based on attention neural network
CN111008529A (en) Chinese relation extraction method based on neural network
CN115965027A (en) Text abstract automatic extraction method based on semantic matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination