CN115906825A - Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention - Google Patents
- Publication number: CN115906825A (application CN202211495234.9A)
- Authority: CN (China)
- Prior art keywords: convolution, word, disambiguation, matrix, layer
- Prior art date: 2022-11-26
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention relates to a Chinese word sense disambiguation method combining multi-channel hybrid dilated ("hole") convolution with residual connections and attention (MHDCNN-RA). The method performs word segmentation, part-of-speech tagging and semantic class tagging on Chinese sentences containing ambiguous words to obtain processed training and test corpora. The word sense disambiguation model is then trained on the training corpus to obtain an optimized MHDCNN-RA model. Disambiguating the test corpus on the optimized MHDCNN-RA model yields the weight of an ambiguous word under each semantic category, and the category with the largest weight is the semantic category of the ambiguous word. The invention achieves good disambiguation of ambiguous words and judges their true meanings more accurately.
Description
Technical field:
The invention relates to a Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention, well suited to the technical field of natural language processing.
Background art:
Natural languages contain a large number of polysemous words, and the problem word sense disambiguation must solve is to determine which of a word's multiple senses is the correct one in a particular context. Word sense disambiguation plays an important role in machine translation, semantic recognition, information retrieval and related fields.
Words have traditionally been disambiguated with common algorithms such as K-means, naive Bayes, classification based on association rules, and artificial neural networks. These conventional algorithms, however, suffer from several drawbacks: the extracted disambiguation features are limited to local regions, and some features must be designed by hand, which means heavy workload, low speed and poor classifier training. In recent years, deep learning algorithms have been widely applied to natural language processing. In the MHDCNN-RA model, a multi-channel convolutional neural network fully mines the context information and text similarity of the corpus, hybrid dilated convolution effectively alleviates the gridding effect of plain dilated convolution and fully captures multi-scale information, a deep convolutional network increases the expressive power of the model, a residual structure alleviates the vanishing-gradient problem of deep neural networks, and a multi-head self-attention mechanism mines the connections between the disambiguation features so as to obtain higher word sense disambiguation accuracy.
Summary of the invention:
To address lexical ambiguity in natural language processing, the invention discloses a Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention. Natural languages contain a large number of polysemous words, and word sense disambiguation must determine which of a word's multiple senses is the correct one in a particular context. Word sense disambiguation plays an important role in machine translation, semantic recognition, information retrieval and related fields.
Therefore, the invention provides the following technical scheme:
1. A Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention, wherein an ambiguous word m has C semantic categories s_1, s_2, …, s_C, characterized by comprising the following steps:
Step 1: perform word segmentation, part-of-speech tagging and semantic class tagging on the training corpus and test corpus of SemEval-2007 Task #5.
Step 2: for the disambiguation features extracted from the SemEval-2007 Task #5 corpus, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText.
Step 3: optimize the MHDCNN-RA model with the training data to obtain the optimized MHDCNN-RA model.
Step 4: input the test data into the optimized MHDCNN-RA model and calculate the weight of the ambiguous word m under each semantic category; the category with the largest weight is the semantic category of the ambiguous word m.
2. The Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention of claim 1, wherein in step 1 the training corpus and test corpus of SemEval-2007 Task #5 are processed as follows:
Step 1-1: segment the Chinese sentences with a Chinese word segmentation tool;
Step 1-2: tag the parts of speech of the words with a Chinese part-of-speech tagging tool;
Step 1-3: semantically label the words with a Chinese semantic labeling tool according to the synonym forest;
Step 1-4: extract the word form, part of speech, semantic class and stroke count of the four word units adjacent to the ambiguous word m (two on each side), and take the 4 near-synonyms most similar to the 2 words adjacent to m according to the synonym dictionary, together forming the disambiguation features (a sketch follows).
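For illustration only, a minimal sketch of assembling one such feature group; the `Token` fields, the sample `SYNONYM_FOREST` and the `top_synonyms` helper are hypothetical stand-ins for the Chinese tagging tools and the synonym-forest dictionary, and whether the 4 near-synonyms are taken per adjacent word or in total is left open by the text:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Token:
    form: str      # word form (morphology)
    pos: str       # part-of-speech tag
    sem: str       # semantic class code from the synonym forest
    strokes: int   # stroke count of the word form

# Hypothetical miniature synonym forest: word -> near-synonyms by similarity.
SYNONYM_FOREST = {"department": ["organization", "institution", "agency", "bureau"]}

def top_synonyms(word: str, k: int = 4) -> List[str]:
    # Stand-in for the synonym-forest similarity lookup.
    return SYNONYM_FOREST.get(word, [])[:k]

def disambiguation_features(sent: List[Token], m_idx: int) -> List[str]:
    """One feature group for the ambiguous word at position m_idx: form, POS,
    semantic class and stroke count of the two neighbours on each side, plus
    near-synonyms of the two immediately adjacent words."""
    feats: List[str] = []
    for i in (m_idx - 2, m_idx - 1, m_idx + 1, m_idx + 2):
        if 0 <= i < len(sent):
            t = sent[i]
            feats += [t.form, t.pos, t.sem, str(t.strokes)]
    for i in (m_idx - 1, m_idx + 1):
        if 0 <= i < len(sent):
            feats += top_synonyms(sent[i].form)
    return feats
```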
3. The Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention of claim 1, wherein in step 2 a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText are used for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5, the three word embedding matrices serving as training data, and likewise for the disambiguation features extracted from the test corpus:
Step 2-1: for the disambiguation features extracted from the training corpus, obtain a word embedding matrix V_1 by random initialization;
Step 2-2: for the disambiguation features extracted from the training corpus, obtain a pre-trained word embedding matrix V_2 with Word2Vec;
Step 2-3: for the disambiguation features extracted from the training corpus, obtain a pre-trained word embedding matrix V_3 with FastText;
Step 2-4: use the three word embedding matrices obtained in steps 2-1, 2-2 and 2-3 as training data;
Step 2-5: for the disambiguation features extracted from the test corpus, obtain a word embedding matrix V_1 by random initialization;
Step 2-6: for the disambiguation features extracted from the test corpus, obtain a pre-trained word embedding matrix V_2 with Word2Vec;
Step 2-7: for the disambiguation features extracted from the test corpus, obtain a pre-trained word embedding matrix V_3 with FastText;
Step 2-8: use the three word embedding matrices obtained in steps 2-5, 2-6 and 2-7 as test data.
4. The Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention of claim 1, wherein in step 3 the MHDCNN-RA model is optimized with the training data to obtain the optimized MHDCNN-RA model, with the following specific steps:
Step 3-1: load the three word embedding matrices of the training data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3] (see the sketch below);
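A minimal sketch of this three-channel assembly, assuming PyTorch; the vocabulary, batch and sequence sizes are illustrative, and the V2/V3 tensors stand in for the real Word2Vec and FastText matrices:

```python
import torch
import torch.nn as nn

vocab_size, d = 10000, 256                 # d = 256 per the text; vocab size assumed
V2 = torch.randn(vocab_size, d)            # stand-in for the Word2Vec matrix
V3 = torch.randn(vocab_size, d)            # stand-in for the FastText matrix

emb_rand = nn.Embedding(vocab_size, d)                     # V1: random initialization
emb_w2v  = nn.Embedding.from_pretrained(V2, freeze=False)  # V2: Word2Vec weights
emb_ft   = nn.Embedding.from_pretrained(V3, freeze=False)  # V3: FastText weights

ids = torch.randint(0, vocab_size, (8, 14))  # (batch, feature-group length)
x = torch.stack([emb_rand(ids), emb_w2v(ids), emb_ft(ids)], dim=1)
print(x.shape)  # torch.Size([8, 3, 14, 256]) — the three-channel input [V1, V2, V3]
```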
Step 3-2: pass through a feature fusion layer. First fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then encode the positions of Z_1, sine encoding at dimensions 2i and cosine encoding at dimensions 2i+1, to obtain the output P; add the position-encoding feature P to the original feature Z_1 to obtain the new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is:
Z_1 = Conv2D(V_1, V_2, V_3)
P_(pos,2i) = sin(pos / 10000^(2i/d))
P_(pos,2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 index the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
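A sketch of this fusion layer, assuming PyTorch; the channel counts and kernel sizes are illustrative, since the text does not fix them:

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Conv2D fuses the three channels into Z1, sinusoidal position codes P
    are added (Z2 = Z1 + P), and Conv1D compresses the result into Z3."""
    def __init__(self, d: int = 256):
        super().__init__()
        self.fuse2d = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # 3 channels -> 1
        self.fuse1d = nn.Conv1d(d, d, kernel_size=3, padding=1)
        self.d = d

    def positional_encoding(self, n: int) -> torch.Tensor:
        pos = torch.arange(n, dtype=torch.float32).unsqueeze(1)   # (n, 1)
        i2  = torch.arange(0, self.d, 2, dtype=torch.float32)     # the 2i indices
        angle = pos / torch.pow(torch.tensor(10000.0), i2 / self.d)
        P = torch.zeros(n, self.d)
        P[:, 0::2] = torch.sin(angle)   # dimensions 2i: sine
        P[:, 1::2] = torch.cos(angle)   # dimensions 2i+1: cosine
        return P

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, 3, n, d)
        z1 = self.fuse2d(x).squeeze(1)                    # Z1: (B, n, d)
        z2 = z1 + self.positional_encoding(z1.size(1))    # Z2 = Z1 + P
        z3 = self.fuse1d(z2.transpose(1, 2))              # Z3: Conv1D over positions
        return z3                                         # (B, d, n)
```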
Step 3-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (same number and size of kernels) but do not share weights; one applies a sigmoid activation and the other applies none, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input, and the input is also added back to form a residual structure, avoiding vanishing gradients and letting information flow through multiple channels. The one-dimensional convolution block thus computes Conv1D_Block(Z) = Z + Conv1D_a(Z) ⊗ σ(Conv1D_b(Z)).
The one-dimensional convolution blocks are stacked according to a hybrid dilated convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, and then rate [1, 1] performs fine-grained adjustment; after the 12 one-dimensional convolution blocks the output Z_4 is obtained. The deep convolution layer computes:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block denotes a one-dimensional convolution block;
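A sketch of one such gated residual block and the hybrid-dilation stack, assuming PyTorch; note that [1, 2, 4] × 3 plus [1, 1] gives 11 rates, so the exact 12-block schedule is an assumption left open by the text:

```python
import torch
import torch.nn as nn

class Conv1DBlock(nn.Module):
    """Gated residual block: two Conv1d layers of the same form but unshared
    weights, one passed through sigmoid, multiplied element-wise, then added
    to the input (residual). Padding preserves the sequence length."""
    def __init__(self, channels: int, kernel_size: int, dilation: int):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2
        self.conv_lin  = nn.Conv1d(channels, channels, kernel_size,
                                   padding=pad, dilation=dilation)
        self.conv_gate = nn.Conv1d(channels, channels, kernel_size,
                                   padding=pad, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv_lin(x) * torch.sigmoid(self.conv_gate(x))

# Hybrid dilated convolution schedule: [1, 2, 4] repeated three times, then
# [1, 1] for fine-grained adjustment (an assumed reading of the 12-block text).
rates = [1, 2, 4] * 3 + [1, 1]
deep_conv = nn.Sequential(*[Conv1DBlock(256, 3, r) for r in rates])

z3 = torch.randn(8, 256, 14)
z4 = deep_conv(z3)   # Z4 keeps the shape of Z3: (8, 256, 14)
```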
Step 3-4: pass Z_4 through a normalization layer;
Step 3-5: mine the relations among the disambiguation features through a multi-head self-attention layer. The multi-head self-attention is computed as:
Q = Z_4·W^Q, K = Z_4·W^K, V = Z_4·W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W^O
where W^Q, W^K and W^V are parameter matrices;
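A sketch of step 3-5 with PyTorch's built-in layer, which internally applies the W^Q, W^K, W^V projections and concatenates the heads through W^O; the head count h = 8 is an assumption, as the text leaves h unspecified:

```python
import torch
import torch.nn as nn

d, h = 256, 8
attn = nn.MultiheadAttention(embed_dim=d, num_heads=h, batch_first=True)

z4 = torch.randn(8, 14, d)         # (batch, feature positions, d)
out, weights = attn(z4, z4, z4)    # Q = K = V = Z4, i.e. self-attention
print(out.shape, weights.shape)    # (8, 14, 256) and (8, 14, 14)
```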
Step 3-6: reduce parameters while retaining the main features through a max pooling layer;
Step 3-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C (see the sketch below);
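A sketch of the two pooling stages; the pool sizes are illustrative, and since the text does not spell out how the pooled vector becomes C per-category weights, a linear map stands in for that final step:

```python
import torch
import torch.nn as nn

C = 2                                 # number of semantic categories
pool = nn.Sequential(
    nn.MaxPool1d(kernel_size=2),      # step 3-6: keep the salient features
    nn.AdaptiveAvgPool1d(1),          # step 3-7: adaptive average pooling
)
to_weights = nn.Linear(256, C)        # assumed stand-in map to w(s_i | m)

z = torch.randn(8, 256, 14)           # output of the attention layer
w = to_weights(pool(z).squeeze(-1))   # (8, C) per-category weights
```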
Step 3-8: compute the error loss between the actual output and the expected output with the cross-entropy loss function:
loss = −(1/n)·Σ_{k=1}^{n} y_k·log(ŷ_k)
where loss denotes the average error over the training data, n is the number of training samples, y_k is the label of the k-th training sample and ŷ_k its predicted probability. The parameters are updated layer by layer by back-propagating the error loss; the parameter update is:
θ' = θ − a·∂loss/∂θ
where θ denotes the parameter set, θ' the updated parameter set, and a the learning rate;
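A sketch of one optimization round under these equations, assuming PyTorch; the placeholder `model`, the sizes and the learning rate a stand in for the full MHDCNN-RA network and its unstated hyper-parameters (the advantages section later names Adam as the optimizer):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(256 * 14, 2))  # placeholder network
criterion = nn.CrossEntropyLoss()                    # cross-entropy error loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)    # learning rate a

x = torch.randn(8, 14, 256)        # features for one batch
y = torch.randint(0, 2, (8,))      # semantic-class labels y_k

for _ in range(100):               # iterate until the set number of iterations
    optimizer.zero_grad()
    loss = criterion(model(x), y)  # average error over the training samples
    loss.backward()                # propagate the error back layer by layer
    optimizer.step()               # θ' = θ - a·∂loss/∂θ (Adam variant)
```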
Step 3-9: iterate steps 3-1 to 3-8 until the set number of iterations is reached, obtaining the optimized MHDCNN-RA model.
5. The Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention of claim 1, wherein step 4 is the test process, i.e. the semantic classification process: input the test data into the optimized MHDCNN-RA model and calculate the weight of the ambiguous word m under each semantic category, the category with the largest weight being the semantic category of the ambiguous word. The specific process is:
Step 4-1: load the three word embedding matrices of the test data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 4-2: pass through a feature fusion layer. First fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then encode the positions of Z_1, sine encoding at dimensions 2i and cosine encoding at dimensions 2i+1, to obtain the output P; add the position-encoding feature P to the original feature Z_1 to obtain the new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is:
Z_1 = Conv2D(V_1, V_2, V_3)
P_(pos,2i) = sin(pos / 10000^(2i/d))
P_(pos,2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 index the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 4-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (same number and size of kernels) but do not share weights; one applies a sigmoid activation and the other applies none, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input, and the input is also added back to form a residual structure, avoiding vanishing gradients and letting information flow through multiple channels.
The one-dimensional convolution blocks are stacked according to a hybrid dilated convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, and then rate [1, 1] performs fine-grained adjustment; after the 12 one-dimensional convolution blocks the output Z_4 is obtained. The convolution layer computes:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block denotes a one-dimensional convolution block;
Step 4-4: pass Z_4 through a normalization layer;
Step 4-5: mine the relations among the disambiguation features through a multi-head self-attention layer. The multi-head self-attention is computed as:
Q = Z_4·W^Q, K = Z_4·W^K, V = Z_4·W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W^O
where W^Q, W^K and W^V are parameter matrices;
Step 4-6: reduce parameters while retaining the main features through a max pooling layer;
Step 4-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 4-8: output the semantic category with the largest weight:
s = argmax_{s_i} w(s_i|m), i = 1, 2, …, C
where s is the semantic category of the ambiguous word m.
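A sketch of this final selection, reusing the weights reported for the worked example "unit" later in the text:

```python
import torch

# Pick the semantic category with the largest weight (steps 4-7/4-8).
w = torch.tensor([1.6917, 7.5621])   # [w(s1|m), w(s2|m)] from the embodiment
categories = ["organization", "unit"]
s = categories[int(torch.argmax(w))]
print(s)                              # "unit" — the sense assigned to m
```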
Advantageous effects:
1. The invention is a Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention. Chinese sentences are segmented and part-of-speech tagged; the stroke counts and near-synonyms of the words adjacent to the ambiguous word on the left and right are extracted; a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText are used, and the three-channel word embedding matrix combining the three is of high quality.
2. The model used by the method is a neural network combining multi-channel hybrid dilated convolution with residual connections and attention. Its main feature is combining three word embedding matrices into a three-channel input that fully mines the context information and text similarity of the corpus. The feature matrix is position-encoded to add position information. The hybrid dilated convolution scheme overcomes the gridding effect of single-rate dilated convolution, achieves full-coverage scanning of the feature matrix and obtains multi-scale information; the deep neural network raises the expressive power of the model, and the residual structure effectively alleviates its vanishing-gradient problem. A multi-head self-attention mechanism mines the relations among the disambiguation features, improving word sense disambiguation accuracy and yielding good classification results.
3. The cross-entropy loss function used by the invention contains a softmax classifier: it handles multi-class classification and embeds NLLLoss to compute the error loss, as the snippet below illustrates.
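A short PyTorch check of this equivalence (standard library behavior, not specific to the patent):

```python
import torch
import torch.nn as nn

# CrossEntropyLoss is LogSoftmax followed by NLLLoss, so the softmax
# classifier is built into the loss itself.
logits = torch.randn(4, 2)            # raw scores for 2 semantic categories
labels = torch.tensor([0, 1, 1, 0])

ce  = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
assert torch.allclose(ce, nll)        # identical error loss
```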
4. During model training, parameters are updated with the Adam gradient descent method. The computed error propagates backward along its original path: each layer's parameters are updated layer by layer, from the output layer backward through every intermediate hidden layer, until the input layer is reached. Forward and backward propagation alternate continuously, reducing the error and updating the model parameters until MHDCNN-RA is trained. As the parameters are updated with each back-propagation of the error, the disambiguation accuracy of the whole MHDCNN-RA model on the input data improves.
Description of the drawings:
FIG. 1 is a flow chart of word sense disambiguation of a Chinese sentence in an embodiment of the present invention;
FIG. 2 is a training process of a word sense disambiguation model based on MHDCNN-RA in an embodiment of the present invention.
FIG. 3 is a testing process of the MHDCNN-RA based word sense disambiguation model in an embodiment of the present invention.
Detailed description of embodiments:
To describe the technical scheme in the embodiments clearly and completely, the invention is described in further detail below with reference to the drawings, taking the Chinese sentences containing the ambiguous word "unit" in the training corpus of SemEval-2007 Task #5 as an example. The training corpus contains 55 sentences and the test corpus 19 sentences. The ambiguous word "unit" has two semantic categories, 0: organization and 1: unit.
The flow chart of the Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention of the embodiment is shown in FIG. 1. The training process of the MHDCNN-RA-based word sense disambiguation model is shown in FIG. 2, and the testing process is shown in FIG. 3.
The extraction process of the disambiguation features in step 1 is as follows:
For the Chinese sentence "From January 1, 1999, no department or unit may any longer distribute new housing in kind.", the feature extraction steps are as follows:
Step 1-1: segment the Chinese sentence with a Chinese word segmentation tool; the segmentation result is:
from / 1999 / January / 1 / , / any / department / unit / must not / again / will / new house / carry out / in-kind / distribution
Step 1-2: tag the parts of speech of the words with a Chinese part-of-speech tagging tool; the tagging result is:
from/p 1999/t January/t 1/t ,/f any/r department/n unit/n must-not/v again/d will/p new-house/n carry-out/v in-kind/n distribution/vn
Step 1-3: semantically label the words with a Chinese semantic labeling tool according to the synonym forest:
from/p/Hi39 1999/t/Ca18 January/t/Bd02 1/t/Di02 ,/f/Kd02 any/r/Eb02 department/n/Di09 unit/n/Di09 must-not/v/Gc02 again/d/Ig04 will/p/Ae10 new-house/n/Bn03 carry-out/v/Ig03 in-kind/n/-1 distribution/vn/He05
Step 1-4: extract the word form, part of speech, semantic class and stroke count of the word units adjacent to "unit" on the left and right, extract the 4 near-synonyms ranked highest in similarity to the adjacent words according to the synonym forest, and combine them into one group of disambiguation features:
any/r/Eb02 department/n/Di09 must-not/v/Gc02 again/d/Ig04 13 13 15 6 organization
Step 2: acquire the training data and test data for "unit":
Step 2-1: for the disambiguation features extracted from the training corpus of SemEval-2007 Task #5, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText;
Step 2-2: for the disambiguation features extracted from the test corpus of SemEval-2007 Task #5, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText, the three word embedding matrices together serving as test data. For the feature group:
any/r/Eb02 department/n/Di09 must-not/v/Gc02 again/d/Ig04 13 13 15 6 organization
a word embedding matrix V_1 is obtained by random initialization, a pre-trained word embedding matrix V_2 is obtained with Word2Vec, and a pre-trained word embedding matrix V_3 is obtained with FastText.
Step 3 uses the training data to optimize the MHDCNN-RA model:
Step 3-1: load the three word embedding matrices of the training data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 3-2: pass through a feature fusion layer. First fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then encode the positions of Z_1, sine encoding at dimensions 2i and cosine encoding at dimensions 2i+1, to obtain the output P; add the position-encoding feature P to the original feature Z_1 to obtain the new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is:
Z_1 = Conv2D(V_1, V_2, V_3)
P_(pos,2i) = sin(pos / 10000^(2i/d))
P_(pos,2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 index the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 3-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (same number and size of kernels) but do not share weights; one applies a sigmoid activation and the other applies none, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input, and the input is also added back to form a residual structure, avoiding vanishing gradients and letting information flow through multiple channels.
The one-dimensional convolution blocks are stacked according to a hybrid dilated convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, and then rate [1, 1] performs fine-grained adjustment; after the 12 one-dimensional convolution blocks the output Z_4 is obtained. The deep convolution layer computes:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block denotes a one-dimensional convolution block;
Step 3-4: pass Z_4 through a normalization layer;
Step 3-5: mine the relations among the disambiguation features through a multi-head self-attention layer. The multi-head self-attention is computed as:
Q = Z_4·W^Q, K = Z_4·W^K, V = Z_4·W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W^O
where W^Q, W^K and W^V are parameter matrices;
Step 3-6: reduce parameters while retaining the main features through a max pooling layer;
Step 3-7: through an adaptive average pooling layer, output the weights w(s_i|"unit"), i = 1, 2, assigned to the ambiguous word "unit" under the semantic categories 0: s_1 = organization and 1: s_2 = unit;
Step 3-8: compute the error loss_unit between the actual output and the expected output using the cross-entropy loss function:
loss_unit = −(1/n)·Σ_{k=1}^{n} y_k·log(ŷ_k)
The parameters are updated layer by layer by back-propagating the error loss; the parameter update is:
θ'_unit = θ_unit − a·∂loss_unit/∂θ_unit
where θ_unit denotes the parameter set, θ'_unit the updated parameter set, and a the learning rate;
Step 3-9: iterate steps 3-1 to 3-8 until the set number of iterations is reached, obtaining the optimized MHDCNN-RA model;
Step 4: perform semantic classification on the ambiguous word "unit":
Step 4-1: load the three word embedding matrices of the test data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 4-2: pass through a feature fusion layer. First fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then encode the positions of Z_1, sine encoding at dimensions 2i and cosine encoding at dimensions 2i+1, to obtain the output P; add the position-encoding feature P to the original feature Z_1 to obtain the new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is:
Z_1 = Conv2D(V_1, V_2, V_3)
P_(pos,2i) = sin(pos / 10000^(2i/d))
P_(pos,2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 index the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 4-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (same number and size of kernels) but do not share weights; one applies a sigmoid activation and the other applies none, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input, and the input is also added back to form a residual structure, avoiding vanishing gradients and letting information flow through multiple channels.
The one-dimensional convolution blocks are stacked according to a hybrid dilated convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, and then rate [1, 1] performs fine-grained adjustment; after the 12 one-dimensional convolution blocks the output Z_4 is obtained. The deep convolution layer computes:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block denotes a one-dimensional convolution block;
Step 4-4: pass Z_4 through a normalization layer;
Step 4-5: mine the relations among the disambiguation features through a multi-head self-attention layer. The multi-head self-attention is computed as:
Q = Z_4·W^Q, K = Z_4·W^K, V = Z_4·W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W^O
where W^Q, W^K and W^V are parameter matrices;
Step 4-6: reduce parameters while retaining the main features through a max pooling layer;
Step 4-7: through an adaptive average pooling layer, output the weights assigned to the ambiguous word "unit" under the semantic categories 0: s_1 = organization and 1: s_2 = unit: w = [w(s_1|"unit"), w(s_2|"unit")] = [1.6917, 7.5621];
Step 4-8: output the semantic category with the largest weight:
s_2 = unit is the semantic category corresponding to the ambiguous word "unit".
Through the optimized MHDCNN-RA model, word sense disambiguation is thus performed on the Chinese sentence containing the ambiguous word "unit" ("From January 1, 1999, no department or unit may any longer distribute new housing in kind."), and the semantic category corresponding to "unit" is determined to be unit. Experimental verification shows that the accuracy on the test corpus of the ambiguous word "unit" reaches 84.21% on the optimized MHDCNN-RA model.
The Chinese word sense disambiguation of the embodiment, combining multi-channel hybrid dilated convolution with residual connections and attention, can select accurate disambiguation features and determines the semantic category of ambiguous words with the corresponding neural network.
The foregoing is a detailed description of embodiments of the invention with reference to the accompanying drawings; the specific embodiments are provided merely to assist in understanding the method of the invention. Those skilled in the art may make variations and modifications within the scope of the embodiments and applications according to the concept of the invention, and the invention should therefore not be construed as limited thereto.
Claims (5)
1. A Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention, wherein an ambiguous word m has C semantic categories s_1, s_2, …, s_C, characterized by comprising the following steps:
Step 1: perform word segmentation, part-of-speech tagging and semantic class tagging on the training corpus and test corpus of SemEval-2007 Task #5.
Step 2: for the disambiguation features extracted from the SemEval-2007 Task #5 corpus, use a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText.
Step 3: optimize the MHDCNN-RA model with the training data to obtain the optimized MHDCNN-RA model.
Step 4: input the test data into the optimized MHDCNN-RA model and calculate the weight of the ambiguous word m under each semantic category; the category with the largest weight is the semantic category of the ambiguous word m.
2. The Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention of claim 1, wherein in step 1 the training corpus and test corpus of SemEval-2007 Task #5 are processed as follows:
Step 1-1: segment the Chinese sentences with a Chinese word segmentation tool;
Step 1-2: tag the parts of speech of the words with a Chinese part-of-speech tagging tool;
Step 1-3: semantically label the words with a Chinese semantic labeling tool according to the synonym forest;
Step 1-4: extract the word form, part of speech, semantic class and stroke count of the four word units adjacent to the ambiguous word m (two on each side), and take the 4 near-synonyms most similar to the 2 words adjacent to m according to the synonym dictionary, together forming the disambiguation features.
3. The Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention of claim 1, wherein in step 2 a randomly initialized word embedding matrix and word embedding matrices pre-trained with Word2Vec and FastText are used for the disambiguation features extracted from the corpus of SemEval-2007 Task #5:
Step 2-1: for the disambiguation features extracted from the training corpus, obtain a word embedding matrix V_1 by random initialization;
Step 2-2: for the disambiguation features extracted from the training corpus, obtain a pre-trained word embedding matrix V_2 with Word2Vec;
Step 2-3: for the disambiguation features extracted from the training corpus, obtain a pre-trained word embedding matrix V_3 with FastText;
Step 2-4: use the three word embedding matrices obtained in steps 2-1, 2-2 and 2-3 as training data;
Step 2-5: for the disambiguation features extracted from the test corpus, obtain a word embedding matrix V_1 by random initialization;
Step 2-6: for the disambiguation features extracted from the test corpus, obtain a pre-trained word embedding matrix V_2 with Word2Vec;
Step 2-7: for the disambiguation features extracted from the test corpus, obtain a pre-trained word embedding matrix V_3 with FastText;
Step 2-8: use the three word embedding matrices obtained in steps 2-5, 2-6 and 2-7 as test data.
4. The Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention of claim 1, wherein in step 3 the MHDCNN-RA model is optimized with the training data to obtain the optimized MHDCNN-RA model, with the following specific steps:
Step 3-1: load the three word embedding matrices of the training data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 3-2: pass through a feature fusion layer. First fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then encode the positions of Z_1, sine encoding at dimensions 2i and cosine encoding at dimensions 2i+1, to obtain the output P; add the position-encoding feature P to the original feature Z_1 to obtain the new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is:
Z_1 = Conv2D(V_1, V_2, V_3)
P_(pos,2i) = sin(pos / 10000^(2i/d))
P_(pos,2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 index the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 3-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (same number and size of kernels) but do not share weights; one applies a sigmoid activation and the other applies none, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input, and the input is also added back to form a residual structure, avoiding vanishing gradients and letting information flow through multiple channels.
The one-dimensional convolution blocks are stacked according to a hybrid dilated convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, and then rate [1, 1] performs fine-grained adjustment; after the 12 one-dimensional convolution blocks the output Z_4 is obtained. The deep convolution layer computes:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block denotes a one-dimensional convolution block;
Step 3-4: pass Z_4 through a normalization layer;
Step 3-5: mine the relations among the disambiguation features through a multi-head self-attention layer. The multi-head self-attention is computed as:
Q = Z_4·W^Q, K = Z_4·W^K, V = Z_4·W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W^O
where W^Q, W^K and W^V are parameter matrices;
Step 3-6: reduce parameters while retaining the main features through a max pooling layer;
Step 3-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 3-8: compute the error loss between the actual output and the expected output with the cross-entropy loss function:
loss = −(1/n)·Σ_{k=1}^{n} y_k·log(ŷ_k)
where loss denotes the average error over the training data, n is the number of training samples, y_k is the label of the k-th training sample and ŷ_k its predicted probability. The parameters are updated layer by layer by back-propagating the error loss; the parameter update is:
θ' = θ − a·∂loss/∂θ
where θ denotes the parameter set, θ' the updated parameter set, and a the learning rate;
Step 3-9: iterate steps 3-1 to 3-8 until the set number of iterations is reached, obtaining the optimized MHDCNN-RA model.
5. The Chinese word sense disambiguation method combining multi-channel hybrid dilated convolution with residual connections and attention of claim 1, wherein step 4 is the test process, i.e. the semantic classification process: input the test data into the optimized MHDCNN-RA model and calculate the weight of the ambiguous word m under each semantic category, the category with the largest weight being the semantic category of the ambiguous word. The specific process is:
Step 4-1: load the three word embedding matrices of the test data into the input embedding layer of the initialized MHDCNN-RA model as weights, forming a three-channel input matrix [V_1, V_2, V_3];
Step 4-2: pass through a feature fusion layer. First fuse the three-channel matrix with a two-dimensional convolution to obtain the output Z_1; then encode the positions of Z_1, sine encoding at dimensions 2i and cosine encoding at dimensions 2i+1, to obtain the output P; add the position-encoding feature P to the original feature Z_1 to obtain the new fused feature Z_2; finally compress and fuse the feature matrix Z_2 with a one-dimensional convolution to obtain the output Z_3. The feature fusion process is:
Z_1 = Conv2D(V_1, V_2, V_3)
P_(pos,2i) = sin(pos / 10000^(2i/d))
P_(pos,2i+1) = cos(pos / 10000^(2i/d))
Z_2 = Z_1 + P
Z_3 = Conv1D(Z_2)
where pos is the index of a disambiguation feature within a group of disambiguation features, and 2i and 2i+1 index the even and odd positions of the word vector dimension; with word vector dimension d = 256, 2i = [0, 2, 4, …, 254] and 2i+1 = [1, 3, 5, …, 255];
Step 4-3: pass through a deep convolution layer formed by stacking 12 one-dimensional convolution blocks. All blocks have the same structure and differ only in dilation rate. The two one-dimensional convolutions inside a block have the same form (same number and size of kernels) but do not share weights; one applies a sigmoid activation and the other applies none, and the two outputs are multiplied element-wise. Padding keeps the output dimension consistent with the input, and the input is also added back to form a residual structure, avoiding vanishing gradients and letting information flow through multiple channels.
The one-dimensional convolution blocks are stacked according to a hybrid dilated convolution scheme: the dilation rates [1, 2, 4] are repeated three times so that the convolution kernels exactly cover the feature matrix Z_3, and then rate [1, 1] performs fine-grained adjustment; after the 12 one-dimensional convolution blocks the output Z_4 is obtained. The convolution layer computes:
Z_4 = Conv1D_Block(…(Conv1D_Block(Z_3)))
where Conv1D_Block denotes a one-dimensional convolution block;
Step 4-4: pass Z_4 through a normalization layer;
Step 4-5: mine the relations among the disambiguation features through a multi-head self-attention layer. The multi-head self-attention is computed as:
Q = Z_4·W^Q, K = Z_4·W^K, V = Z_4·W^V
MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W^O
where W^Q, W^K and W^V are parameter matrices;
Step 4-6: reduce parameters while retaining the main features through a max pooling layer;
Step 4-7: through an adaptive average pooling layer, output the weight w(s_i|m) assigned to the ambiguous word m under each semantic category s_i, i = 1, 2, …, C;
Step 4-8: output the semantic category with the largest weight:
s = argmax_{s_i} w(s_i|m), i = 1, 2, …, C
where s is the semantic category of the ambiguous word m.
Priority Applications (1)
- CN202211495234.9A | Priority date: 2022-11-26 | Filing date: 2022-11-26 | Title: Chinese word sense disambiguation combining multi-channel mixed hole convolution with residual error and attention
Publications (1)
- CN115906825A | Publication date: 2023-04-04 | Status: Pending
Family
- ID=86475879
Cited By (1)
- CN118135244A | Priority date: 2024-05-10 | Publication date: 2024-06-04 | Assignee: 东北大学 (Northeastern University) | Title: Target detection method under complex overlapped background
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination