CN115168579A - Text classification method based on multi-head attention mechanism and two-dimensional convolution operation - Google Patents

Text classification method based on multi-head attention mechanism and two-dimensional convolution operation

Info

Publication number
CN115168579A
Authority
CN
China
Prior art keywords
text
layer
neural network
attention mechanism
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210800916.XA
Other languages
Chinese (zh)
Inventor
孙源佑 (Sun Yuanyou)
邓木清 (Deng Muqing)
蔡洁标 (Cai Jiebiao)
张贵有 (Zhang Guiyou)
江嘉宁 (Jiang Jianing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Smart Traditional Chinese Medicine Technology Guangdong Co ltd
Original Assignee
Smart Traditional Chinese Medicine Technology Guangdong Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Smart Traditional Chinese Medicine Technology Guangdong Co ltd
Priority to CN202210800916.XA
Publication of CN115168579A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods


Abstract

The invention provides a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation, which relates to the technical field of natural language processing.

Description

Text classification method based on multi-head attention mechanism and two-dimensional convolution operation
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation.
Background
Natural language processing is an important direction in computer science and artificial intelligence. It studies theories and methods for effective communication between humans and computers in natural language, drawing on linguistics, computer science and mathematics, and has extremely wide applications such as intelligent voice question-answering systems, fraudulent short-message identification and sentiment recognition of online comments.
In the medical field, large volumes of clinical information are stored in information systems as unstructured (or semi-structured) text, and natural language processing is a key technology for extracting useful information from medical text. Through natural language processing, unstructured medical texts are converted into structured data containing important medical information, from which researchers can discover useful medical knowledge, improving the operational quality of the medical system and reducing its cost. In the era of rapidly developing internet technology, the medical field no longer struggles to acquire information; the problem is how to obtain valuable information quickly and accurately from massive information resources. Medical text is produced in many varied forms, and its sheer volume makes manual sorting and arrangement impractical, so effective text classification has become very important.
At present, commonly used text classification methods include support vector machines, convolutional neural networks, recurrent neural networks and BERT. BERT and RNNs can achieve excellent classification results, but the models are large, difficult to train and hard to deploy on small hosts. TextGCN achieves a good classification effect with a small model through graph convolution, but it cannot classify unseen nodes. Convolutional neural networks applied to text are typically only one-dimensional, and if the input text has high dimensionality, a one-dimensional convolutional neural network loses semantic information. The prior art discloses an adaptive text classification method and apparatus based on BERT: corpus sample data to be classified is first preprocessed and a preset network model is constructed; the preprocessed sample data is then input into the preset network model and trained with supervision using a preset loss function to obtain a classification model; an output threshold is set on the classification model to control early output of classification results, shortening model inference time without losing precision.
Disclosure of Invention
To address the long training times and heavy computation of the traditional text classification models used for medical text, the invention provides a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation that is computationally light and fast to train while still achieving a good text classification effect.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
a method of text classification based on a multi-head attention mechanism and a two-dimensional convolution operation, the method comprising the steps of:
S1, determining a text data set, and dividing the text data set into a training set and a test set;
S2, preprocessing the texts in the training set;
S3, constructing a neural network comprising an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected;
S4, inputting the preprocessed text into the embedding layer of the neural network to obtain word vectors;
S5, forming a word vector matrix from the word vectors, inputting it to the multi-head attention mechanism layer, executing the attention functions in parallel, then concatenating and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using these representations to pre-train the multi-head attention mechanism layer for text classification;
S6, fusing the text-enhanced semantic representations into a text fused semantic representation, performing a two-dimensional convolution operation on it, and outputting a convolution operation feature vector;
S7, performing text classification training on the neural network using the convolution operation feature vector and adjusting the weights of the multi-head attention mechanism layer to obtain the trained neural network;
S8, preprocessing the texts in the test set and inputting them into the trained neural network to obtain classification results.
Preferably, in step S2, the preprocessing of the texts in the training set comprises:
S21, establishing a stop-word list from all punctuation marks and spaces in the texts of the training set;
S22, reading each character of a text in sequence, comparing it against the stop-word list, and automatically skipping any character found in the list;
S23, inputting the texts with punctuation and spaces removed character by character, reading all characters of all texts, and establishing a one-hot vector for each character.
Preferably, in step S3, the embedding layer of the constructed neural network uses character-level one-hot vector embedding of the text as the semantic representation. The embedding layer has three layers with weight matrices W_1, W_2 and W_3 respectively; the activation function of every layer is sigmoid, and the layers are connected in sequence. After the texts in the training set are preprocessed, the one-hot vector of each word in each text is obtained and input into the neural network to obtain the word vectors; the calculation performed at each layer is:
x_1 = sigmoid(W_1 x_0 + b)
x_2 = sigmoid(W_2 x_1 + b)
x = sigmoid(W_3 x_2 + b)
where x_0 denotes the one-hot vector of a word, x_1 denotes the intermediate value after x_0 is activated by the first layer, x_2 denotes the intermediate value after x_1 is activated by the second layer, x denotes the word vector obtained after x_2 is activated by the third layer, and b denotes a bias vector.
Preferably, a plurality of self-attention mechanisms are connected in series to form the multi-head attention mechanism layer. The inputs of the self-attention mechanism are queries, keys of dimension d_k and values of dimension d_v. To obtain the weights over the values, a set of queries is packed into a matrix Q, and the keys and values into matrices K and V respectively; based on the softmax function, the attention function is obtained as:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
Q, K and V are computed as:

Q = XW^Q
K = XW^K
V = XW^V

where W^Q, W^K and W^V denote the weight matrices of the self-attention mechanism for the three inputs query, key and value.
Preferably, the word vectors obtained in S4 are assembled into a word vector matrix X = [x_1, x_2, ..., x_n]. The word vector matrix is input to the multi-head attention mechanism layers, the attention functions are executed in parallel, and the results are concatenated and mapped; with R multi-head attention mechanism layers in total, R text-enhanced semantic representations X_1, X_2, ..., X_R of the same size as X are obtained. The calculation in a multi-head attention mechanism layer is:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O

where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), and i denotes the index of a multi-head attention mechanism layer among the R layers, i = 1, 2, ..., R. After a flatten operation is performed on X_1, X_2, ..., X_R, the result is input to a full-connection layer to carry out the pre-training for text classification.
Preferably, in step S6, the text-enhanced semantic representations X_1, X_2, ..., X_R are fused by a concatenation operation to obtain the text fused semantic representation X_s = Concatenate(X_1, X_2, ..., X_R), a three-dimensional tensor. When the two-dimensional convolution operation is performed on the text fused semantic representation, the convolution layer takes X_1, X_2, ..., X_R as its R input channels, and the size and number of the convolution kernels are set, the kernel size in the first dimension being equal to the word-vector length. For a convolution kernel C and the text fused semantics X_s, the elements of the convolution result matrix are computed as:

Y(p, q) = Σ_i Σ_j Σ_k X_s(i, j, k) · C(i, j−p+1, k−q+1)

where X_s(i, j, k) is an element of the input X_s, Y(p, q) is an element of the convolution result matrix, and C(i, j−p+1, k−q+1) is an element of the convolution kernel.
The convolution result matrix is input to the pooling layer for a max-pooling operation that keeps only the largest element of the matrix; the output corresponding to each convolution kernel is:

y = max_{p,q} Y(p, q)

Finally, the convolution operation feature vectors are output, and loss of semantic information is avoided.
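For illustration only, the element formula above can be transcribed directly in NumPy as in the sketch below; the array shapes are assumptions, and note that deep-learning frameworks compute cross-correlation (no kernel flip), whereas the index form C(i, j−p+1, k−q+1) corresponds to a flipped kernel, the two differing only by reversing the kernel:

```python
import numpy as np

def conv2d_result(Xs, C):
    """Element-wise transcription of the formula above: each output Y(p, q)
    sums Xs over all R channels and over the kernel window (cross-correlation
    convention; the patent's flipped-kernel form differs only by a flip)."""
    R, J, Kdim = Xs.shape
    _, fj, fk = C.shape
    Y = np.zeros((J - fj + 1, Kdim - fk + 1))
    for p in range(Y.shape[0]):
        for q in range(Y.shape[1]):
            # window of Xs aligned with the kernel at offset (p, q)
            Y[p, q] = np.sum(Xs[:, p:p + fj, q:q + fk] * C)
    return Y

def max_pool(Y):
    """Max pooling: keep only the largest element of the result matrix."""
    return Y.max()

# Illustrative shapes: R = 3 channels, a 30 x 768 representation, and a
# kernel spanning the full 768-long word-vector dimension with width 2:
# Xs = np.random.rand(3, 30, 768); C = np.random.rand(3, 2, 768)
```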
Preferably, when the convolution operation feature vectors are used to train the neural network for text classification, the weights of the multi-head attention mechanism layer are adjusted in the full-connection layer through a back-propagation algorithm, and the TensorFlow package is used to add the pre-trained multi-head attention mechanism layer into a new model class.
The present invention also provides a computer apparatus comprising a processor, a memory, and a computer program stored on the memory, wherein the processor executes the computer program stored on the memory to implement the text classification method based on a multi-head attention mechanism and two-dimensional convolution operation described above.
The invention also proposes a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method.
The invention also provides a text classification system based on a multi-head attention mechanism and two-dimensional convolution operation, which comprises:
the text data set dividing module is used for determining a text data set and dividing the text data set into a training set and a test set;
the preprocessing module is used for preprocessing the texts in the training set;
the neural network construction module is used for constructing a neural network, and the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are connected in sequence;
the word vector acquisition module is used for inputting the text after the preprocessing operation into an embedded layer of the neural network to obtain a word vector;
the pre-training module is used for forming a word vector matrix from the word vectors, inputting it to the multi-head attention mechanism layer, executing the attention functions in parallel, then concatenating and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using these representations to pre-train the multi-head attention mechanism layer for text classification;
the two-dimensional convolution operation module is used for fusing the text-enhanced semantic representations into a text fused semantic representation, performing a two-dimensional convolution operation on it, and outputting a convolution operation feature vector;
the training module is used for performing text classification training on the neural network using the convolution operation feature vector and adjusting the weights of the multi-head attention mechanism layer to obtain the trained neural network;
and the test module is used for preprocessing the text in the test set and inputting the preprocessed text into the trained neural network to obtain a classification result.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation, which comprises the steps of firstly collecting a text data set to be classified, dividing the text data set into a training set and a testing set, carrying out preprocessing operation on the text in the training set, then constructing a neural network, inputting the text subjected to preprocessing operation into the neural network to obtain word vectors at a word granularity level, reflecting the importance degree of different Chinese characters in the text, then forming a multi-head attention mechanism layer, forming a word vector matrix based on the word vectors, inputting the word vector matrix into the multi-head attention mechanism layer to obtain a multi-dimensional text tensor, namely adopting a matching mode of fusing a pre-training word vector and the multi-head attention mechanism as semantic representation to obtain a text representation tensor, then carrying out two-dimensional convolution operation, extracting text characteristics and fusing different attention points of the multi-head attention mechanism; and introducing a full connection layer, performing text classification training on the neural network by using the convolution operation characteristic vector, adjusting the weight of the multi-attention machine mechanism layer to obtain a trained neural network, preprocessing the text concentrated in the test, and inputting the trained neural network to obtain a classification result. The method can obtain good classification effect and generalization capability on a smaller data set, is quick in fitting and less in model parameter, simplifies the model, reduces the overhead of the system, and effectively avoids the problems of large model data demand, long training time and high computer computing power requirement.
Drawings
Fig. 1 is a schematic flowchart of a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to embodiment 1 of the present invention;
fig. 2 is a schematic flowchart of a preprocessing operation performed on texts in a training set according to embodiment 1 of the present invention;
FIG. 3 is a view showing a structure of a neural network constructed in embodiment 1 of the present invention;
fig. 4 is a structural diagram showing a single self-attention mechanism proposed in embodiment 2 of the present invention;
FIG. 5 is a view showing a structure of a multi-headed attention mechanism layer proposed in embodiment 2 of the present invention;
fig. 6 is a diagram showing a structure of a text classification system based on a multi-head attention mechanism and a two-dimensional convolution operation proposed in embodiment 5 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for better illustration of the present embodiment, certain parts of the drawings may be omitted, enlarged or reduced, and do not represent actual dimensions;
it will be understood by those skilled in the art that certain descriptions of well-known structures in the drawings may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
example 1
As shown in fig. 1, the present embodiment provides a text classification method based on a multi-head attention mechanism and a two-dimensional convolution operation, the method includes the following steps:
S1, determining a text data set, and dividing the text data set into a training set and a test set;
S2, preprocessing the texts in the training set;
Referring to fig. 2, the preprocessing operations performed on the texts in the training set include:
S21, establishing a stop-word list from all punctuation marks and spaces in the texts of the training set;
S22, reading each character of a text in sequence, comparing it against the stop-word list, and automatically skipping any character found in the list;
S23, inputting the texts with punctuation and spaces removed character by character, reading all characters of all texts, and establishing a one-hot vector for each character.
This embodiment is programmed in Python. The data set used is CMID, a medical-domain text classification data set containing 2,900 texts and 16 classification types, stored as a json file. When the program reads a character, it is automatically compared against the stop-word list; if it appears there, it is skipped. The text is thus separated character by character in Chinese-character form: the text stripped of punctuation and spaces is input character by character and stored in a Python list. All characters of all texts are then read, a one-hot vector is established for each character, and the vectors are stored in a database. The dimension of a one-hot vector is the number of distinct characters in the database; the vector takes the value 1 in exactly one dimension and 0 in all others, and each character has its own unique one-hot vector.
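A minimal Python sketch of this preprocessing flow follows; the characters seeding the stop-word list and all function names (clean_text, build_vocab, one_hot) are illustrative assumptions rather than part of the original disclosure, which derives the stop-word list from the training texts themselves:

```python
import string

# S21: stop-character list built from punctuation marks and spaces; seeded
# here with common ASCII and Chinese punctuation for illustration.
STOP_CHARS = set(string.punctuation + string.whitespace + "，。！？；：、“”‘’（）《》")

def clean_text(text):
    """S22: read each character in turn, skipping stop characters."""
    return [ch for ch in text if ch not in STOP_CHARS]

def build_vocab(texts):
    """S23: read every character of every cleaned text and give each
    distinct character its own dimension of the one-hot space."""
    vocab = {}
    for text in texts:
        for ch in clean_text(text):
            vocab.setdefault(ch, len(vocab))
    return vocab

def one_hot(ch, vocab):
    """One-hot vector: 1 in the character's own dimension, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab[ch]] = 1
    return vec
```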
S3, constructing a neural network, wherein the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected, and the structural diagram of the constructed neural network is shown in FIG. 3;
s4, inputting the text after the preprocessing operation into an embedding layer of a neural network to obtain a word vector;
the embedded layer of the constructed neural network is embedded by one-hot vectors at a character-by-character level in the textThe embedded layer has three layers for semantic representation, and the number of the neurons corresponding to each layer is respectively as follows: 3076. 1024, 768, wherein the weight matrix of each layer is respectively: w is a group of 1 、W 2 、W 3 The activation functions of all layers are sigmoid, and the formula of the sigmoid function is as follows:
Figure BDA0003737608360000071
The layers are connected in sequence. After the texts in the training set are preprocessed, the one-hot vectors corresponding to all the characters in the texts are obtained and input into the neural network to obtain the word vectors; the calculation performed at each layer is:

x_1 = sigmoid(W_1 x_0 + b)
x_2 = sigmoid(W_2 x_1 + b)
x = sigmoid(W_3 x_2 + b)

where x_0 denotes the one-hot vector of a word, x_1 denotes the intermediate value after x_0 is activated by the first layer, x_2 denotes the intermediate value after x_1 is activated by the second layer, x denotes the word vector obtained after x_2 is activated by the third layer, and b denotes a bias vector; a 768-dimensional word vector is finally output.
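As an illustrative sketch only (not part of the original disclosure), this three-layer embedding can be written with the TensorFlow 2.x Keras API as below; the function name build_embedding_mlp and the vocab_size parameter are assumptions:

```python
import tensorflow as tf

def build_embedding_mlp(vocab_size):
    """Maps a one-hot character vector x_0 through three sigmoid layers of
    widths 3076, 1024 and 768 to the 768-dimensional vector x, per the
    equations above (each Dense layer carries its own W_i and bias)."""
    return tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(vocab_size,)),
        tf.keras.layers.Dense(3076, activation="sigmoid"),  # x_1 = sigmoid(W_1 x_0 + b)
        tf.keras.layers.Dense(1024, activation="sigmoid"),  # x_2 = sigmoid(W_2 x_1 + b)
        tf.keras.layers.Dense(768, activation="sigmoid"),   # x   = sigmoid(W_3 x_2 + b)
    ])
```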
S5, forming a word vector matrix based on the word vectors, inputting the word vector matrix into a multi-head attention machine mechanism layer, executing an attention function in parallel, splicing and mapping to obtain a text enhancement semantic representation output after the multi-head attention machine mechanism layer, and performing text classification pre-training on the multi-head attention machine mechanism layer by using the text enhancement semantic representation;
s6, fusing the text enhancement semantic representations to obtain text fusion semantic representations, performing two-dimensional convolution operation on the text fusion semantic representations, and outputting convolution operation feature vectors;
s7, performing text classification training on the neural network by using the convolution operation characteristic vector, and adjusting the weight of a multi-attention machine mechanism layer to obtain the trained neural network;
and S8, preprocessing the text in the test set, and inputting the trained neural network to obtain a classification result.
Table 1 compares the method proposed in this embodiment with other existing methods trained on the same text data set; it achieves a classification effect not inferior to most models in a shorter training time.
TABLE 1
(Table 1 is reproduced as an image in the original publication; its data are not recoverable from the text.)
Example 2
This embodiment describes the multi-head attention mechanism layer. The structure of a single self-attention mechanism is shown in fig. 4; multiple self-attention mechanisms are connected in series to form a multi-head attention mechanism layer, shown in fig. 5. In this embodiment, three multi-head attention mechanism layers are used, with 3, 6 and 9 heads respectively. The inputs of the self-attention mechanism are queries, keys of dimension d_k and values of dimension d_v. To obtain the weights over the values, a set of queries is packed into a matrix Q, and the keys and values into matrices K and V respectively; based on the softmax function, the attention function is obtained as:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
Q, K and V are computed as:

Q = XW^Q
K = XW^K
V = XW^V

where W^Q, W^K and W^V denote the weight matrices of the self-attention mechanism for the three inputs query, key and value.
The word vectors obtained in S4 are assembled into a word vector matrix X = [x_1, x_2, ..., x_n]. The word vector matrix is input to the multi-head attention mechanism layers, the attention functions are executed in parallel, and the results are concatenated and mapped; with R multi-head attention mechanism layers in total, R text-enhanced semantic representations X_1, X_2, ..., X_R of the same size as X are obtained. The calculation in a multi-head attention mechanism layer is:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O

where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), and i denotes the index of a multi-head attention mechanism layer among the R layers, i = 1, 2, ..., R. After a flatten operation is performed on X_1, X_2, ..., X_R, the result is input to a full-connection layer to carry out the pre-training for text classification.
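The following TensorFlow sketch illustrates the attention and multi-head computations above; it is an assumption-laden illustration, not the patented implementation. Note that 768 is divisible by 3 and by 6 but not by 9, so the 9-head layer of this embodiment would need its own per-head width; the sketch assumes d_model is divisible by h:

```python
import tensorflow as tf

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = tf.cast(tf.shape(K)[-1], tf.float32)
    scores = tf.matmul(Q, K, transpose_b=True) / tf.sqrt(d_k)
    return tf.matmul(tf.nn.softmax(scores, axis=-1), V)

class MultiHeadSelfAttention(tf.keras.layers.Layer):
    """MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O with
    head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V) and Q = X W^Q, etc."""

    def __init__(self, d_model, h):
        super().__init__()
        # Assumes d_model is divisible by h (true for h = 3 or 6 with
        # d_model = 768; an h = 9 layer needs its own per-head width).
        self.h, self.d_head = h, d_model // h
        self.wq = tf.keras.layers.Dense(d_model, use_bias=False)  # W^Q
        self.wk = tf.keras.layers.Dense(d_model, use_bias=False)  # W^K
        self.wv = tf.keras.layers.Dense(d_model, use_bias=False)  # W^V
        self.wo = tf.keras.layers.Dense(d_model, use_bias=False)  # W^O

    def _split_heads(self, x, batch):
        # (batch, seq, d_model) -> (batch, h, seq, d_head)
        x = tf.reshape(x, (batch, -1, self.h, self.d_head))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, X):
        batch = tf.shape(X)[0]
        Q = self._split_heads(self.wq(X), batch)
        K = self._split_heads(self.wk(X), batch)
        V = self._split_heads(self.wv(X), batch)
        heads = scaled_dot_product_attention(Q, K, V)
        heads = tf.transpose(heads, perm=[0, 2, 1, 3])  # (batch, seq, h, d_head)
        concat = tf.reshape(heads, (batch, -1, self.h * self.d_head))
        return self.wo(concat)                          # Concat(head_1..head_h) W^O
```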
The text-enhanced semantic representations X_1, X_2, ..., X_R are fused by a concatenation operation to obtain the text fused semantic representation X_s = Concatenate(X_1, X_2, ..., X_R), a three-dimensional tensor, with X_1, X_2, ..., X_R serving as the R input channels of the convolution layer. Suppose the outputs of the three multi-head attention mechanism layers are X_1, X_2 and X_3; then X_s = [X_1, X_2, X_3] is the text fused semantic representation, a three-dimensional tensor of shape (3, 30, 768). A two-dimensional convolution operation is performed on it with 32 convolution kernels of size (2, 3), the size of each convolution kernel in the first dimension being equal to the length of a word vector. The elements of the convolution result matrix are computed as:

Y(p, q) = Σ_i Σ_j Σ_k X_s(i, j, k) · C(i, j−p+1, k−q+1)

where X_s(i, j, k) is an element of the input X_s, Y(p, q) is an element of the convolution result matrix, and C(i, j−p+1, k−q+1) is an element of the convolution kernel.
The convolution result matrix is input to the pooling layer for a max-pooling operation that keeps only the largest element of the matrix; the output corresponding to each convolution kernel is:

y = max_{p,q} Y(p, q)
Finally, a 32-dimensional convolution operation feature vector is output, and loss of semantic information is avoided. When the convolution operation feature vector is used to train the neural network for text classification, the weights of the multi-head attention mechanism layer are adjusted in the full-connection layer through a back-propagation algorithm, and the TensorFlow package is used to add the pre-trained multi-head attention mechanism layer into a new model class.
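Below is a sketch of how the pieces of this embodiment might be assembled with TensorFlow Keras, under stated assumptions: the (2, 3) kernel size is taken literally with the three attention outputs as channels (the text's statement that the first kernel dimension equals the word-vector length admits other readings), and all names are hypothetical:

```python
import tensorflow as tf

def build_classifier(pretrained_attention_layers, seq_len=30, d_model=768,
                     num_classes=16):
    """Stacks the outputs X_1, X_2, X_3 of the pre-trained multi-head
    attention layers into X_s of shape (batch, 30, 768, 3), applies 32
    two-dimensional convolution kernels of size (2, 3), max-pools to a
    32-dimensional feature vector, and classifies with a full-connection
    layer (16 classes, matching the CMID data set)."""
    inputs = tf.keras.Input(shape=(seq_len, d_model))
    xs = tf.keras.layers.Lambda(lambda ts: tf.stack(ts, axis=1))(
        [layer(inputs) for layer in pretrained_attention_layers])  # (batch, 3, 30, 768)
    xs = tf.keras.layers.Permute((2, 3, 1))(xs)                    # (batch, 30, 768, 3)
    y = tf.keras.layers.Conv2D(filters=32, kernel_size=(2, 3))(xs)
    y = tf.keras.layers.GlobalMaxPooling2D()(y)  # keep the largest element per kernel
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(y)
    return tf.keras.Model(inputs, outputs)

# Because the pre-trained attention layers are part of the model, training
# with back-propagation also adjusts their weights, as the text describes:
# model = build_classifier([mha_3_heads, mha_6_heads, mha_9_heads])
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```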
Example 3
The embodiment provides a computer device, which comprises a processor, a memory and a computer program stored in the memory, wherein the processor executes the computer program stored in the memory to realize the text classification method based on the multi-head attention mechanism and the two-dimensional convolution operation.
The memory may be a disk, flash memory or any other non-volatile storage medium. The processor is connected to the memory and may be implemented as one or more integrated circuits, specifically a microprocessor or a microcontroller; when it executes the computer program stored in the memory, the text classification method based on a multi-head attention mechanism and two-dimensional convolution operation is implemented.
Example 4
The present embodiment proposes a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method.
Example 5
As shown in fig. 6, the present embodiment proposes a text classification system based on a multi-head attention mechanism and a two-dimensional convolution operation, the system including:
the text data set dividing module is used for determining a text data set and dividing the text data set into a training set and a test set;
the preprocessing module is used for preprocessing the texts in the training set;
the neural network construction module is used for constructing a neural network, and the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected;
the word vector acquisition module is used for inputting the text after the preprocessing operation into an embedded layer of the neural network to obtain a word vector;
the pre-training module is used for forming a word vector matrix from the word vectors, inputting it to the multi-head attention mechanism layer, executing the attention functions in parallel, then concatenating and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using these representations to pre-train the multi-head attention mechanism layer for text classification;
the two-dimensional convolution operation module is used for fusing the text-enhanced semantic representations into a text fused semantic representation, performing a two-dimensional convolution operation on it, and outputting a convolution operation feature vector;
the training module is used for performing text classification training on the neural network using the convolution operation feature vector and adjusting the weights of the multi-head attention mechanism layer to obtain the trained neural network;
and the test module is used for preprocessing the text in the test set and inputting the preprocessed text into the trained neural network to obtain a classification result.
It should be understood that the above-described embodiments are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A method for text classification based on a multi-head attention mechanism and two-dimensional convolution operation, the method comprising the steps of:
S1, determining a text data set, and dividing the text data set into a training set and a test set;
S2, preprocessing the texts in the training set;
S3, constructing a neural network comprising an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected;
S4, inputting the preprocessed text into the embedding layer of the neural network to obtain word vectors;
S5, forming a word vector matrix from the word vectors, inputting it to the multi-head attention mechanism layer, executing the attention functions in parallel, then concatenating and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using these representations to pre-train the multi-head attention mechanism layer for text classification;
S6, fusing the text-enhanced semantic representations into a text fused semantic representation, performing a two-dimensional convolution operation on it, and outputting a convolution operation feature vector;
S7, performing text classification training on the neural network using the convolution operation feature vector and adjusting the weights of the multi-head attention mechanism layer to obtain the trained neural network;
S8, preprocessing the texts in the test set and inputting them into the trained neural network to obtain classification results.
2. The text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to claim 1, wherein in step S2 the preprocessing of the texts in the training set comprises:
S21, establishing a stop-word list from all punctuation marks and spaces in the texts of the training set;
S22, reading each character of a text in sequence, comparing it against the stop-word list, and automatically skipping any character found in the list;
S23, inputting the texts with punctuation and spaces removed character by character, reading all characters of all texts, and establishing a one-hot vector for each character.
3. The text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to claim 2, wherein in step S3 the embedding layer of the constructed neural network uses character-level one-hot vector embedding of the text as the semantic representation; the embedding layer has three layers with weight matrices W_1, W_2 and W_3 respectively, the activation function of every layer is sigmoid, and the layers are connected in sequence; after the texts in the training set are preprocessed, the one-hot vectors corresponding to all characters in all the texts are obtained and input into the neural network to obtain the word vectors, the calculation performed at each layer being:
x_1 = sigmoid(W_1 x_0 + b)
x_2 = sigmoid(W_2 x_1 + b)
x = sigmoid(W_3 x_2 + b)
where x_0 denotes the one-hot vector of a word, x_1 denotes the intermediate value after x_0 is activated by the first layer, x_2 denotes the intermediate value after x_1 is activated by the second layer, x denotes the word vector obtained after x_2 is activated by the third layer, and b denotes a bias vector.
4. The method according to claim 3, wherein a plurality of self-attention mechanisms are connected in series to form the multi-head attention mechanism layer; the inputs of the self-attention mechanism are queries, keys of dimension d_k and values of dimension d_v; to obtain the weights over the values, a set of queries is packed into a matrix Q and the keys and values into matrices K and V respectively, and based on the softmax function the attention function is obtained as:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
Q, K and V are computed as:

Q = XW^Q
K = XW^K
V = XW^V

where W^Q, W^K and W^V denote the weight matrices of the self-attention mechanism for the three inputs query, key and value.
5. The method according to claim 4, wherein the word vectors obtained in S4 are assembled into a word vector matrix X = [x_1, x_2, ..., x_n]; the word vector matrix is input to the multi-head attention mechanism layers, the attention functions are executed in parallel, and the results are concatenated and mapped; with R multi-head attention mechanism layers in total, R text-enhanced semantic representations X_1, X_2, ..., X_R of the same size as X are obtained, the calculation in a multi-head attention mechanism layer being:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O

where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), and i denotes the index of a multi-head attention mechanism layer among the R layers, i = 1, 2, ..., R; after a flatten operation is performed on X_1, X_2, ..., X_R, the result is input to a full-connection layer to carry out the pre-training for text classification.
6. The text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to claim 5, wherein in step S6 the text-enhanced semantic representations X_1, X_2, ..., X_R are fused by a concatenation operation to obtain the text fused semantic representation X_s = Concatenate(X_1, X_2, ..., X_R), a three-dimensional tensor; when the two-dimensional convolution operation is performed on the text fused semantic representation, the convolution layer takes X_s as input and the size and number of the convolution kernels are set, the kernel size in the first dimension being equal to the word-vector length; for a convolution kernel C of size [768, vec2, vec3] and the text fused semantics X_s, the elements of the convolution result matrix are computed as:

Y(p, q) = Σ_i Σ_j Σ_k X_s(i, j, k) · C(i, j−p+1, k−q+1)

where X_s(i, j, k) is an element of the input X_s, Y(p, q) is an element of the convolution result matrix, and C(i, j−p+1, k−q+1) is an element of the convolution kernel.
The convolution result matrix is input to the pooling layer for a max-pooling operation that keeps only the largest element of the matrix; the output corresponding to each convolution kernel is:

y = max_{p,q} Y(p, q)

Finally, the convolution operation feature vector is output.
7. The text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to claim 6, wherein, when text classification training is performed on the neural network using the convolution operation feature vector, the weights of the multi-head attention mechanism layer are adjusted in the full-connection layer through a back-propagation algorithm, and the TensorFlow package is used to add the pre-trained multi-head attention mechanism layer into a new model class.
8. A computer device comprising a processor, a memory, and a computer program stored on the memory, wherein the processor executes the computer program stored on the memory to implement the method for text classification based on a multi-headed attention mechanism and two-dimensional convolution operations of any one of claims 1-7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 7.
10. A text classification system based on a multi-head attention mechanism and two-dimensional convolution operations, the system comprising:
the text data set dividing module is used for determining a text data set and dividing the text data set into a training set and a test set;
the preprocessing module is used for preprocessing the texts in the training set;
the neural network construction module is used for constructing a neural network, and the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are connected in sequence;
the word vector acquisition module is used for inputting the text after the preprocessing operation into an embedded layer of the neural network to obtain a word vector;
the pre-training module is used for forming a word vector matrix from the word vectors, inputting it to the multi-head attention mechanism layer, executing the attention functions in parallel, then concatenating and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using these representations to pre-train the multi-head attention mechanism layer for text classification;
the two-dimensional convolution operation module is used for fusing the text-enhanced semantic representations into a text fused semantic representation, performing a two-dimensional convolution operation on it, and outputting a convolution operation feature vector;
the training module is used for performing text classification training on the neural network using the convolution operation feature vector and adjusting the weights of the multi-head attention mechanism layer to obtain the trained neural network;
and the test module is used for preprocessing the texts in the test set and inputting the texts into the trained neural network to obtain a classification result.
CN202210800916.XA 2022-07-08 2022-07-08 Text classification method based on multi-head attention mechanism and two-dimensional convolution operation Pending CN115168579A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210800916.XA CN115168579A (en) 2022-07-08 2022-07-08 Text classification method based on multi-head attention mechanism and two-dimensional convolution operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210800916.XA CN115168579A (en) 2022-07-08 2022-07-08 Text classification method based on multi-head attention mechanism and two-dimensional convolution operation

Publications (1)

Publication Number Publication Date
CN115168579A true CN115168579A (en) 2022-10-11

Family

ID=83492736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210800916.XA Pending CN115168579A (en) 2022-07-08 2022-07-08 Text classification method based on multi-head attention mechanism and two-dimensional convolution operation

Country Status (1)

Country Link
CN (1) CN115168579A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116562284A (en) * 2023-04-14 2023-08-08 湖北经济学院 Government affair text automatic allocation model training method and device
CN116562284B (en) * 2023-04-14 2024-01-26 湖北经济学院 Government affair text automatic allocation model training method and device
CN116660992A (en) * 2023-06-05 2023-08-29 北京石油化工学院 Seismic signal processing method based on multi-feature fusion
CN116660992B (en) * 2023-06-05 2024-03-05 北京石油化工学院 Seismic signal processing method based on multi-feature fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination