CN115168579A - Text classification method based on multi-head attention mechanism and two-dimensional convolution operation - Google Patents

Text classification method based on multi-head attention mechanism and two-dimensional convolution operation

Info

Publication number
CN115168579A
Authority
CN
China
Prior art keywords
text
layer
neural network
attention mechanism
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210800916.XA
Other languages
Chinese (zh)
Inventor
孙源佑 (Sun Yuanyou)
邓木清 (Deng Muqing)
蔡洁标 (Cai Jiebiao)
张贵有 (Zhang Guiyou)
江嘉宁 (Jiang Jianing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Smart Traditional Chinese Medicine Technology Guangdong Co ltd
Original Assignee
Smart Traditional Chinese Medicine Technology Guangdong Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Smart Traditional Chinese Medicine Technology Guangdong Co ltd
Priority to CN202210800916.XA
Publication of CN115168579A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods


Abstract

The invention provides a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation, which relates to the technical field of natural language processing.

Description

Text classification method based on multi-head attention mechanism and two-dimensional convolution operation
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation.
Background
Natural language processing is an important direction in computer science and artificial intelligence. It studies theories and methods for effective communication between humans and computers in natural language, drawing on linguistics, computer science and mathematics, and has extremely wide applications such as intelligent voice question-answering systems, fraudulent short-message identification and sentiment recognition of online comments.
In the medical field, large volumes of clinical information are stored in information systems as unstructured (or semi-structured) text, and natural language processing is a key technology for extracting useful information from medical text. Through natural language processing, unstructured medical texts are converted into structured data containing important medical information, from which researchers can discover useful medical knowledge, improving the operational quality of the medical system and reducing its cost. In the era of rapidly developing internet technology, the medical field no longer struggles to acquire information; the problem is how to obtain valuable information quickly and accurately from massive information resources. Medical text is produced in many varied forms, and its sheer volume makes manual sorting and arrangement impractical, so effective text classification has become very important.
At present, commonly used text classification methods include support vector machines, convolutional neural networks, recurrent neural networks and BERT. BERT and RNNs can achieve excellent classification results, but the models are large, difficult to train and hard to deploy on small hosts. TextGCN achieves a good classification effect with a small model through graph convolution, but it cannot classify unseen nodes. Convolutional neural networks applied to text are typically only one-dimensional, and if the input text has high dimensionality, a one-dimensional convolutional neural network loses semantic information. The prior art discloses an adaptive text classification method and apparatus based on BERT: corpus sample data to be classified is first preprocessed and a preset network model is constructed; the preprocessed sample data is then input into the preset network model and trained with supervision using a preset loss function to obtain a classification model; an output threshold is set on the classification model to control early output of classification results, shortening model inference time without losing precision.
Disclosure of Invention
To address the long training times and heavy computation of the traditional text classification models used for medical text, the invention provides a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation that is computationally light and fast to train while still achieving a good text classification effect.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
a method of text classification based on a multi-head attention mechanism and a two-dimensional convolution operation, the method comprising the steps of:
S1, determining a text data set, and dividing the text data set into a training set and a test set;
S2, preprocessing the texts in the training set;
S3, constructing a neural network comprising an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected;
S4, inputting the preprocessed text into the embedding layer of the neural network to obtain word vectors;
S5, forming a word vector matrix from the word vectors, inputting it to the multi-head attention mechanism layer, executing the attention functions in parallel, then concatenating and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using these representations to pre-train the multi-head attention mechanism layer for text classification;
S6, fusing the text-enhanced semantic representations into a text fused semantic representation, performing a two-dimensional convolution operation on it, and outputting a convolution operation feature vector;
S7, performing text classification training on the neural network using the convolution operation feature vector and adjusting the weights of the multi-head attention mechanism layer to obtain the trained neural network;
S8, preprocessing the texts in the test set and inputting them into the trained neural network to obtain classification results.
Preferably, in step S2, the preprocessing of the texts in the training set comprises:
S21, establishing a stop-word list from all punctuation marks and spaces in the texts of the training set;
S22, reading each character of a text in sequence, comparing it against the stop-word list, and automatically skipping any character found in the list;
S23, inputting the texts with punctuation and spaces removed character by character, reading all characters of all texts, and establishing a one-hot vector for each character.
Preferably, in step S3, the embedding layer of the constructed neural network uses character-level one-hot vector embedding of the text as the semantic representation. The embedding layer has three layers with weight matrices W_1, W_2 and W_3 respectively; the activation function of every layer is sigmoid, and the layers are connected in sequence. After the texts in the training set are preprocessed, the one-hot vector of each word in each text is obtained and input into the neural network to obtain the word vectors; the calculation performed at each layer is:
x_1 = sigmoid(W_1 x_0 + b)
x_2 = sigmoid(W_2 x_1 + b)
x = sigmoid(W_3 x_2 + b)
where x_0 denotes the one-hot vector of a word, x_1 denotes the intermediate value after x_0 is activated by the first layer, x_2 denotes the intermediate value after x_1 is activated by the second layer, x denotes the word vector obtained after x_2 is activated by the third layer, and b denotes a bias vector.
Preferably, a plurality of self-attention mechanisms are connected in series to form the multi-head attention mechanism layer. The inputs of the self-attention mechanism are queries, keys of dimension d_k and values of dimension d_v. To obtain the weights over the values, a set of queries is packed into a matrix Q, and the keys and values into matrices K and V respectively; based on the softmax function, the attention function is obtained as:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
Q, K and V are computed as:

Q = XW^Q
K = XW^K
V = XW^V

where W^Q, W^K and W^V denote the weight matrices of the self-attention mechanism for the three inputs query, key and value.
Preferably, the word vectors obtained in S4 are assembled into a word vector matrix X = [x_1, x_2, ..., x_n]. The word vector matrix is input to the multi-head attention mechanism layers, the attention functions are executed in parallel, and the results are concatenated and mapped; with R multi-head attention mechanism layers in total, R text-enhanced semantic representations X_1, X_2, ..., X_R of the same size as X are obtained. The calculation in a multi-head attention mechanism layer is:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O

where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), and i denotes the index of a multi-head attention mechanism layer among the R layers, i = 1, 2, ..., R. After a flatten operation is performed on X_1, X_2, ..., X_R, the result is input to a full-connection layer to carry out the pre-training for text classification.
Preferably, in step S6, the text-enhanced semantic representations X_1, X_2, ..., X_R are fused by a concatenation operation to obtain the text fused semantic representation X_s = Concatenate(X_1, X_2, ..., X_R), a three-dimensional tensor. When the two-dimensional convolution operation is performed on the text fused semantic representation, the convolution layer takes X_1, X_2, ..., X_R as its R input channels, and the size and number of the convolution kernels are set, the kernel size in the first dimension being equal to the word-vector length. For a convolution kernel C and the text fused semantics X_s, the elements of the convolution result matrix are computed as:

Y(p, q) = Σ_i Σ_j Σ_k X_s(i, j, k) · C(i, j−p+1, k−q+1)

where X_s(i, j, k) is an element of the input X_s, Y(p, q) is an element of the convolution result matrix, and C(i, j−p+1, k−q+1) is an element of the convolution kernel.
The convolution result matrix is input to the pooling layer for a max-pooling operation that keeps only the largest element of the matrix; the output corresponding to each convolution kernel is:

y = max_{p,q} Y(p, q)

Finally, the convolution operation feature vectors are output, and loss of semantic information is avoided.
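For illustration only, the element formula above can be transcribed directly in NumPy as in the sketch below; the array shapes are assumptions, and note that deep-learning frameworks compute cross-correlation (no kernel flip), whereas the index form C(i, j−p+1, k−q+1) corresponds to a flipped kernel, the two differing only by reversing the kernel:

```python
import numpy as np

def conv2d_result(Xs, C):
    """Element-wise transcription of the formula above: each output Y(p, q)
    sums Xs over all R channels and over the kernel window (cross-correlation
    convention; the patent's flipped-kernel form differs only by a flip)."""
    R, J, Kdim = Xs.shape
    _, fj, fk = C.shape
    Y = np.zeros((J - fj + 1, Kdim - fk + 1))
    for p in range(Y.shape[0]):
        for q in range(Y.shape[1]):
            # window of Xs aligned with the kernel at offset (p, q)
            Y[p, q] = np.sum(Xs[:, p:p + fj, q:q + fk] * C)
    return Y

def max_pool(Y):
    """Max pooling: keep only the largest element of the result matrix."""
    return Y.max()

# Illustrative shapes: R = 3 channels, a 30 x 768 representation, and a
# kernel spanning the full 768-long word-vector dimension with width 2:
# Xs = np.random.rand(3, 30, 768); C = np.random.rand(3, 2, 768)
```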
Preferably, when the convolution operation feature vectors are used to train the neural network for text classification, the weights of the multi-head attention mechanism layer are adjusted in the full-connection layer through a back-propagation algorithm, and the TensorFlow package is used to add the pre-trained multi-head attention mechanism layer into a new model class.
The present invention also provides a computer apparatus comprising a processor, a memory, and a computer program stored on the memory, wherein the processor executes the computer program stored on the memory to implement the text classification method based on a multi-head attention mechanism and two-dimensional convolution operation described above.
The invention also proposes a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method.
The invention also provides a text classification system based on a multi-head attention mechanism and two-dimensional convolution operation, which comprises:
the text data set dividing module is used for determining a text data set and dividing the text data set into a training set and a test set;
the preprocessing module is used for preprocessing the texts in the training set;
the neural network construction module is used for constructing a neural network, and the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are connected in sequence;
the word vector acquisition module is used for inputting the text after the preprocessing operation into an embedded layer of the neural network to obtain a word vector;
the pre-training module is used for forming a word vector matrix from the word vectors, inputting it to the multi-head attention mechanism layer, executing the attention functions in parallel, then concatenating and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using these representations to pre-train the multi-head attention mechanism layer for text classification;
the two-dimensional convolution operation module is used for fusing the text-enhanced semantic representations into a text fused semantic representation, performing a two-dimensional convolution operation on it, and outputting a convolution operation feature vector;
the training module is used for performing text classification training on the neural network using the convolution operation feature vector and adjusting the weights of the multi-head attention mechanism layer to obtain the trained neural network;
and the test module is used for preprocessing the text in the test set and inputting the preprocessed text into the trained neural network to obtain a classification result.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation, which comprises the steps of firstly collecting a text data set to be classified, dividing the text data set into a training set and a testing set, carrying out preprocessing operation on the text in the training set, then constructing a neural network, inputting the text subjected to preprocessing operation into the neural network to obtain word vectors at a word granularity level, reflecting the importance degree of different Chinese characters in the text, then forming a multi-head attention mechanism layer, forming a word vector matrix based on the word vectors, inputting the word vector matrix into the multi-head attention mechanism layer to obtain a multi-dimensional text tensor, namely adopting a matching mode of fusing a pre-training word vector and the multi-head attention mechanism as semantic representation to obtain a text representation tensor, then carrying out two-dimensional convolution operation, extracting text characteristics and fusing different attention points of the multi-head attention mechanism; and introducing a full connection layer, performing text classification training on the neural network by using the convolution operation characteristic vector, adjusting the weight of the multi-attention machine mechanism layer to obtain a trained neural network, preprocessing the text concentrated in the test, and inputting the trained neural network to obtain a classification result. The method can obtain good classification effect and generalization capability on a smaller data set, is quick in fitting and less in model parameter, simplifies the model, reduces the overhead of the system, and effectively avoids the problems of large model data demand, long training time and high computer computing power requirement.
Drawings
Fig. 1 is a schematic flowchart of a text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to embodiment 1 of the present invention;
fig. 2 is a schematic flowchart of a preprocessing operation performed on texts in a training set according to embodiment 1 of the present invention;
FIG. 3 is a view showing a structure of a neural network constructed in embodiment 1 of the present invention;
fig. 4 is a structural diagram showing a single self-attention mechanism proposed in embodiment 2 of the present invention;
FIG. 5 is a view showing a structure of a multi-headed attention mechanism layer proposed in embodiment 2 of the present invention;
fig. 6 is a diagram showing a structure of a text classification system based on a multi-head attention mechanism and a two-dimensional convolution operation proposed in embodiment 5 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for better illustration of the present embodiment, certain parts of the drawings may be omitted, enlarged or reduced, and do not represent actual dimensions;
it will be understood by those skilled in the art that certain descriptions of well-known structures in the drawings may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
example 1
As shown in fig. 1, the present embodiment provides a text classification method based on a multi-head attention mechanism and a two-dimensional convolution operation, the method includes the following steps:
S1, determining a text data set, and dividing the text data set into a training set and a test set;
S2, preprocessing the texts in the training set;
Referring to fig. 2, the preprocessing operations performed on the texts in the training set include:
S21, establishing a stop-word list from all punctuation marks and spaces in the texts of the training set;
S22, reading each character of a text in sequence, comparing it against the stop-word list, and automatically skipping any character found in the list;
S23, inputting the texts with punctuation and spaces removed character by character, reading all characters of all texts, and establishing a one-hot vector for each character.
This embodiment is programmed in Python. The data set used is CMID, a medical-domain text classification data set containing 2,900 texts and 16 classification types, stored as a json file. When the program reads a character, it is automatically compared against the stop-word list; if it appears there, it is skipped. The text is thus separated character by character in Chinese-character form: the text stripped of punctuation and spaces is input character by character and stored in a Python list. All characters of all texts are then read, a one-hot vector is established for each character, and the vectors are stored in a database. The dimension of a one-hot vector is the number of distinct characters in the database; the vector takes the value 1 in exactly one dimension and 0 in all others, and each character has its own unique one-hot vector.
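A minimal Python sketch of this preprocessing flow follows; the characters seeding the stop-word list and all function names (clean_text, build_vocab, one_hot) are illustrative assumptions rather than part of the original disclosure, which derives the stop-word list from the training texts themselves:

```python
import string

# S21: stop-character list built from punctuation marks and spaces; seeded
# here with common ASCII and Chinese punctuation for illustration.
STOP_CHARS = set(string.punctuation + string.whitespace + "，。！？；：、“”‘’（）《》")

def clean_text(text):
    """S22: read each character in turn, skipping stop characters."""
    return [ch for ch in text if ch not in STOP_CHARS]

def build_vocab(texts):
    """S23: read every character of every cleaned text and give each
    distinct character its own dimension of the one-hot space."""
    vocab = {}
    for text in texts:
        for ch in clean_text(text):
            vocab.setdefault(ch, len(vocab))
    return vocab

def one_hot(ch, vocab):
    """One-hot vector: 1 in the character's own dimension, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab[ch]] = 1
    return vec
```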
S3, constructing a neural network, wherein the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected, and the structural diagram of the constructed neural network is shown in FIG. 3;
s4, inputting the text after the preprocessing operation into an embedding layer of a neural network to obtain a word vector;
the embedded layer of the constructed neural network is embedded by one-hot vectors at a character-by-character level in the textThe embedded layer has three layers for semantic representation, and the number of the neurons corresponding to each layer is respectively as follows: 3076. 1024, 768, wherein the weight matrix of each layer is respectively: w is a group of 1 、W 2 、W 3 The activation functions of all layers are sigmoid, and the formula of the sigmoid function is as follows:
Figure BDA0003737608360000071
The layers are connected in sequence. After the texts in the training set are preprocessed, the one-hot vectors corresponding to all the characters in the texts are obtained and input into the neural network to obtain the word vectors; the calculation performed at each layer is:

x_1 = sigmoid(W_1 x_0 + b)
x_2 = sigmoid(W_2 x_1 + b)
x = sigmoid(W_3 x_2 + b)

where x_0 denotes the one-hot vector of a word, x_1 denotes the intermediate value after x_0 is activated by the first layer, x_2 denotes the intermediate value after x_1 is activated by the second layer, x denotes the word vector obtained after x_2 is activated by the third layer, and b denotes a bias vector; a 768-dimensional word vector is finally output.
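As an illustrative sketch only (not part of the original disclosure), this three-layer embedding can be written with the TensorFlow 2.x Keras API as below; the function name build_embedding_mlp and the vocab_size parameter are assumptions:

```python
import tensorflow as tf

def build_embedding_mlp(vocab_size):
    """Maps a one-hot character vector x_0 through three sigmoid layers of
    widths 3076, 1024 and 768 to the 768-dimensional vector x, per the
    equations above (each Dense layer carries its own W_i and bias)."""
    return tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(vocab_size,)),
        tf.keras.layers.Dense(3076, activation="sigmoid"),  # x_1 = sigmoid(W_1 x_0 + b)
        tf.keras.layers.Dense(1024, activation="sigmoid"),  # x_2 = sigmoid(W_2 x_1 + b)
        tf.keras.layers.Dense(768, activation="sigmoid"),   # x   = sigmoid(W_3 x_2 + b)
    ])
```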
S5, forming a word vector matrix based on the word vectors, inputting the word vector matrix into a multi-head attention machine mechanism layer, executing an attention function in parallel, splicing and mapping to obtain a text enhancement semantic representation output after the multi-head attention machine mechanism layer, and performing text classification pre-training on the multi-head attention machine mechanism layer by using the text enhancement semantic representation;
s6, fusing the text enhancement semantic representations to obtain text fusion semantic representations, performing two-dimensional convolution operation on the text fusion semantic representations, and outputting convolution operation feature vectors;
s7, performing text classification training on the neural network by using the convolution operation characteristic vector, and adjusting the weight of a multi-attention machine mechanism layer to obtain the trained neural network;
and S8, preprocessing the text in the test set, and inputting the trained neural network to obtain a classification result.
Table 1 compares the method proposed in this embodiment with other existing methods trained on the same text data set; it achieves a classification effect not inferior to most models in a shorter training time.
TABLE 1
(Table 1 is reproduced as an image in the original publication; its data are not recoverable from the text.)
Example 2
This embodiment describes the multi-head attention mechanism layer. The structure of a single self-attention mechanism is shown in fig. 4; multiple self-attention mechanisms are connected in series to form a multi-head attention mechanism layer, shown in fig. 5. In this embodiment, three multi-head attention mechanism layers are used, with 3, 6 and 9 heads respectively. The inputs of the self-attention mechanism are queries, keys of dimension d_k and values of dimension d_v. To obtain the weights over the values, a set of queries is packed into a matrix Q, and the keys and values into matrices K and V respectively; based on the softmax function, the attention function is obtained as:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
Q, K and V are computed as:

Q = XW^Q
K = XW^K
V = XW^V

where W^Q, W^K and W^V denote the weight matrices of the self-attention mechanism for the three inputs query, key and value.
The word vectors obtained in S4 are assembled into a word vector matrix X = [x_1, x_2, ..., x_n]. The word vector matrix is input to the multi-head attention mechanism layers, the attention functions are executed in parallel, and the results are concatenated and mapped; with R multi-head attention mechanism layers in total, R text-enhanced semantic representations X_1, X_2, ..., X_R of the same size as X are obtained. The calculation in a multi-head attention mechanism layer is:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O

where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), and i denotes the index of a multi-head attention mechanism layer among the R layers, i = 1, 2, ..., R. After a flatten operation is performed on X_1, X_2, ..., X_R, the result is input to a full-connection layer to carry out the pre-training for text classification.
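The following TensorFlow sketch illustrates the attention and multi-head computations above; it is an assumption-laden illustration, not the patented implementation. Note that 768 is divisible by 3 and by 6 but not by 9, so the 9-head layer of this embodiment would need its own per-head width; the sketch assumes d_model is divisible by h:

```python
import tensorflow as tf

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = tf.cast(tf.shape(K)[-1], tf.float32)
    scores = tf.matmul(Q, K, transpose_b=True) / tf.sqrt(d_k)
    return tf.matmul(tf.nn.softmax(scores, axis=-1), V)

class MultiHeadSelfAttention(tf.keras.layers.Layer):
    """MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O with
    head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V) and Q = X W^Q, etc."""

    def __init__(self, d_model, h):
        super().__init__()
        # Assumes d_model is divisible by h (true for h = 3 or 6 with
        # d_model = 768; an h = 9 layer needs its own per-head width).
        self.h, self.d_head = h, d_model // h
        self.wq = tf.keras.layers.Dense(d_model, use_bias=False)  # W^Q
        self.wk = tf.keras.layers.Dense(d_model, use_bias=False)  # W^K
        self.wv = tf.keras.layers.Dense(d_model, use_bias=False)  # W^V
        self.wo = tf.keras.layers.Dense(d_model, use_bias=False)  # W^O

    def _split_heads(self, x, batch):
        # (batch, seq, d_model) -> (batch, h, seq, d_head)
        x = tf.reshape(x, (batch, -1, self.h, self.d_head))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, X):
        batch = tf.shape(X)[0]
        Q = self._split_heads(self.wq(X), batch)
        K = self._split_heads(self.wk(X), batch)
        V = self._split_heads(self.wv(X), batch)
        heads = scaled_dot_product_attention(Q, K, V)
        heads = tf.transpose(heads, perm=[0, 2, 1, 3])  # (batch, seq, h, d_head)
        concat = tf.reshape(heads, (batch, -1, self.h * self.d_head))
        return self.wo(concat)                          # Concat(head_1..head_h) W^O
```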
The text-enhanced semantic representations X_1, X_2, ..., X_R are fused by a concatenation operation to obtain the text fused semantic representation X_s = Concatenate(X_1, X_2, ..., X_R), a three-dimensional tensor, with X_1, X_2, ..., X_R serving as the R input channels of the convolution layer. Suppose the outputs of the three multi-head attention mechanism layers are X_1, X_2 and X_3; then X_s = [X_1, X_2, X_3] is the text fused semantic representation, a three-dimensional tensor of shape (3, 30, 768). A two-dimensional convolution operation is performed on it with 32 convolution kernels of size (2, 3), the size of each convolution kernel in the first dimension being equal to the length of a word vector. The elements of the convolution result matrix are computed as:

Y(p, q) = Σ_i Σ_j Σ_k X_s(i, j, k) · C(i, j−p+1, k−q+1)

where X_s(i, j, k) is an element of the input X_s, Y(p, q) is an element of the convolution result matrix, and C(i, j−p+1, k−q+1) is an element of the convolution kernel.
The convolution result matrix is input to the pooling layer for a max-pooling operation that keeps only the largest element of the matrix; the output corresponding to each convolution kernel is:

y = max_{p,q} Y(p, q)
Finally, a 32-dimensional convolution operation feature vector is output, and loss of semantic information is avoided. When the convolution operation feature vector is used to train the neural network for text classification, the weights of the multi-head attention mechanism layer are adjusted in the full-connection layer through a back-propagation algorithm, and the TensorFlow package is used to add the pre-trained multi-head attention mechanism layer into a new model class.
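Below is a sketch of how the pieces of this embodiment might be assembled with TensorFlow Keras, under stated assumptions: the (2, 3) kernel size is taken literally with the three attention outputs as channels (the text's statement that the first kernel dimension equals the word-vector length admits other readings), and all names are hypothetical:

```python
import tensorflow as tf

def build_classifier(pretrained_attention_layers, seq_len=30, d_model=768,
                     num_classes=16):
    """Stacks the outputs X_1, X_2, X_3 of the pre-trained multi-head
    attention layers into X_s of shape (batch, 30, 768, 3), applies 32
    two-dimensional convolution kernels of size (2, 3), max-pools to a
    32-dimensional feature vector, and classifies with a full-connection
    layer (16 classes, matching the CMID data set)."""
    inputs = tf.keras.Input(shape=(seq_len, d_model))
    xs = tf.keras.layers.Lambda(lambda ts: tf.stack(ts, axis=1))(
        [layer(inputs) for layer in pretrained_attention_layers])  # (batch, 3, 30, 768)
    xs = tf.keras.layers.Permute((2, 3, 1))(xs)                    # (batch, 30, 768, 3)
    y = tf.keras.layers.Conv2D(filters=32, kernel_size=(2, 3))(xs)
    y = tf.keras.layers.GlobalMaxPooling2D()(y)  # keep the largest element per kernel
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(y)
    return tf.keras.Model(inputs, outputs)

# Because the pre-trained attention layers are part of the model, training
# with back-propagation also adjusts their weights, as the text describes:
# model = build_classifier([mha_3_heads, mha_6_heads, mha_9_heads])
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```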
Example 3
The embodiment provides a computer device, which comprises a processor, a memory and a computer program stored in the memory, wherein the processor executes the computer program stored in the memory to realize the text classification method based on the multi-head attention mechanism and the two-dimensional convolution operation.
The memory may be a disk, flash memory or any other non-volatile storage medium. The processor is connected to the memory and may be implemented as one or more integrated circuits, specifically a microprocessor or a microcontroller; when it executes the computer program stored in the memory, the text classification method based on a multi-head attention mechanism and two-dimensional convolution operation is implemented.
Example 4
The present embodiment proposes a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method.
Example 5
As shown in fig. 6, the present embodiment proposes a text classification system based on a multi-head attention mechanism and a two-dimensional convolution operation, the system including:
the text data set dividing module is used for determining a text data set and dividing the text data set into a training set and a test set;
the preprocessing module is used for preprocessing the texts in the training set;
the neural network construction module is used for constructing a neural network, and the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected;
the word vector acquisition module is used for inputting the text after the preprocessing operation into an embedded layer of the neural network to obtain a word vector;
the pre-training module is used for forming a word vector matrix from the word vectors, inputting it to the multi-head attention mechanism layer, executing the attention functions in parallel, then concatenating and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using these representations to pre-train the multi-head attention mechanism layer for text classification;
the two-dimensional convolution operation module is used for fusing the text-enhanced semantic representations into a text fused semantic representation, performing a two-dimensional convolution operation on it, and outputting a convolution operation feature vector;
the training module is used for performing text classification training on the neural network using the convolution operation feature vector and adjusting the weights of the multi-head attention mechanism layer to obtain the trained neural network;
and the test module is used for preprocessing the text in the test set and inputting the preprocessed text into the trained neural network to obtain a classification result.
It should be understood that the above-described embodiments are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A method for text classification based on a multi-head attention mechanism and two-dimensional convolution operation, the method comprising the steps of:
S1, determining a text data set, and dividing the text data set into a training set and a test set;
S2, preprocessing the texts in the training set;
S3, constructing a neural network comprising an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are sequentially connected;
S4, inputting the preprocessed text into the embedding layer of the neural network to obtain word vectors;
S5, forming a word vector matrix from the word vectors, inputting it to the multi-head attention mechanism layer, executing the attention functions in parallel, then concatenating and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using these representations to pre-train the multi-head attention mechanism layer for text classification;
S6, fusing the text-enhanced semantic representations into a text fused semantic representation, performing a two-dimensional convolution operation on it, and outputting a convolution operation feature vector;
S7, performing text classification training on the neural network using the convolution operation feature vector and adjusting the weights of the multi-head attention mechanism layer to obtain the trained neural network;
S8, preprocessing the texts in the test set and inputting them into the trained neural network to obtain classification results.
2. The text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to claim 1, wherein in step S2 the preprocessing of the texts in the training set comprises:
S21, establishing a stop-word list from all punctuation marks and spaces in the texts of the training set;
S22, reading each character of a text in sequence, comparing it against the stop-word list, and automatically skipping any character found in the list;
S23, inputting the texts with punctuation and spaces removed character by character, reading all characters of all texts, and establishing a one-hot vector for each character.
3. The text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to claim 2, wherein in step S3 the embedding layer of the constructed neural network uses character-level one-hot vector embedding of the text as the semantic representation; the embedding layer has three layers with weight matrices W_1, W_2 and W_3 respectively, the activation function of every layer is sigmoid, and the layers are connected in sequence; after the texts in the training set are preprocessed, the one-hot vectors corresponding to all characters in all the texts are obtained and input into the neural network to obtain the word vectors, the calculation performed at each layer being:
x_1 = sigmoid(W_1 x_0 + b)
x_2 = sigmoid(W_2 x_1 + b)
x = sigmoid(W_3 x_2 + b)
where x_0 denotes the one-hot vector of a word, x_1 denotes the intermediate value after x_0 is activated by the first layer, x_2 denotes the intermediate value after x_1 is activated by the second layer, x denotes the word vector obtained after x_2 is activated by the third layer, and b denotes a bias vector.
4. The method according to claim 3, wherein a plurality of self-attention mechanisms are connected in series to form the multi-head attention mechanism layer; the inputs of the self-attention mechanism are queries, keys of dimension d_k and values of dimension d_v; to obtain the weights over the values, a set of queries is packed into a matrix Q and the keys and values into matrices K and V respectively, and based on the softmax function the attention function is obtained as:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
Q, K and V are computed as:

Q = XW^Q
K = XW^K
V = XW^V

where W^Q, W^K and W^V denote the weight matrices of the self-attention mechanism for the three inputs query, key and value.
5. The method according to claim 4, wherein the word vectors obtained in S4 are assembled into a word vector matrix X = [x_1, x_2, ..., x_n]; the word vector matrix is input to the multi-head attention mechanism layers, the attention functions are executed in parallel, and the results are concatenated and mapped; with R multi-head attention mechanism layers in total, R text-enhanced semantic representations X_1, X_2, ..., X_R of the same size as X are obtained, the calculation in a multi-head attention mechanism layer being:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O

where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), and i denotes the index of a multi-head attention mechanism layer among the R layers, i = 1, 2, ..., R; after a flatten operation is performed on X_1, X_2, ..., X_R, the result is input to a full-connection layer to carry out the pre-training for text classification.
6. The text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to claim 5, wherein in step S6 the text-enhanced semantic representations X_1, X_2, ..., X_R are fused by a concatenation operation to obtain the text fused semantic representation X_s = Concatenate(X_1, X_2, ..., X_R), a three-dimensional tensor; when the two-dimensional convolution operation is performed on the text fused semantic representation, the convolution layer takes X_s as input and the size and number of the convolution kernels are set, the kernel size in the first dimension being equal to the word-vector length; for a convolution kernel C of size [768, vec2, vec3] and the text fused semantics X_s, the elements of the convolution result matrix are computed as:

Y(p, q) = Σ_i Σ_j Σ_k X_s(i, j, k) · C(i, j−p+1, k−q+1)

where X_s(i, j, k) is an element of the input X_s, Y(p, q) is an element of the convolution result matrix, and C(i, j−p+1, k−q+1) is an element of the convolution kernel.
The convolution result matrix is input to the pooling layer for a max-pooling operation that keeps only the largest element of the matrix; the output corresponding to each convolution kernel is:

y = max_{p,q} Y(p, q)

Finally, the convolution operation feature vector is output.
7. The text classification method based on a multi-head attention mechanism and two-dimensional convolution operation according to claim 6, wherein, when text classification training is performed on the neural network using the convolution operation feature vector, the weights of the multi-head attention mechanism layer are adjusted in the full-connection layer through a back-propagation algorithm, and the TensorFlow package is used to add the pre-trained multi-head attention mechanism layer into a new model class.
8. A computer device comprising a processor, a memory, and a computer program stored on the memory, wherein the processor executes the computer program stored on the memory to implement the method for text classification based on a multi-headed attention mechanism and two-dimensional convolution operations of any one of claims 1-7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 7.
10. A text classification system based on a multi-head attention mechanism and two-dimensional convolution operations, the system comprising:
the text data set dividing module is used for determining a text data set and dividing the text data set into a training set and a test set;
the preprocessing module is used for preprocessing the texts in the training set;
the neural network construction module is used for constructing a neural network, and the neural network comprises an embedding layer, a multi-head attention mechanism layer, a convolution layer, a pooling layer and a full-connection layer which are connected in sequence;
the word vector acquisition module is used for inputting the text after the preprocessing operation into an embedded layer of the neural network to obtain a word vector;
the pre-training module is used for forming a word vector matrix from the word vectors, inputting it to the multi-head attention mechanism layer, executing the attention functions in parallel, then concatenating and mapping to obtain the text-enhanced semantic representations output by the multi-head attention mechanism layer, and using these representations to pre-train the multi-head attention mechanism layer for text classification;
the two-dimensional convolution operation module is used for fusing the text-enhanced semantic representations into a text fused semantic representation, performing a two-dimensional convolution operation on it, and outputting a convolution operation feature vector;
the training module is used for performing text classification training on the neural network using the convolution operation feature vector and adjusting the weights of the multi-head attention mechanism layer to obtain the trained neural network;
and the test module is used for preprocessing the texts in the test set and inputting the texts into the trained neural network to obtain a classification result.
CN202210800916.XA 2022-07-08 2022-07-08 Text classification method based on multi-head attention mechanism and two-dimensional convolution operation Pending CN115168579A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210800916.XA CN115168579A (en) 2022-07-08 2022-07-08 Text classification method based on multi-head attention mechanism and two-dimensional convolution operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210800916.XA CN115168579A (en) 2022-07-08 2022-07-08 Text classification method based on multi-head attention mechanism and two-dimensional convolution operation

Publications (1)

Publication Number Publication Date
CN115168579A true CN115168579A (en) 2022-10-11

Family

ID=83492736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210800916.XA Pending CN115168579A (en) 2022-07-08 2022-07-08 Text classification method based on multi-head attention mechanism and two-dimensional convolution operation

Country Status (1)

Country Link
CN (1) CN115168579A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116562284A (en) * 2023-04-14 2023-08-08 湖北经济学院 Government affair text automatic allocation model training method and device
CN116562284B (en) * 2023-04-14 2024-01-26 湖北经济学院 Government affair text automatic allocation model training method and device
CN116660992A (en) * 2023-06-05 2023-08-29 北京石油化工学院 Seismic signal processing method based on multi-feature fusion
CN116660992B (en) * 2023-06-05 2024-03-05 北京石油化工学院 Seismic signal processing method based on multi-feature fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination