CN116414974A - Short text classification method and device - Google Patents

Short text classification method and device Download PDF

Info

Publication number
CN116414974A
Authority
CN
China
Prior art keywords
model
text
training
result
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111623468.2A
Other languages
Chinese (zh)
Inventor
何萌
胡珉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Communications Ltd Research Institute filed Critical China Mobile Communications Group Co Ltd
Priority to CN202111623468.2A priority Critical patent/CN116414974A/en
Publication of CN116414974A publication Critical patent/CN116414974A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a short text classification method and device, and belongs to the field of artificial intelligence. A short text classification method comprising: determining a model combination according to the service stage and the number of samples of the training data set, wherein the model combination comprises at least one neural network model; training the model in the model combination; and inputting the text to be processed into the trained model combination, and outputting a classification result. The technical scheme of the invention can accurately classify the short texts.

Description

Short text classification method and device
Technical Field
The invention relates to the field of short text classification, in particular to a short text classification method and device.
Background
With the rapid development of information technology, users of telecommunication services such as China Mobile and China Unicom, and users of network platforms such as microblogs, WeChat and other social media, generate massive amounts of short text content every day, and it has become extremely important to use classification technology to perform public opinion analysis, sentiment analysis and feedback analysis on this content.
Algorithms currently adopted for short text classification include traditional machine learning (Naive Bayes, Decision Tree, SVM, etc.) and deep learning (CNN, RNN, Bert, etc.). Each algorithm has its own advantages and disadvantages, and no single algorithm can simultaneously achieve high precision and recall, low time consumption and low variance.
Disclosure of Invention
The invention aims to provide a short text classification method and device, which can accurately classify short texts.
In order to solve the technical problems, the embodiment of the invention provides the following technical scheme:
in one aspect, a method for classifying short text is provided, comprising:
determining a model combination according to the service stage and the number of samples of the training data set, wherein the model combination comprises at least one neural network model;
training the model in the model combination;
and inputting the text to be processed into the trained model combination, and outputting a classification result.
In some embodiments, determining a model combination according to the service stage and the number of samples of the training data set, and training the models in the model combination comprises:
in the cold start stage, training only the Bert matching model when the number of samples of the training data set is less than a first threshold;
in the middle and later stages of the service, training the Bert matching model, the convolutional neural network CNN model and the recurrent neural network RNN model when the number of samples of the training data set is greater than or equal to the first threshold and less than a second threshold; and training the Bert classification model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to the second threshold.
In some embodiments, the first threshold is 3000-7000 and the second threshold is 30000-70000.
In some embodiments, the inputting the text to be processed into the trained model combination, and outputting the classification result includes:
if the number of samples of the currently used training data set is smaller than the first threshold value, searching a text which is most similar to the text to be processed in a training corpus by using a Bert matching model, and taking the category corresponding to the text as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the first threshold and smaller than the second threshold, inputting the text to be processed into the CNN model to obtain a first prediction result and inputting the text to be processed into the RNN model to obtain a second prediction result; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, using the Bert matching model to search, in the training corpora of the categories indicated by the first prediction result and the second prediction result respectively, for a first text and a second text most similar to the text to be processed, together with a first score corresponding to the first text and a second score corresponding to the second text; if the first score is greater than or equal to the second score, taking the category to which the first text belongs as the classification result, and if the first score is smaller than the second score, taking the category to which the second text belongs as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the second threshold, inputting the text to be processed into the CNN model to obtain a first prediction result and inputting the text to be processed into the RNN model to obtain a second prediction result; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, obtaining a third prediction result through the Bert classification model and taking the third prediction result as the classification result.
In some embodiments of the present invention,
the Bert matching model is a Sentence-Bert model; a first corpus sample is randomly selected from the training data set, a second corpus sample is obtained by replacing the stop words of the first corpus sample, the first corpus sample and the second corpus sample are used as a positive sample pair, and corpora unrelated to the first corpus sample are selected from the training data set as negative samples, the ratio of positive samples to negative samples being 1:N, wherein N is an integer not less than 5; training is performed in a fine-tune manner, and the loss function formula is as follows:
max(||sa - sp|| - ||sa - sn|| + ε, 0)
where sa represents the first corpus sample, sn represents the negative sample, sp represents the second corpus sample, "||·||" represents the cosine distance, and ε represents the distance margin.
In some embodiments of the present invention,
the ratio of the number of samples in the training set, validation set and test set of the CNN model is 8:1:1; the embedding layer of the CNN model takes word vectors as input and connects to the convolution layer through two channels, one of static and one of dynamic word vectors, with word vectors pretrained on Chinese Wikipedia by word2vec or fastText used as the static channel; the convolution layer uses three convolution kernels of widths 2, 3 and 4, and the pooling layer uses max pooling.
In some embodiments of the present invention,
the RNN model is an attention-based bidirectional long short-term memory (Bi-LSTM) model; the ratio of the number of samples in the training set, validation set and test set of the attention-based Bi-LSTM model is 8:1:1; the embedding layer of the RNN model takes word vectors as input, and there are 3 hidden layers, each containing 256 LSTM units.
In some embodiments, the ratio of the number of samples in the training set, validation set and test set of the Bert classification model is 8:1:1, and fine-tune training is performed on the basis of the Chinese-Bert-Wwm model.
The embodiment of the invention also provides a short text classification device, which comprises:
a processing module for determining a model combination according to the business stage and the number of samples of the training data set, the model combination comprising at least one neural network model;
the training module is used for training the models in the model combination;
and the prediction module is used for inputting the text to be processed into the trained model combination and outputting a classification result.
In some embodiments, the training module is specifically configured to:
in the cold start stage, training only the Bert matching model when the number of samples of the training data set is less than a first threshold;
training the Bert matching model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to a first threshold value and less than a second threshold value in the middle and later stages of the service; and training the Bert classification model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to a second threshold.
In some embodiments of the present invention,
the first threshold is 3000-7000, and the second threshold is 30000-70000.
In some embodiments, the prediction module is specifically configured to:
if the number of samples of the currently used training data set is smaller than the first threshold value, searching a text which is most similar to the text to be processed in a training corpus by using a Bert matching model, and taking the category corresponding to the text as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the first threshold and smaller than the second threshold, inputting the text to be processed into the CNN model to obtain a first prediction result and inputting the text to be processed into the RNN model to obtain a second prediction result; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, using the Bert matching model to search, in the training corpora of the categories indicated by the first prediction result and the second prediction result respectively, for a first text and a second text most similar to the text to be processed, together with a first score corresponding to the first text and a second score corresponding to the second text; if the first score is greater than or equal to the second score, taking the category to which the first text belongs as the classification result, and if the first score is smaller than the second score, taking the category to which the second text belongs as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the second threshold, inputting the text to be processed into the CNN model to obtain a first prediction result and inputting the text to be processed into the RNN model to obtain a second prediction result; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, obtaining a third prediction result through the Bert classification model and taking the third prediction result as the classification result.
In some embodiments of the present invention,
the Bert matching model is a Sentence-Bert model; a first corpus sample is randomly selected from the training data set, a second corpus sample is obtained by replacing the stop words of the first corpus sample, the first corpus sample and the second corpus sample are used as a positive sample pair, and corpora unrelated to the first corpus sample are selected from the training data set as negative samples, the ratio of positive samples to negative samples being 1:N, wherein N is an integer not less than 5; training is performed in a fine-tune manner, and the loss function formula is as follows:
max(||sa - sp|| - ||sa - sn|| + ε, 0)
where sa represents the first corpus sample, sn represents the negative sample, sp represents the second corpus sample, "||·||" represents the cosine distance, and ε represents the distance margin.
In some embodiments of the present invention,
the ratio of the number of samples in the training set, validation set and test set of the CNN model is 8:1:1; the embedding layer of the CNN model takes word vectors as input and connects to the convolution layer through two channels, one of static and one of dynamic word vectors, with word vectors pretrained on Chinese Wikipedia by word2vec or fastText used as the static channel; the convolution layer uses three convolution kernels of widths 2, 3 and 4, and the pooling layer uses max pooling.
In some embodiments of the present invention,
the RNN model is an attention-based bidirectional long short-term memory (Bi-LSTM) model; the ratio of the number of samples in the training set, validation set and test set of the attention-based Bi-LSTM model is 8:1:1; the embedding layer of the RNN model takes word vectors as input, and there are 3 hidden layers, each containing 256 LSTM units.
In some embodiments, the ratio of the number of samples in the training set, validation set and test set of the Bert classification model is 8:1:1, and fine-tune training is performed on the basis of the Chinese-Bert-Wwm model.
The embodiment of the invention also provides a short text classification device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor; the processor, when executing the program, implements the short text classification method as described above.
In some embodiments, the processor is configured to determine a model combination based on the business stage in which it is located and the number of samples of the training data set, the model combination including at least one neural network model; training the model in the model combination; and inputting the text to be processed into the trained model combination, and outputting a classification result.
In some embodiments, the processor is specifically configured to train only the Bert matching model when the number of samples of the training data set is less than a first threshold during a cold start phase; training the Bert matching model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to a first threshold value and less than a second threshold value in the middle and later stages of the service; and training the Bert classification model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to a second threshold.
In some embodiments, the first threshold is 3000-7000 and the second threshold is 30000-70000.
In some embodiments, the processor is specifically configured to search, if the number of samples of the currently used training data set is smaller than the first threshold, a text most similar to the text to be processed in the training corpus by using a Bert matching model, and use a category corresponding to the text as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the first threshold and smaller than the second threshold, inputting the text to be processed into the CNN model to obtain a first prediction result and inputting the text to be processed into the RNN model to obtain a second prediction result; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, using the Bert matching model to search, in the training corpora of the categories indicated by the first prediction result and the second prediction result respectively, for a first text and a second text most similar to the text to be processed, together with a first score corresponding to the first text and a second score corresponding to the second text; if the first score is greater than or equal to the second score, taking the category to which the first text belongs as the classification result, and if the first score is smaller than the second score, taking the category to which the second text belongs as the classification result; if the number of samples of the currently used training data set is greater than or equal to the second threshold, inputting the text to be processed into the CNN model to obtain a first prediction result and inputting the text to be processed into the RNN model to obtain a second prediction result; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, obtaining a third prediction result through the Bert classification model and taking the third prediction result as the classification result.
In some embodiments of the present invention,
the Bert matching model is a Sentence-Bert model; a first corpus sample is randomly selected from the training data set, a second corpus sample is obtained by replacing the stop words of the first corpus sample, the first corpus sample and the second corpus sample are used as a positive sample pair, and corpora unrelated to the first corpus sample are selected from the training data set as negative samples, the ratio of positive samples to negative samples being 1:N, wherein N is an integer not less than 5; training is performed in a fine-tune manner, and the loss function formula is as follows:
max(||sa - sp|| - ||sa - sn|| + ε, 0)
where sa represents the first corpus sample, sn represents the negative sample, sp represents the second corpus sample, "||·||" represents the cosine distance, and ε represents the distance margin.
In some embodiments of the present invention,
the ratio of the number of samples in the training set, validation set and test set of the CNN model is 8:1:1; the embedding layer of the CNN model takes word vectors as input and connects to the convolution layer through two channels, one of static and one of dynamic word vectors, with word vectors pretrained on Chinese Wikipedia by word2vec or fastText used as the static channel; the convolution layer uses three convolution kernels of widths 2, 3 and 4, and the pooling layer uses max pooling.
In some embodiments of the present invention,
the RNN model is an attention-based bidirectional long short-term memory (Bi-LSTM) model; the ratio of the number of samples in the training set, validation set and test set of the attention-based Bi-LSTM model is 8:1:1; the embedding layer of the RNN model takes word vectors as input, and there are 3 hidden layers, each containing 256 LSTM units.
In some embodiments, the ratio of the number of samples in the training set, validation set and test set of the Bert classification model is 8:1:1, and fine-tune training is performed on the basis of the Chinese-Bert-Wwm model.
The embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the short text classification method as described above.
The embodiment of the invention has the following beneficial effects:
In the above scheme, the model combination is determined according to the service stage and the number of samples of the training data set. For example, when the number of samples of the training data set is very small, a text matching model is used to find the category of the most similar text in the training corpus as the classification result; when the number of samples of the training data set is moderately rich, two more complex classification models vote, the result is output and the process ends if they agree, and the final result is decided by the text matching model if they disagree; when the number of samples of the training data set is rich enough, two more complex classification models vote, the result is output and the process ends if they agree, and the final result is decided by a stronger classification model if they disagree. The technical scheme of this embodiment can address the problems of the different stages of a text classification application scenario separately; adopting a text matching model instead of a classification model in the cold start stage effectively avoids the problem that a model easily overfits when the data set is scarce; and adopting a voting method in multi-model fusion, instead of the weighted averaging of the bagging approach, effectively reduces prediction time while improving precision and recall.
Drawings
FIGS. 1-2 are flow diagrams of a short text classification method according to embodiments of the present invention;
FIG. 3 is a schematic view of a short text classification device according to an embodiment of the present invention;
fig. 4 is a schematic diagram showing the composition of a short text classification device according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages to be solved by the embodiments of the present invention more apparent, the following detailed description will be given with reference to the accompanying drawings and the specific embodiments.
There are currently two main short text classification schemes based on multi-model fusion: 1. adopting the bagging concept, training several classifiers independently, and in the prediction stage taking the weighted average of the classifiers' results as the final result; 2. setting a fused loss function for the multiple models and training them jointly to output the classification result. Although these schemes reduce the prediction variance, they still have the following problems: 1. different schemes are not flexibly adopted according to the size of the data set, and a unified fusion strategy is used, so the improvement in precision and recall is limited; 2. the prediction time is high: not only is a complex model with a huge number of parameters (such as Bert classification) time-consuming, but the prediction times of the stacked simple models (such as CNN and RNN) also add up; 3. performance in the cold start stage is poor, because the cold start stage data set is small and classification errors caused by overfitting of multiple models accumulate and are amplified.
The embodiment of the invention provides a short text classification method and device, which can accurately classify short texts.
An embodiment of the present invention provides a short text classification method, as shown in fig. 1, including:
step 101: determining a model combination according to the service stage and the number of samples of the training data set, wherein the model combination comprises at least one neural network model;
step 102: training the model in the model combination;
step 103: and inputting the text to be processed into the trained model combination, and outputting a classification result.
In this embodiment, the model combination is determined according to the service stage and the number of samples of the training data set. For example, when the number of samples of the training data set is very small, a text matching model is used to find the category of the most similar text in the training corpus as the classification result; when the number of samples of the training data set is moderately rich, two more complex classification models vote, the result is output and the process ends if they agree, and the final result is decided by the text matching model if they disagree; when the number of samples of the training data set is rich enough, two more complex classification models vote, the result is output and the process ends if they agree, and the final result is decided by a stronger classification model if they disagree. The technical scheme of this embodiment can address the problems of the different stages of a text classification application scenario separately; adopting a text matching model instead of a classification model in the cold start stage effectively avoids the problem that a model easily overfits when the data set is scarce; and adopting a voting method in multi-model fusion, instead of the weighted averaging of the bagging approach, effectively reduces prediction time while improving precision and recall.
In some embodiments, determining a model combination according to the service stage and the number of samples of the training data set, and training the models in the model combination comprises:
in the cold start stage, training only the Bert matching model when the number of samples of the training data set is less than a first threshold;
training the Bert matching model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to a first threshold value and less than a second threshold value in the middle and later stages of the service; and training the Bert classification model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to a second threshold.
In some embodiments, the first threshold is 3000-7000 and the second threshold is 30000-70000.
In some embodiments, the inputting the text to be processed into the trained model combination, and outputting the classification result includes:
if the number of samples of the currently used training data set is smaller than the first threshold value, searching a text which is most similar to the text to be processed in a training corpus by using a Bert matching model, and taking the category corresponding to the text as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the first threshold and smaller than the second threshold, inputting the text to be processed into the CNN model to obtain a first prediction result and inputting the text to be processed into the RNN model to obtain a second prediction result; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, using the Bert matching model to search, in the training corpora of the categories indicated by the first prediction result and the second prediction result respectively, for a first text and a second text most similar to the text to be processed, together with a first score corresponding to the first text and a second score corresponding to the second text; if the first score is greater than or equal to the second score, taking the category to which the first text belongs as the classification result, and if the first score is smaller than the second score, taking the category to which the second text belongs as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the second threshold, inputting the text to be processed into the CNN model to obtain a first prediction result and inputting the text to be processed into the RNN model to obtain a second prediction result; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, obtaining a third prediction result through the Bert classification model and taking the third prediction result as the classification result.
In this embodiment, the Bert matching model is adopted in the cold start stage, the CNN, RNN and Bert matching models are fused in the middle stage, and the CNN, RNN and Bert classification models are fused in the later stage. This can, to a certain extent, solve the problem that an ordinary classification model is prone to overfitting in the cold start stage, and can improve the classification effect while avoiding excessive time consumption by the complex model. Taking a first threshold of 3000-7000 and a second threshold of 30000-70000 as an example, the prediction phase flow is shown in fig. 2 and comprises the following steps:
Only the Bert matching model is trained when the number of samples of the training data set is less than the lowest threshold (e.g. 5000, depending on the specific business situation); the three models of Bert matching, CNN and RNN are trained when the number of samples of the training data set is greater than the lowest threshold (5000) and less than the second-lowest threshold (e.g. 50000, depending on the specific business situation); and the CNN, RNN and Bert classification models are trained when the number of samples of the data set is greater than the second-lowest threshold (50000).
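For illustration only, a minimal Python sketch of this threshold-based selection logic follows; the threshold values repeat the example figures above and the model names are placeholder assumptions, not part of the disclosed scheme:

```python
# Illustrative sketch of the threshold-based model-combination selection.
# The thresholds (5000 / 50000) are the example values above and should be
# adjusted to the specific business situation.

LOW_THRESHOLD = 5000      # lowest threshold (cold start boundary)
HIGH_THRESHOLD = 50000    # second-lowest threshold (middle / later stage boundary)

def select_model_combination(num_samples: int) -> list:
    """Return the models to train for the current training-set size."""
    if num_samples < LOW_THRESHOLD:
        return ["bert_matching"]                    # cold start: matching model only
    if num_samples < HIGH_THRESHOLD:
        return ["bert_matching", "cnn", "rnn"]      # middle stage
    return ["bert_classification", "cnn", "rnn"]    # later stage

# Example: select_model_combination(12000) -> ["bert_matching", "cnn", "rnn"]
```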
The Bert matching model is a Sentence-Bert model. A first corpus sample is randomly selected from the original data set (such as the training data set), and a second corpus sample is obtained by replacing the stop words of the first corpus sample; the first corpus sample and the second corpus sample form a positive sample pair, while corpora unrelated to the first corpus sample are selected from the training data set as negative samples, with a positive-to-negative ratio of 1:N (N >= 5). Training is performed in a fine-tune manner, with the loss function shown in equation 1, where sa represents the first corpus text selected from the training set, sn represents an unrelated negative sample, sp represents the second corpus sample, ε represents the distance margin (meaning that the distance between sa and sn must exceed the distance between sa and sp by at least ε), and "||·||" represents the cosine distance; the optimization objective is to bring sa and sp closer together and push sa and sn further apart. After training, the resulting matching model is saved, and the vectors of all texts in the original data set are computed with the matching model and stored.
Equation 1: max(||sa - sp|| - ||sa - sn|| + ε, 0)
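A possible PyTorch sketch of this loss over batched sentence vectors follows; the margin value used for ε is an assumed placeholder:

```python
import torch
import torch.nn.functional as F

def cosine_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Row-wise cosine distance 1 - cos(a, b)."""
    return 1.0 - F.cosine_similarity(a, b, dim=-1)

def matching_loss(sa: torch.Tensor, sp: torch.Tensor, sn: torch.Tensor,
                  margin: float = 0.5) -> torch.Tensor:
    """Equation 1: max(||sa - sp|| - ||sa - sn|| + eps, 0), averaged over the batch."""
    loss = cosine_distance(sa, sp) - cosine_distance(sa, sn) + margin
    return torch.clamp(loss, min=0.0).mean()
```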
The ratio of the number of samples in the training set, validation set and test set of the CNN model is 8:1:1. The embedding layer of the CNN model takes word vectors as input and connects to the convolution layer through two channels, a static one and a dynamic one; word vectors pretrained on Chinese Wikipedia by word2vec or fastText are used as the static channel. The convolution layer uses three convolution kernels of widths 2, 3 and 4, and the pooling layer uses max pooling.
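A rough PyTorch sketch of such a two-channel TextCNN is given below; the filter count and the way the pretrained embedding matrix is supplied are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualChannelTextCNN(nn.Module):
    """Two-channel TextCNN: a frozen (static) embedding channel initialized from
    pretrained Chinese-Wikipedia vectors and a trainable (dynamic) channel,
    convolution kernels of widths 2/3/4, and max pooling."""

    def __init__(self, pretrained: torch.Tensor, num_classes: int, num_filters: int = 128):
        super().__init__()
        vocab_size, emb_dim = pretrained.shape
        self.static_emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
        self.dynamic_emb = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)
        self.convs = nn.ModuleList(
            [nn.Conv2d(2, num_filters, kernel_size=(k, emb_dim)) for k in (2, 3, 4)]
        )
        self.fc = nn.Linear(num_filters * 3, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> (batch, 2 channels, seq_len, emb_dim)
        x = torch.stack([self.static_emb(token_ids), self.dynamic_emb(token_ids)], dim=1)
        feats = []
        for conv in self.convs:
            c = F.relu(conv(x)).squeeze(3)                       # (batch, filters, L-k+1)
            feats.append(F.max_pool1d(c, c.size(2)).squeeze(2))  # (batch, filters)
        return self.fc(torch.cat(feats, dim=1))                  # (batch, num_classes)
```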
The RNN model is an attention-based bidirectional long short-term memory (Bi-LSTM) model. The ratio of the number of samples in the training set, validation set and test set of the attention-based Bi-LSTM model is 8:1:1. The embedding layer of the RNN model takes word vectors as input, and there are 3 hidden layers, each containing 256 LSTM units.
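A corresponding PyTorch sketch of the attention-based Bi-LSTM follows; the vocabulary size, embedding dimension and the simple additive attention layer are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionBiLSTM(nn.Module):
    """Bi-LSTM text classifier with a simple attention layer
    (3 stacked layers of 256 LSTM units, as described above)."""

    def __init__(self, vocab_size: int, emb_dim: int, num_classes: int,
                 hidden_size: int = 256, num_layers: int = 3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_size, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        self.attn = nn.Linear(hidden_size * 2, 1)    # per-time-step attention score
        self.fc = nn.Linear(hidden_size * 2, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(self.embedding(token_ids))   # (batch, seq, 2*hidden)
        weights = F.softmax(self.attn(h), dim=1)      # (batch, seq, 1)
        context = (weights * h).sum(dim=1)            # weighted sum over time steps
        return self.fc(context)
```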
The ratio of the number of samples in the training set, validation set and test set of the Bert classification model is 8:1:1, and fine-tune training is performed on the basis of the Chinese-Bert-Wwm model.
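A possible fine-tuning sketch with the HuggingFace Transformers library is shown below; the checkpoint name "hfl/chinese-bert-wwm", the label count and the hyper-parameters are assumptions for illustration and not part of the disclosure:

```python
# Sketch of fine-tuning a Chinese-Bert-Wwm sequence classifier.
# Checkpoint name, label count and hyper-parameters are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-bert-wwm")
model = AutoModelForSequenceClassification.from_pretrained(
    "hfl/chinese-bert-wwm", num_labels=10)          # 10 = assumed number of categories
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def training_step(texts, labels):
    """One gradient step on a mini-batch of short texts and integer labels."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=64, return_tensors="pt")
    out = model(**batch, labels=torch.tensor(labels))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```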
When predicting the classification result, if the number of samples of the training data set is less than the lowest threshold (5000), the Bert matching model is used to find the text most similar to the text to be processed in the whole training corpus, and the category corresponding to that text is taken as the final classification result.
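A minimal sketch of this nearest-neighbour lookup over the stored corpus vectors, assuming the Sentence-Bert vectors have already been computed and saved as a matrix (NumPy, cosine similarity):

```python
import numpy as np

def most_similar(query_vec: np.ndarray, corpus_vecs: np.ndarray, corpus_labels):
    """Return (label, score) of the stored training text closest to the query.

    corpus_vecs: pre-computed Sentence-Bert vectors of the training corpus,
    one row per text; corpus_labels: the category of each row.
    """
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity against every corpus text
    best = int(np.argmax(scores))
    return corpus_labels[best], float(scores[best])
```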
When the number of samples of the training data set is greater than the lowest threshold (5000) and less than the second-lowest threshold (50000), a first prediction result and a second prediction result are obtained through the CNN model and the RNN model respectively; if the results are consistent (the first prediction result equals the second prediction result), the first or second prediction result is directly output as the final classification result. If the two results are inconsistent (the first prediction result does not equal the second prediction result), the Bert matching model is used to find, in the original corpora of the categories indicated by the first and second prediction results respectively, the first text and the second text most similar to the input text, together with the corresponding first score and second score; the two scores are compared and the category of the higher-scoring text is taken as the final classification result.
When the number of samples of the training data set is greater than the second-lowest threshold (50000), a first prediction result and a second prediction result are obtained through the CNN model and the RNN model respectively; if the results are consistent, the final result is output directly. If the two results are inconsistent, a third prediction result is obtained through the Bert classification model, and this third prediction result is taken as the final classification result.
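Putting the three stages together, a possible sketch of the prediction routing is given below; the model objects and their predict/most_similar helpers are placeholder assumptions used only to show the flow:

```python
LOW_THRESHOLD, HIGH_THRESHOLD = 5000, 50000   # example thresholds from above

def classify(text, num_samples, cnn, rnn, bert_cls, bert_match):
    """Route a text through the stage-dependent model combination."""
    if num_samples < LOW_THRESHOLD:
        # Cold start: nearest neighbour over the whole training corpus.
        label, _ = bert_match.most_similar(text)
        return label

    first, second = cnn.predict(text), rnn.predict(text)
    if first == second:
        return first                           # the two voters agree

    if num_samples < HIGH_THRESHOLD:
        # Middle stage: Bert matching arbitrates within the two candidate classes.
        _, score1 = bert_match.most_similar(text, within_class=first)
        _, score2 = bert_match.most_similar(text, within_class=second)
        return first if score1 >= score2 else second

    # Later stage: the Bert classification model arbitrates.
    return bert_cls.predict(text)
```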
The embodiment of the invention also provides a short text classifying device, as shown in fig. 3, which comprises:
a processing module 11 for determining a model combination according to the service phase and the number of samples of the training data set, the model combination comprising at least one neural network model;
a training module 12 for training the models in the model combination;
and the prediction module 13 is used for inputting the text to be processed into the trained model combination and outputting a classification result.
In this embodiment, the model combination is determined according to the service stage and the number of samples of the training data set. For example, when the number of samples of the training data set is very small, a text matching model is used to find the category of the most similar text in the training corpus as the classification result; when the number of samples of the training data set is moderately rich, two more complex classification models vote, the result is output and the process ends if they agree, and the final result is decided by the text matching model if they disagree; when the number of samples of the training data set is rich enough, two more complex classification models vote, the result is output and the process ends if they agree, and the final result is decided by a stronger classification model if they disagree. The technical scheme of this embodiment can address the problems of the different stages of a text classification application scenario separately; adopting a text matching model instead of a classification model in the cold start stage effectively avoids the problem that a model easily overfits when the data set is scarce; and adopting a voting method in multi-model fusion, instead of the weighted averaging of the bagging approach, effectively reduces prediction time while improving precision and recall.
In some embodiments, the training module 12 is specifically configured to:
in the cold start stage, training only the Bert matching model when the number of samples of the training data set is less than a first threshold;
training the Bert matching model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to a first threshold value and less than a second threshold value in the middle and later stages of the service; and training the Bert classification model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to a second threshold.
In some embodiments, the first threshold is 3000-7000 and the second threshold is 30000-70000.
In some embodiments, the prediction module 13 is specifically configured to:
if the number of samples of the currently used training data set is smaller than the first threshold value, searching a text which is most similar to the text to be processed in a training corpus by using a Bert matching model, and taking the category corresponding to the text as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the first threshold and smaller than the second threshold, obtaining a first prediction result and a second prediction result through the CNN model and the RNN model respectively; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, using the Bert matching model to search, in the training corpora of the categories indicated by the first prediction result and the second prediction result respectively, for a first text and a second text most similar to the text to be processed, together with a first score corresponding to the first text and a second score corresponding to the second text; if the first score is greater than or equal to the second score, taking the category to which the first text belongs as the classification result, and if the first score is smaller than the second score, taking the category to which the second text belongs as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the second threshold, obtaining a first prediction result and a second prediction result through the CNN model and the RNN model respectively; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, obtaining a third prediction result through the Bert classification model and taking the third prediction result as the classification result.
In some embodiments, the Bert matching model is a Sentence-Bert model; a first corpus sample is randomly selected from the training data set, a second corpus sample is obtained by replacing the stop words of the first corpus sample, the first corpus sample and the second corpus sample are used as a positive sample pair, and corpora unrelated to the first corpus sample are selected from the training data set as negative samples, the ratio of positive samples to negative samples being 1:N, wherein N is an integer not less than 5; training is performed in a fine-tune manner, and the loss function formula is as follows:
max(||sa - sp|| - ||sa - sn|| + ε, 0)
where sa represents the first corpus sample, sn represents the negative sample, sp represents the second corpus sample, "||·||" represents the cosine distance, and ε represents the distance margin.
In some embodiments, the ratio of the number of samples in the training set, validation set and test set of the CNN model is 8:1:1; the embedding layer of the CNN model takes word vectors as input and connects to the convolution layer through a static and a dynamic channel, with word vectors pretrained on Chinese Wikipedia by word2vec or fastText used as the static channel; the convolution layer uses three convolution kernels of widths 2, 3 and 4, and the pooling layer uses max pooling.
In some embodiments, the RNN model is an attention-based bidirectional long short-term memory (Bi-LSTM) model, and the ratio of the number of samples in the training set, validation set and test set of the attention-based Bi-LSTM model is 8:1:1; the embedding layer of the RNN model takes word vectors as input, and there are 3 hidden layers, each containing 256 LSTM units.
In some embodiments, the ratio of the number of samples in the training set, validation set and test set of the Bert classification model is 8:1:1, and fine-tune training is performed on the basis of the Chinese-Bert-Wwm model.
The embodiment of the invention also provides a short text classifying device, as shown in fig. 4, comprising a memory 21, a processor 22 and a computer program stored on the memory 21 and capable of running on the processor 22; the processor 22, when executing the program, implements the short text classification method as described above.
In some embodiments, the processor 22 is configured to determine a model combination based on the business stage in which it is located and the number of samples of the training data set, the model combination including at least one neural network model; training the model in the model combination; and inputting the text to be processed into the trained model combination, and outputting a classification result.
In some embodiments, the processor 22 is specifically configured to train only the Bert matching model when the number of samples of the training data set is less than a first threshold during the cold start phase; training the Bert matching model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to a first threshold value and less than a second threshold value in the middle and later stages of the service; and training the Bert classification model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to a second threshold.
In some embodiments, the first threshold is 3000-7000 and the second threshold is 30000-70000.
In some embodiments, the processor 22 is specifically configured to: if the number of samples of the currently used training data set is less than the first threshold, search, using the Bert matching model, for the text in the training corpus most similar to the text to be processed, and use the category corresponding to that text as the classification result; if the number of samples of the currently used training data set is greater than or equal to the first threshold and smaller than the second threshold, obtain a first prediction result and a second prediction result through the CNN model and the RNN model respectively, and if the first prediction result is the same as the second prediction result, take the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, use the Bert matching model to search, in the training corpora of the categories indicated by the first prediction result and the second prediction result respectively, for a first text and a second text most similar to the text to be processed, together with a first score corresponding to the first text and a second score corresponding to the second text, and take the category to which the first text belongs as the classification result if the first score is greater than or equal to the second score, or the category to which the second text belongs if the first score is smaller than the second score; if the number of samples of the currently used training data set is greater than or equal to the second threshold, obtain a first prediction result and a second prediction result through the CNN model and the RNN model respectively, and if the first prediction result is the same as the second prediction result, take the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, obtain a third prediction result through the Bert classification model and take the third prediction result as the classification result.
In some embodiments, the Bert matching model adopts a Sentence-Bert algorithm; a first corpus sample is randomly selected from the training data set, a second corpus sample is obtained by replacing the stop words of the first corpus sample, the first corpus sample and the second corpus sample are used as a positive sample pair, and corpora unrelated to the first corpus sample are selected from the training data set as negative samples, the ratio of positive samples to negative samples being 1:N, wherein N is an integer not less than 5; training is performed in a fine-tune manner, and the loss function formula is as follows:
max(||sa - sp|| - ||sa - sn|| + ε, 0)
where sa represents the first corpus sample, sn represents the negative sample, sp represents the second corpus sample, "||·||" represents the cosine distance, and ε represents the distance margin.
In some embodiments, the ratio of the number of samples in the training set, validation set and test set of the CNN model is 8:1:1; the embedding layer of the CNN model takes word vectors as input and connects to the convolution layer through a static and a dynamic channel, with word vectors pretrained on Chinese Wikipedia by word2vec or fastText used as the static channel; the convolution layer uses three convolution kernels of widths 2, 3 and 4, and the pooling layer uses max pooling.
In some embodiments, the RNN model is an attention-based bidirectional long short-term memory (Bi-LSTM) model, and the ratio of the number of samples in the training set, validation set and test set of the attention-based Bi-LSTM model is 8:1:1; the embedding layer of the RNN model takes word vectors as input, and there are 3 hidden layers, each containing 256 LSTM units.
In some embodiments, the ratio of the number of samples in the training set, validation set and test set of the Bert classification model is 8:1:1, and fine-tune training is performed on the basis of the Chinese-Bert-Wwm model.
The embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the short text classification method as described above.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media (transitory media) such as modulated data signals and carrier waves.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations shall also fall within the scope of protection of the present invention.

Claims (11)

1. A method of classifying short text, comprising:
determining a model combination according to the service stage and the number of samples of the training data set, wherein the model combination comprises at least one neural network model;
training the model in the model combination;
and inputting the text to be processed into the trained model combination, and outputting a classification result.
2. The short text classification method of claim 1, wherein determining a model combination based on the business stage and the number of samples of the training data set, training the models in the model combination comprises:
in the cold start stage, training only the Bert matching model when the number of samples of the training data set is less than a first threshold;
in the middle and later stages of the service, training the Bert matching model, the convolutional neural network CNN model and the recurrent neural network RNN model when the number of samples of the training data set is greater than or equal to the first threshold and less than a second threshold; and training the Bert classification model, the CNN model and the RNN model when the number of samples of the training data set is greater than or equal to the second threshold.
3. The short text classification method according to claim 2, wherein the first threshold is 3000-7000 and the second threshold is 30000-70000.
4. The short text classification method of claim 2, wherein said inputting the text to be processed into the trained model combination, outputting the classification result comprises:
if the number of samples of the currently used training data set is smaller than the first threshold value, searching a text which is most similar to the text to be processed in a training corpus by using a Bert matching model, and taking the category corresponding to the text as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the first threshold and smaller than the second threshold, inputting the text to be processed into the CNN model to obtain a first prediction result and inputting the text to be processed into the RNN model to obtain a second prediction result; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, using the Bert matching model to search, in the training corpora of the categories indicated by the first prediction result and the second prediction result respectively, for a first text and a second text most similar to the text to be processed, together with a first score corresponding to the first text and a second score corresponding to the second text; if the first score is greater than or equal to the second score, taking the category to which the first text belongs as the classification result, and if the first score is smaller than the second score, taking the category to which the second text belongs as the classification result;
if the number of samples of the currently used training data set is greater than or equal to the second threshold, inputting the text to be processed into the CNN model to obtain a first prediction result and inputting the text to be processed into the RNN model to obtain a second prediction result; if the first prediction result is the same as the second prediction result, taking the first prediction result or the second prediction result as the classification result; if the first prediction result is different from the second prediction result, obtaining a third prediction result through the Bert classification model and taking the third prediction result as the classification result.
5. The short text classification method according to claim 2, wherein the Bert matching model is a Sentence-Bert model; a first corpus sample is randomly selected from the training data set, a second corpus sample is obtained by replacing the stop words of the first corpus sample, the first corpus sample and the second corpus sample are used as a positive sample pair, and corpora unrelated to the first corpus sample are selected from the training data set as negative samples, the ratio of positive samples to negative samples being 1:N, wherein N is an integer not less than 5; training is performed in a fine-tune manner, and the loss function formula is as follows:
max(||sa - sp|| - ||sa - sn|| + ε, 0)
where sa represents the first corpus sample, sn represents the negative sample, sp represents the second corpus sample, "||·||" represents the cosine distance, and ε represents the distance margin.
6. The short text classification method according to claim 2, wherein the ratio of the number of samples in the training set, validation set and test set of the CNN model is 8:1:1; the embedding layer of the CNN model takes word vectors as input and connects to the convolution layer through two channels, one of static and one of dynamic word vectors, with word vectors pretrained on Chinese Wikipedia by word2vec or fastText used as the static channel; the convolution layer uses three convolution kernels of widths 2, 3 and 4, and the pooling layer uses max pooling.
7. The short text classification method according to claim 2, wherein the RNN model is an attention-based bidirectional long short-term memory (Bi-LSTM) model, and the ratio of the number of samples in the training set, validation set and test set of the attention-based Bi-LSTM model is 8:1:1; the embedding layer of the RNN model takes word vectors as input, and there are 3 hidden layers, each containing 256 LSTM units.
8. The short text classification method according to claim 2, wherein the ratio of the number of samples in the training set, validation set and test set of the Bert classification model is 8:1:1, and fine-tune training is performed on the basis of the Chinese-Bert-Wwm model.
9. A short text classification device, comprising:
a processing module for determining a model combination according to the business stage and the number of samples of the training data set, the model combination comprising at least one neural network model;
the training module is used for training the models in the model combination;
and the prediction module is used for inputting the text to be processed into the trained model combination and outputting a classification result.
10. A short text classification device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the short text classification method according to any of claims 1-8.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the short text classification method according to any of the claims 1-8.
CN202111623468.2A 2021-12-28 2021-12-28 Short text classification method and device Pending CN116414974A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111623468.2A CN116414974A (en) 2021-12-28 2021-12-28 Short text classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111623468.2A CN116414974A (en) 2021-12-28 2021-12-28 Short text classification method and device

Publications (1)

Publication Number Publication Date
CN116414974A true CN116414974A (en) 2023-07-11

Family

ID=87054780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111623468.2A Pending CN116414974A (en) 2021-12-28 2021-12-28 Short text classification method and device

Country Status (1)

Country Link
CN (1) CN116414974A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination