CN114741517A - Training method, device, equipment and medium of text classification model and text classification method, device and equipment


Info

Publication number
CN114741517A
CN114741517A
Authority
CN
China
Prior art keywords
text
sample
cluster
classification model
texts
Prior art date
Legal status
Pending
Application number
CN202210503601.9A
Other languages
Chinese (zh)
Inventor
苑浩
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210503601.9A
Publication of CN114741517A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a training method, apparatus, device, and medium for a text classification model, together with a text classification method, apparatus, and device, relating to fields such as deep learning and natural language processing. The specific implementation scheme is as follows: cluster the obtained sample texts to obtain at least one target cluster; generate a cluster label for each sample text according to the target cluster to which it belongs; perform first-class prediction on each sample text with a text classification model to obtain a prediction label for each sample text; and perform first training of the text classification model according to the prediction label and cluster label of each sample text. Because clustering can capture the salient semantic features of the sample texts, generating cluster labels by clustering the sample texts and pre-training the text classification model on those cluster labels enables the model to learn the salient semantic information in the sample texts before the real training, improving model expressiveness and performance.

Description

Training method, device, equipment and medium of text classification model and text classification method, device and equipment
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the technical fields of deep learning and natural language processing, and more particularly, to a method, an apparatus, a device, and a medium for training a text classification model and text classification.
Background
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. Text classification is a fundamental task in natural language processing: it organizes and categorizes texts carried by character information, and is therefore widely applied in fields such as digital libraries, public opinion analysis, news recommendation, and mail filtering.
To classify texts automatically, a text classification model (also referred to as a text classifier) built on existing sample texts needs to be trained, so that texts to be classified can then be classified by the trained text classification model.
How the text classification model is trained is therefore critical to improving the prediction effect of the model.
Disclosure of Invention
The disclosure provides a method, a device, equipment and a medium for training a text classification model and classifying texts.
According to an aspect of the present disclosure, there is provided a training method of a text classification model, including:
obtaining a plurality of sample texts, and clustering the sample texts to obtain at least one target cluster;
generating a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, wherein the cluster label is used for indicating the cluster category to which the sample text belongs;
performing first-class prediction on each sample text by adopting a text classification model to obtain a prediction label corresponding to each sample text;
and performing first training on the text classification model according to the prediction label and the cluster label corresponding to each sample text.
According to another aspect of the present disclosure, there is provided a text classification method including:
acquiring a text to be classified;
classifying the texts to be classified by adopting the text classification model trained by the training method of the text classification model provided by the above aspect of the disclosure to obtain the classification labels of the texts to be classified.
According to another aspect of the present disclosure, there is provided a training apparatus for a text classification model, including:
the acquisition module is used for acquiring a plurality of sample texts;
the clustering module is used for clustering the sample texts to obtain at least one target cluster;
the generating module is used for generating a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, wherein the cluster label is used for indicating the cluster category to which the sample text belongs;
the first prediction module is used for performing first-class prediction on each sample text by adopting a text classification model to obtain a prediction label corresponding to each sample text;
and the second training module is used for performing first training on the text classification model according to the prediction labels and the cluster labels corresponding to the sample texts.
According to still another aspect of the present disclosure, there is provided a text classification apparatus including:
the acquisition module is used for acquiring texts to be classified;
and the classification module is used for classifying the texts to be classified by adopting the text classification model trained by the training device of the text classification model provided by the other aspect of the disclosure to obtain the classification labels of the texts to be classified.
According to still another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of training a text classification model according to one aspect of the disclosure or a method of classifying text according to another aspect of the disclosure.
According to still another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium of computer instructions for causing a computer to perform a training method of a text classification model set forth in the above-mentioned one aspect of the present disclosure or perform a text classification method set forth in the above-mentioned another aspect of the present disclosure.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method for training a text classification model proposed by the above-mentioned aspect of the present disclosure, or implements the method for text classification proposed by the above-mentioned aspect of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of the training stages included in a text classification model;
fig. 2 is a schematic flowchart of a training method of a text classification model according to an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a training method of a text classification model according to a second embodiment of the present disclosure;
fig. 4 is a flowchart illustrating a training method of a text classification model according to a third embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a training method of a text classification model according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic flow chart illustrating a process for training a text classification model using a clustering stage and a fine-tuning (finetune) stage according to the present disclosure;
fig. 7 is a schematic flowchart of a training method of a text classification model according to a fifth embodiment of the present disclosure;
FIG. 8 is a schematic flow chart illustrating a process for training a text classification model using a three-stage training method according to the present disclosure;
fig. 9 is a schematic flowchart of a text classification method according to a sixth embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a training apparatus for a text classification model according to a seventh embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a text classification apparatus according to an eighth embodiment of the present disclosure;
FIG. 12 shows a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Text classification techniques in natural language processing are widely applied in different fields, but constructing a text classification model requires labeling a large number of training samples. On one hand, labeling training samples takes a long time, so the model cannot be brought up quickly; on the other hand, labeling consumes considerable labor cost, and the labeling quality varies.
To address these problems, a few-shot (small-sample) technique can be adopted, training the text classification model with a small number of training samples so that the model has better learning and generalization ability to cope with scenarios where training samples are scarce.
In the related art, few-shot techniques applied to text classification mainly follow three directions:
The first direction is data: additional data are explicitly introduced for data augmentation, so that the search over the hypothesis space can be assisted by the additional data and the supervision signal strengthened with prior knowledge.
The second direction is the model: prior knowledge is used to reduce the search space, shrinking the size of the hypothesis space.
The third direction is the algorithm: prior knowledge is introduced to optimize the search strategy, i.e. to learn an optimal search strategy.
However, the above approaches have several disadvantages:
1. The quality and reliability of data augmentation are low; in particular, training on similar data tends to introduce large noise.
2. Model-based approaches mostly rework a pre-trained model for the few-shot setting, but the prediction effect of such models in the fine-tuning (finetune) stage is poor.
As an example, the text classification model includes two training stages, a pre-training stage and a fine-tuning (finetune) stage, as shown in fig. 1. In the pre-training stage, the effect of text classification can be improved by introducing prior knowledge, that is, some information is learned in an unsupervised manner, so that in the fine-tuning stage a small amount of training text suffices to fine-tune the text classification model for text classification.
3. Algorithm-based methods mostly improve the model parameters or the search step, and their scope of application is limited.
In order to solve the above problems, the present disclosure provides a method, an apparatus, a device, and a medium for training a text classification model and text classification.
The following describes training of a text classification model and a text classification method, apparatus, device, and medium according to an embodiment of the present disclosure with reference to the drawings.
Fig. 2 is a flowchart illustrating a training method of a text classification model according to an embodiment of the present disclosure.
The embodiment of the present disclosure is exemplified in that the training method of the text classification model is configured in a training apparatus of the text classification model, and the training apparatus of the text classification model can be applied to any electronic device, so that the electronic device can perform the training function of the text classification model.
The electronic device may be any device with computing capability, for example, a personal computer, a mobile terminal, a server, and the like, and the mobile terminal may be a hardware device with various operating systems, touch screens, and/or display screens, such as a mobile phone, a tablet computer, a personal digital assistant, a wearable device, and the like.
As shown in fig. 2, the training method of the text classification model may include the following steps:
step 201, obtaining a plurality of sample texts, and clustering the plurality of sample texts to obtain at least one target cluster.
In the embodiment of the present disclosure, a plurality of sample texts may be obtained. The sample texts may come from an existing training set, or may be collected online, for example through web-crawler technology, or may be collected offline, for example by capturing an image of text content on paper and recognizing each character in the image through OCR (Optical Character Recognition) to obtain the sample texts; the present disclosure does not limit this.
The sample text may be an article, such as a novel, an essay, or a paper, or may be news or information, or may be a fragment of text, and so on; the present disclosure does not limit this.
In the embodiment of the present disclosure, the obtained plurality of sample texts may be clustered, for example with a clustering algorithm, so as to obtain at least one target cluster. The clustering algorithm may include the K-means clustering algorithm, the KNN (K-nearest neighbor) classification algorithm, and the like; the present disclosure does not limit this.
The number of the target clusters may be one, or may also be multiple, which is not limited in this disclosure.
In a possible implementation manner of the embodiment of the present disclosure, abnormal sample texts may appear during clustering, for example sample texts that do not belong to any cluster. Such sample texts may be eliminated, so that the sample texts in the target clusters obtained by clustering are relatively concentrated. Training the text classification model with such concentrated target clusters lets the model learn the strong semantic features shared by the sample texts in the same target cluster while ignoring the interference caused by abnormal sample texts.
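By way of illustration only, the clustering of step 201, including the elimination of abnormal sample texts, might be sketched as follows; TF-IDF features, K-means, and the distance-based elimination rule are all assumptions, since the disclosure does not fix the feature extractor, the clustering algorithm, or the elimination criterion:

```python
# Illustrative sketch only: TF-IDF and K-means are assumptions, not the
# disclosure's required feature extractor or clustering algorithm.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

sample_texts = [
    "Zhang San can sing and dance",
    "Li Si can dance and play the piano",
    "So happy to have won the match today",
    "The environment of region A is good this year",
]

features = TfidfVectorizer().fit_transform(sample_texts)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Distance of every sample text to every cluster center; a text whose
# distance to its own center exceeds a threshold is treated as abnormal
# and eliminated, so the target clusters stay relatively concentrated.
distances = kmeans.transform(features)
own_dist = distances[np.arange(len(sample_texts)), kmeans.labels_]
keep_mask = own_dist < 1.5            # hypothetical outlier threshold
target_clusters = kmeans.labels_[keep_mask]
```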
Step 202, generating a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, wherein the cluster label is used for indicating the cluster type to which each sample text belongs.
In this embodiment of the present disclosure, the cluster label is used to indicate the cluster category to which a sample text belongs. The cluster label may be a numeric identifier, an alphabetic identifier, a combination of letters and digits, or another character string or special symbol; the present disclosure does not limit this.
In the embodiment of the present disclosure, a cluster label corresponding to each sample text may be generated according to the target cluster to which the sample text belongs, so as to indicate the cluster category to which it belongs. The cluster labels of all sample texts belonging to the same target cluster are the same.
For example, the number of the target clusters obtained by clustering is 3, which are respectively a target cluster 1, a target cluster 2 and a target cluster 3, the cluster label corresponding to each sample text in the target cluster 1 is 01, the cluster label corresponding to each sample text in the target cluster 2 is 02, and the cluster label corresponding to each sample text in the target cluster 3 is 03. That is, the cluster label 01 is used to indicate that the sample text belongs to the target cluster 1, the cluster label 02 is used to indicate that the sample text belongs to the target cluster 2, and the cluster label 03 is used to indicate that the sample text belongs to the target cluster 3.
For another example, 2 target clusters are obtained by clustering, which are respectively a target cluster 1 and a target cluster 2, the cluster label corresponding to each sample text in the target cluster 1 is a, and the cluster label corresponding to each sample text in the target cluster 2 is b. That is, the cluster label a is used to indicate that the sample text belongs to the target cluster 1, and the cluster label b is used to indicate that the sample text belongs to the target cluster 2.
As an example, sample texts with similar semantics may be clustered into the same target cluster. Taking poems as the sample texts, poems lamenting spring may be clustered into target cluster 1, poems praising summer into target cluster 2, poems mourning autumn into target cluster 3, and poems cherishing winter into target cluster 4.
For example, the cluster label corresponding to each sample text in the generated target cluster 1 is 01, the cluster label corresponding to each sample text in the target cluster 2 is 02, the cluster label corresponding to each sample text in the target cluster 3 is 03, and the cluster label corresponding to each sample text in the target cluster 4 is 04.
It should be noted that, the foregoing examples of the cluster label are only exemplary, and in actual application, the cluster label corresponding to each sample text may be set according to actual application requirements, which is not limited by the present disclosure.
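A minimal sketch of this label generation, assuming the two-digit label format ("01", "02", ...) used in the examples above; the cluster assignments are illustrative values such as a clustering step might produce:

```python
# Cluster assignment of each (retained) sample text; texts in the same
# target cluster receive the same cluster label.
target_clusters = [0, 1, 1, 0, 2]
cluster_labels = [f"{cluster_id + 1:02d}" for cluster_id in target_clusters]
# -> ["01", "02", "02", "01", "03"]
```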
And 203, performing first-class prediction on each sample text by using a text classification model to obtain a prediction label corresponding to each sample text.
In the embodiment of the present disclosure, a text classification model may be used to perform first class prediction on each sample text, that is, the text classification model is used to predict a cluster class to which each sample text belongs, so as to obtain a prediction label corresponding to each sample text.
For example, clustering a plurality of sample texts yields target cluster 1 and target cluster 2, where the cluster label corresponding to target cluster 1 is 02 and the cluster label corresponding to target cluster 2 is 04. Predicting the cluster category of each sample text in target cluster 1 with the text classification model may yield the prediction label 02, and predicting the cluster category of each sample text in target cluster 2 may yield the prediction label 04.
And 204, performing first training on the text classification model according to the prediction label and the cluster label corresponding to each sample text.
In the embodiment of the present disclosure, the first training may be performed on the text classification model according to the prediction label and the cluster label corresponding to each sample text.
As a possible implementation manner, for each sample text, a difference between its prediction label and its cluster label indicates that the prediction accuracy of the text classification model is not high. In this case, to improve the accuracy and reliability of the model's prediction results, the model parameters of the text classification model may be adjusted. That is, in the present disclosure, the first training may be performed on the text classification model according to the difference between the prediction label and the cluster label of each sample text: for each sample text, the model parameters are adjusted according to that difference.
For example, assuming that the cluster label corresponding to the sample text 1 is 02 and the cluster label corresponding to the sample text 2 is 04, the text classification model may be used to perform the belonging cluster type prediction on the sample text 1 and the sample text 2, so as to obtain the prediction label corresponding to each sample text. If the prediction label corresponding to the sample text 1 and the prediction label corresponding to the sample text 2 output by the text classification model are 02 and 04 respectively, determining that the model prediction is accurate without adjusting the model parameters; if the prediction label corresponding to the sample text 1 and the prediction label corresponding to the sample text 2 output by the text classification model are 04 and 04 respectively, the model prediction error is determined, and at this time, the model parameters of the text classification model can be adjusted according to the difference between the cluster label corresponding to the sample text 1 and the prediction label.
As an example, a first loss function may be generated from the difference between the prediction label and the cluster label of each sample text, where the value of the first loss function is positively correlated with the difference: the smaller the difference, the smaller the value of the first loss function, and the larger the difference, the larger the value. The first training may then be performed on the text classification model so as to minimize the value of the first loss function.
It should be noted that the above example uses minimizing the first loss function as the termination condition of the first training. In actual application, other termination conditions may also be set, for example that the number of training iterations reaches a set threshold or that the training duration exceeds a set threshold; the present disclosure does not limit this.
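A minimal first-training sketch, assuming a linear PyTorch classifier over fixed text features and cross-entropy as the first loss function; the architecture, the loss, and the fixed iteration count are all illustrative choices, not the disclosure's prescription:

```python
import torch
import torch.nn as nn

# Synthetic stand-ins: 8 sample texts with 10-dimensional features and the
# integer cluster category each text was assigned by clustering.
X = torch.randn(8, 10)
y = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])

model = nn.Linear(10, 3)               # hypothetical classifier head
first_loss_fn = nn.CrossEntropyLoss()  # grows with the prediction/cluster gap
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):  # could instead stop on an iteration/duration threshold
    optimizer.zero_grad()
    logits = model(X)                  # first-class prediction per sample text
    loss = first_loss_fn(logits, y)    # prediction label vs. cluster label
    loss.backward()
    optimizer.step()
```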
The training method of the text classification model of the embodiment of the disclosure obtains a plurality of sample texts and clusters them to obtain at least one target cluster; generates a cluster label for each sample text according to the target cluster to which it belongs, the cluster label indicating the cluster category of the sample text; performs first-class prediction on each sample text with a text classification model to obtain its prediction label; and performs first training of the text classification model according to the prediction label and cluster label of each sample text. Because clustering can capture the salient semantic features of the sample texts, generating cluster labels by clustering and pre-training the text classification model on those labels enables the model to learn the salient semantic information in the sample texts before the real training, so that model expressiveness and performance improve when the model is subsequently really trained with a small number of sample texts.
In order to clearly illustrate how the plurality of sample texts are clustered to obtain at least one target cluster in the above embodiments of the present disclosure, the present disclosure further provides a training method for a text classification model.
Fig. 3 is a flowchart illustrating a training method of a text classification model according to a second embodiment of the present disclosure.
As shown in fig. 3, the training method of the text classification model may include the following steps:
step 301, obtaining a plurality of sample texts and similarities between the plurality of sample texts.
It should be noted that the explanation for obtaining the multiple sample texts is also applicable to this embodiment, and is not described herein again.
In the embodiment of the present disclosure, the similarity between sample texts may be a text similarity between sample texts, or may also be a semantic similarity between sample texts, which is not limited in the present disclosure.
In the embodiment of the present disclosure, after the plurality of sample texts are obtained, the similarity between them may be computed, for example with an N-Gram model, Cosine Similarity, the Pearson Correlation Coefficient, Euclidean Distance, Manhattan Distance, or other algorithms; the present disclosure does not limit this.
As a possible implementation manner, feature extraction may be performed on each sample text to obtain a feature vector of each sample text, and the similarity between each sample text is calculated according to the feature vector of each sample text.
For example, assume sample text 1 is "Zhang San can sing and dance" and sample text 2 is "Li Si can dance and play the piano". First, obtain the subwords of sample text 1 ("Zhang San", "can", "sing", "and", "dance") and the subwords of sample text 2 ("Li Si", "can", "dance", "and", "play the piano"). Next, the set formed by the subwords appearing in sample text 1 and sample text 2 is {Zhang San, Li Si, can, sing, dance, and, play the piano}; the feature vector corresponding to sample text 1 may then be (1, 0, 1, 1, 1, 1, 0) and the feature vector corresponding to sample text 2 may be (0, 1, 1, 0, 1, 1, 1), where each value in a sample text's feature vector represents the number of times the subword at the corresponding position in the set appears in that sample text. Finally, the similarity between sample text 1 and sample text 2 may be determined from the similarity between their feature vectors.
It should be noted that, in practical application, the feature vector corresponding to each sample text may also be determined according to other feature extraction algorithms, which is not limited in this disclosure.
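The count-vector construction of the example above, sketched in plain Python; the whitespace tokenizer with underscored multiword subwords stands in for a real subword segmenter and is an assumption:

```python
# Subwords of the two example texts (multiword subwords joined with "_").
text1 = "Zhang_San can sing and dance".split()
text2 = "Li_Si can dance and play_the_piano".split()

# Subword set in the order used by the example.
vocabulary = ["Zhang_San", "Li_Si", "can", "sing", "dance", "and", "play_the_piano"]

vec1 = [text1.count(tok) for tok in vocabulary]  # [1, 0, 1, 1, 1, 1, 0]
vec2 = [text2.count(tok) for tok in vocabulary]  # [0, 1, 1, 0, 1, 1, 1]
```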
As an example, a cosine similarity algorithm may be used to calculate the similarity between sample text 1 and sample text 2. Denote the feature vector corresponding to sample text 1 as $(x_1, x_2, x_3, x_4, x_5, x_6, x_7)$ and the feature vector corresponding to sample text 2 as $(y_1, y_2, y_3, y_4, y_5, y_6, y_7)$. Then the cosine similarity $S$ between sample text 1 and sample text 2 can be determined according to the following formula:

$$S = \frac{\sum_{i=1}^{7} x_i y_i}{\sqrt{\sum_{i=1}^{7} x_i^2}\,\sqrt{\sum_{i=1}^{7} y_i^2}}$$
The value range of $S$ is $[-1, 1]$: when $S = 1$, sample text 1 is completely similar to sample text 2; when $S = -1$, sample text 1 is completely dissimilar to sample text 2.
As another example, a Euclidean distance algorithm may be used to calculate the similarity between sample text 1 and sample text 2. With the same feature vectors $(x_1, \ldots, x_7)$ and $(y_1, \ldots, y_7)$, the Euclidean distance $d$ between sample text 1 and sample text 2 can be determined according to the following formula:

$$d = \sqrt{\sum_{i=1}^{7} (x_i - y_i)^2}$$
It should be noted that when the similarity between sample texts is computed with a distance metric such as Euclidean distance or Manhattan distance, the distance $d$ and the similarity $S$ are negatively correlated: the greater the distance, the smaller the similarity, and the smaller the distance, the greater the similarity.
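The two formulas above, written as small functions; applied to the example feature vectors they give S = 0.6 and d = 2.0:

```python
import math

vec1 = [1, 0, 1, 1, 1, 1, 0]  # feature vector of sample text 1 (from the example)
vec2 = [0, 1, 1, 0, 1, 1, 1]  # feature vector of sample text 2

def cosine_similarity(x, y):
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norms = math.sqrt(sum(xi * xi for xi in x)) * math.sqrt(sum(yi * yi for yi in y))
    return dot / norms

def euclidean_distance(x, y):
    # Negatively correlated with similarity: the larger d, the smaller S.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(cosine_similarity(vec1, vec2))   # 0.6
print(euclidean_distance(vec1, vec2))  # 2.0
```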
Step 302, clustering the plurality of sample texts according to the similarity among the plurality of sample texts to obtain at least one target cluster.
In the embodiment of the present disclosure, the plurality of sample texts may be clustered according to the similarity between the plurality of sample texts, so as to obtain at least one target cluster.
As a possible implementation manner, a similarity threshold may be preset, so that the multiple sample texts may be clustered according to similarities (e.g., text similarity, semantic similarity) between the multiple sample texts to obtain at least one target cluster. And the similarity between the sample texts belonging to the same target cluster is greater than a similarity threshold value.
In a possible implementation manner of the embodiment of the present disclosure, abnormal sample texts may appear during clustering, for example sample texts that do not belong to any cluster. Such sample texts may be eliminated, so that the sample texts in the target clusters obtained by clustering are relatively concentrated. Training the text classification model with such concentrated target clusters lets the model learn the strong semantic features shared by the sample texts in the same target cluster while ignoring the interference caused by abnormal sample texts.
Step 303, generating a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, wherein the cluster label is used for indicating the cluster type to which each sample text belongs.
And 304, performing first-class prediction on each sample text by adopting a text classification model to obtain a prediction label corresponding to each sample text.
And 305, performing first training on the text classification model according to the prediction label and the cluster label corresponding to each sample text.
The execution process of steps 303 to 305 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
According to the training method of the text classification model, the similarity among a plurality of sample texts is obtained; and clustering the plurality of sample texts according to the similarity among the plurality of sample texts to obtain at least one target cluster. Therefore, the clustering method and the clustering device can cluster the sample texts based on the similarity among the sample texts, and can improve the accuracy and reliability of clustering results.
In order to clearly illustrate how to cluster a plurality of sample texts according to the similarity between the plurality of sample texts in any embodiment of the present disclosure, so as to obtain at least one target cluster, the present disclosure further provides a training method of a text classification model.
Fig. 4 is a flowchart illustrating a training method of a text classification model according to a third embodiment of the present disclosure.
As shown in fig. 4, the training method of the text classification model may include the following steps:
step 401, obtaining a plurality of sample texts and similarities between the plurality of sample texts.
The execution process of step 401 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
And step 402, according to the similarity among the plurality of sample texts, clustering the plurality of sample texts with a first clustering algorithm into the set first number of clusters, so as to obtain that number of initial clusters.
In the embodiment of the present disclosure, the number of the first clusters may be preset.
In the embodiment of the present disclosure, the first clustering algorithm may be, for example, a K-means clustering algorithm, a KNN classification algorithm, and the like, which is not limited by the present disclosure.
In the embodiment of the present disclosure, a plurality of sample texts may be clustered according to the set number of first clusters and a first clustering algorithm according to the similarity (e.g., text similarity and semantic similarity) between the sample texts, so as to obtain an initial cluster of the number of first clusters.
At step 403, the distance between each initial cluster is determined.
In embodiments of the present disclosure, the distance between each initial cluster may be determined.
As an example, for each initial cluster, a reference sample text corresponding to the initial cluster may be determined from sample texts in the initial cluster, so that a distance between the reference sample texts of the initial clusters may be calculated, and the distance between the reference sample texts of the initial clusters is taken as the distance between the initial clusters.
For example, for any initial cluster, when determining its reference sample text, one sample text may be selected arbitrarily from the cluster, the distances between the selected sample text and the remaining sample texts in the cluster may be calculated, and the sum of those distances may be used as the weight of the selected sample text. Computing such a weight for each sample text in the initial cluster, the reference sample text corresponding to the cluster may then be determined according to the weights; for example, the sample text with the smallest weight may be used as the reference sample text of the initial cluster.
As another example, cluster centers of the initial clusters may be determined, and distances between the cluster centers of the initial clusters may be calculated, so that the distance between the cluster centers of the initial clusters may be taken as the distance between the initial clusters.
For example, for any initial cluster, when determining a cluster center corresponding to the initial cluster, a mean value of feature vectors of sample texts in the initial cluster may be determined, and the mean value of feature vectors of sample texts in the initial cluster is used as the cluster center of the initial cluster.
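Both ways of measuring the distance between initial clusters described above, sketched with numpy; the use of Euclidean distance between feature vectors is an assumption:

```python
import numpy as np

def cluster_center(cluster_vectors):
    # Cluster center = mean of the feature vectors of the cluster's texts.
    return np.mean(cluster_vectors, axis=0)

def reference_sample(cluster_vectors):
    # Weight of each text = sum of its distances to the other texts in the
    # cluster; the text with the smallest weight is the reference sample.
    dists = np.linalg.norm(
        cluster_vectors[:, None, :] - cluster_vectors[None, :, :], axis=-1)
    return cluster_vectors[np.argmin(dists.sum(axis=1))]

def inter_cluster_distance(c1, c2, mode="center"):
    a = cluster_center(c1) if mode == "center" else reference_sample(c1)
    b = cluster_center(c2) if mode == "center" else reference_sample(c2)
    return float(np.linalg.norm(a - b))
```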
And step 404, under the condition that the distance between the initial clusters is smaller than the set inter-cluster distance threshold, clustering the plurality of sample texts by adopting a second clustering algorithm according to the similarity between the plurality of sample texts according to the inter-cluster distance threshold so as to obtain at least one target cluster.
In the disclosed embodiment, the inter-cluster distance threshold may be preset.
In the embodiment of the present disclosure, the second clustering algorithm is different from the first clustering algorithm, for example, when the first clustering algorithm is a K-means clustering algorithm, the second clustering algorithm may be a KNN clustering algorithm, when the first clustering algorithm is a KNN clustering algorithm, the second clustering algorithm may be a K-means clustering algorithm, and so on, which is not limited by the present disclosure.
In this disclosure, after the distances between the initial clusters are calculated, they may be compared with a set inter-cluster distance threshold. If the distance between at least one pair of initial clusters is smaller than the threshold, or if the distances between all initial clusters are smaller than the threshold, the clustering effect of the first clustering algorithm is poor. In that case, to improve the accuracy and reliability of the clustering result, a second clustering algorithm may be selected according to the inter-cluster distance threshold and used to cluster the plurality of sample texts according to their similarities, so as to obtain at least one target cluster.
The similarity between sample texts belonging to the same target cluster is greater than a similarity threshold, and the distance between different target clusters is greater than or equal to a set inter-cluster distance threshold.
In a possible implementation manner of the embodiment of the present disclosure, when the distance between initial clusters is smaller than the set inter-cluster distance threshold, the first cluster number may instead be adjusted (for example, increased or decreased), and the plurality of sample texts may be clustered again with the first clustering algorithm according to the adjusted first cluster number and the inter-cluster distance threshold, so as to obtain target clusters of the adjusted number.
Therefore, the sample texts can be clustered in different ways to obtain at least one target cluster, which improves the clustering effect of the target clusters as well as the flexibility and applicability of the method.
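One possible reading of steps 402 to 404 as code, assuming K-means as the first clustering algorithm and agglomerative clustering as the second; the disclosure leaves both choices open:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

def cluster_with_fallback(X, first_cluster_count, inter_cluster_threshold):
    """X: dense (n_samples, n_features) feature matrix of the sample texts."""
    # Step 402: first clustering algorithm with the set first cluster number.
    kmeans = KMeans(n_clusters=first_cluster_count, n_init=10, random_state=0).fit(X)
    centers = kmeans.cluster_centers_

    # Step 403: pairwise distances between initial clusters via their centers.
    pair_dists = [np.linalg.norm(centers[i] - centers[j])
                  for i in range(len(centers)) for j in range(i + 1, len(centers))]

    # Step 404: if any initial clusters sit closer than the threshold,
    # re-cluster with a second algorithm driven by that same threshold.
    if min(pair_dists) < inter_cluster_threshold:
        second = AgglomerativeClustering(n_clusters=None,
                                         distance_threshold=inter_cluster_threshold,
                                         linkage="average")
        return second.fit_predict(X)
    return kmeans.labels_
```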
Step 405, generating a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, wherein the cluster label is used for indicating the cluster type to which each sample text belongs.
And 406, performing first-class prediction on each sample text by using a text classification model to obtain a prediction label corresponding to each sample text.
Step 407, performing first training on the text classification model according to the prediction labels and the cluster labels corresponding to the sample texts.
The execution process of steps 405 to 407 may refer to the execution process of any embodiment of the present disclosure, which is not described herein again.
According to the training method of the text classification model, a plurality of sample texts are clustered according to the set number of first clusters by adopting a first clustering algorithm according to the similarity among the plurality of sample texts, so that initial clusters of the first cluster number are obtained; determining the distance between each initial cluster; and under the condition that the distance between the initial clusters is smaller than the set inter-cluster distance threshold, clustering the plurality of sample texts by adopting a second clustering algorithm according to the similarity between the plurality of sample texts according to the inter-cluster distance threshold so as to obtain at least one target cluster. Therefore, when the first clustering algorithm is adopted to cluster the sample texts, if the clustering effect of the first clustering algorithm is poor, other clustering algorithms can be replaced to cluster the sample texts again, and the clustering effect of the target clusters can be improved.
In a possible implementation manner of the embodiment of the present disclosure, after the first training is performed on the text classification model, in order to improve the classification effect of the text classification model, a second training may be performed on the text classification model. The second training process is described in detail below with reference to fig. 5.
Fig. 5 is a flowchart illustrating a training method of a text classification model according to a fourth embodiment of the present disclosure.
As shown in fig. 5, the training method of the text classification model may include the following steps:
step 501, obtaining a plurality of sample texts, and clustering the plurality of sample texts to obtain at least one target cluster.
Step 502, generating a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, wherein the cluster label is used for indicating the cluster type to which each sample text belongs.
Step 503, performing a first class prediction on each sample text by using a text classification model to obtain a prediction label corresponding to each sample text.
Step 504, performing a first training on the text classification model according to the prediction labels and the cluster labels corresponding to the sample texts.
The execution process of steps 501 to 504 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
And 505, performing second class prediction on each sample text by using the text classification model after the first training to obtain a classification label.
In the embodiment of the present disclosure, the classification labels may differ for different text classification tasks, where the text classification tasks may include emotion classification, yes/no classification, other classifications, and so on.
As an example, take the text classification task to be emotion classification. The purpose of emotion classification is to determine the emotion polarity (positive or negative) of a sample text, or of a target entity in the sample text. In either case, the classification label (i.e., the emotion polarity) may be "positive" or "negative".
As another example, take the text classification task to be yes/no classification. Its purpose is to give an answer to a yes/no question. For example, given the yes/no question "Did he attend the dance yesterday?", the classification label (i.e., the answer) may fall into three categories: yes, no, or uncertain.
When the text classification task is another kind of classification, such as a sentence classification task, a viewpoint classification task, or an entity classification task, the classification labels corresponding to the sample texts may be determined in a similar manner.
In the embodiment of the present disclosure, the second class prediction may be performed on each sample text by using the text classification model after the first training, so as to obtain a classification label corresponding to each sample text. That is, in the present disclosure, the text classification model after the first training may be adopted to perform text classification on each sample text, so as to obtain a classification label of each sample text.
And step 506, performing second training on the text classification model after the first training according to the difference between the classification label corresponding to each sample text and the labeled real label.
In the embodiment of the present disclosure, each sample text may be labeled with a real label, where the real label is used to indicate a classification category to which the corresponding sample text belongs.
In the embodiment of the present disclosure, a difference between the classification label of a sample text and its labeled real label indicates that the prediction precision of the model is not high. To improve the prediction precision, i.e. the accuracy and reliability of the model's classification results, a second training may be performed on the first-trained text classification model. That is, for each sample text, the model parameters of the text classification model may be adjusted according to the difference between the classification label and the real label of that sample text.
As an example, a second loss function may be generated from the difference between the classification label and the real label of each sample text, where the value of the second loss function is positively correlated with the difference: the smaller the difference, the smaller the value of the second loss function, and the larger the difference, the larger the value. The second training may then be performed on the text classification model so as to minimize the value of the second loss function.
It should be noted that the above example uses minimizing the second loss function as the termination condition of the second training. In actual application, other termination conditions may also be set, for example that the number of training iterations reaches a set threshold or that the training duration exceeds a set threshold; the present disclosure does not limit this.
As an example, a text classification task is taken as an emotion classification, and the following example is used to illustrate a second training process of a text classification model applicable to the emotion classification task:
when the emotion classification aims at judging the emotion polarity of a sample text, a classification label (namely the emotion polarity) can be in a positive direction or a negative direction, the sample text is assumed to be in a state that the match is won with a lot of interest today, a labeled real label is in the positive direction, and if a text classification model after first training is adopted to carry out second type prediction (namely text classification) on the sample text, and the obtained classification label is in the positive direction, the model prediction is determined to be correct without adjusting model parameters; and if the second type prediction is carried out on the sample text by adopting the text classification model after the first training, and the obtained classification label is negative, determining that the model prediction is wrong, and adjusting the model parameters at the moment.
When the emotion classification aims at judging the emotion polarity of a target entity in sample data, a classification label (namely the emotion polarity) can be in a positive direction or a negative direction, a sample text is supposed to be in a good environment of a region A in this year, the target entity is in the region A, a labeled real label is in the positive direction, if a text classification model after first training is adopted to perform second type prediction on the sample text, the obtained classification label is in the positive direction, the model prediction is determined to be correct, and model parameters are not required to be adjusted; if the second type prediction is carried out on the sample text by adopting the text classification model after the first training, and the obtained classification label is negative, the model prediction error is determined, and the model parameters can be adjusted at the moment.
As another example, take the text classification task to be yes/no classification; the following illustrates the second training of a text classification model for such a task:
Suppose the yes/no question is "Does Zhang San take part in the dance every day?" and the sample text is "Zhang San takes part in the dance every day", with the labeled real label "yes". If the second-class prediction performed on the sample text by the first-trained text classification model yields the classification label "yes", the model prediction is correct and the model parameters need not be adjusted; if it yields "no" or "uncertain", the model prediction is wrong, and the model parameters may be adjusted.
It should be noted that, when the text classification task is other classifications, the classification labels corresponding to the sample text may be determined in a manner similar to that described above for other sentence classification tasks, viewpoint classification tasks, entity classification tasks, and the like, so as to determine whether there is a difference between the classification labels and the real labels corresponding to the sample text, and if so, the second training may be performed on the text classification model subjected to the first training according to the difference, which is not described herein again.
As an example, the first training process of the text classification model may be referred to as the clustering stage, and the second training process as the fine-tuning (finetune) stage, as shown in fig. 6. In the clustering stage, the text classification model performs first-class prediction on the sample texts through a cluster-label classification layer to obtain the prediction label of each sample text, so that the first training can be performed according to the difference between the cluster label and the prediction label of each sample text.
For example, an unsupervised clustering stage may be performed using each sample text, each sample text may be clustered using a clustering algorithm to obtain each target cluster, a cluster label is added to each sample text according to the target cluster to which each sample text belongs, and then the clustered cluster label is learned using a model, so that the model learns some significant semantic information.
In the fine-tuning (finetune) stage, the text classification model performs second-class prediction on the sample texts through a real-label classification layer to obtain the classification label of each sample text, so that the second training can be performed according to the difference between the classification label of each sample text and its labeled real label.
It can be understood that clustering can capture the salient semantic features in the sample texts. Cluster labels (also called auxiliary labels) of the sample texts can be obtained through clustering, and the text classification model can be trained with these cluster labels, so that the model learns salient semantic information in the sample texts before the real training. As a result, the expression of the model can be improved even when only a small amount of sample texts is adopted for the real training (i.e., the second training) of the text classification model.
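The two classification layers of fig. 6 can be sketched as follows; this is a hypothetical structure with a deliberately simple placeholder encoder, since the disclosure does not prescribe a concrete network:

```python
import torch.nn as nn

class TwoHeadTextClassifier(nn.Module):
    """Shared encoder with the two classification layers of fig. 6: a cluster
    label layer used in the clustering stage and a real label layer learned
    afresh in the finetune stage."""
    def __init__(self, vocab_size, hidden, num_cluster_labels, num_real_labels):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, hidden)          # placeholder encoder
        self.cluster_head = nn.Linear(hidden, num_cluster_labels)   # first training
        self.real_head = nn.Linear(hidden, num_real_labels)         # second training

    def forward(self, token_ids, stage="finetune"):
        h = self.encoder(token_ids)  # (batch, hidden)
        return self.cluster_head(h) if stage == "cluster" else self.real_head(h)
```

During the clustering stage only the cluster head is trained against cluster labels; in the finetune stage a fresh real-label head is learned on top of the same encoder.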
According to the training method of the text classification model, the second class prediction is carried out on each sample text by adopting the text classification model after the first training, so that the classification label is obtained; and performing second training on the text classification model after the first training according to the difference between the classification label corresponding to each sample text and the labeled real label. Therefore, the real labels marked by the sample texts are adopted to perform the second training on the text classification model after the first training, so that the classification effect of the text classification model can be improved, namely the accuracy and the reliability of the text classification result are improved.
In a possible implementation manner of the embodiment of the present disclosure, before the first training of the text classification model, a third training may be performed on the text classification model, so as to further improve the expression and performance of the model. The third training process is described in detail below with reference to fig. 7.
Fig. 7 is a flowchart illustrating a training method of a text classification model according to a fifth embodiment of the present disclosure.
As shown in fig. 7, the training method of the text classification model may include the following steps:
step 701, obtaining a plurality of sample texts.
The execution process of step 701 may refer to the execution process of any embodiment of the present disclosure, and details are not described herein.
Step 702, for any sample text in the multiple sample texts, masking at least one sample character in any sample text to obtain a masked sample text.
In the embodiment of the present disclosure, masking a sample character means replacing the sample character with a mask character. The mask character may be a preset fixed character or a random character, which is not limited in the present disclosure.
In the embodiment of the present disclosure, for any sample text in the plurality of sample texts, a mask character may be used to mask at least one sample character in that sample text, so that a masked sample text is obtained. The number of masked sample characters may be one or more, which is not limited in the present disclosure; similarly, the number of sample texts subjected to masking may be one or more, which is also not limited in the present disclosure.
As an example of masking a sample text, suppose the sample text is "Won a prize in today's race", the sample character to be masked is "race", and the mask character is "high"; the masked sample text is then "Won a prize in today's high".
It should be noted that the above example uses the mask character "high" only for illustration; in practical applications, a person skilled in the art may select the mask character used for masking the sample characters according to actual business requirements, that is, the mask character is not specifically limited in the embodiment of the present disclosure.
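A minimal sketch of this masking operation follows; the helper name and the default mask character taken from the example above are illustrative only:

```python
import random

def mask_sample_text(sample_text, num_to_mask=1, mask_char="high"):
    """Replace `num_to_mask` randomly chosen sample characters with the mask
    character, returning the masked text, the masked positions and the originals."""
    chars = list(sample_text)
    positions = random.sample(range(len(chars)), k=min(num_to_mask, len(chars)))
    originals = [chars[i] for i in positions]
    for i in positions:
        chars[i] = mask_char          # the mask character may be fixed or random
    return "".join(chars), positions, originals
```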
And 703, performing character prediction on the masked sample text by using a text classification model to obtain a predicted text.
In the embodiment of the present disclosure, a text classification model may be used to perform character prediction on the masked sample text to obtain a predicted text. That is, in the present disclosure, the text classification model may predict all characters of the entire text, in a manner similar to machine translation, to obtain the predicted text.
Continuing the above example, "Won a prize in today's high" may be input into the text classification model, which performs character prediction on the entire text; the predicted text output by the text classification model may be, for example, "Won a prize in today's race" or "Won a prize in today's high jump".
Step 704, performing a third training on the text classification model according to the difference between any sample text and the corresponding predicted text.
In the embodiment of the present disclosure, it may be determined whether there is a difference between the predicted text output by the text classification model and the corresponding sample text. When such a difference exists, the prediction accuracy of the model is not yet high; therefore, in order to improve the prediction accuracy, the model parameters of the text classification model may be adjusted, that is, the text classification model may be subjected to the third training.
As an example, a third loss function may be generated according to the difference between a sample text and the corresponding predicted text, where the value of the third loss function is positively correlated with the difference: the smaller the difference, the smaller the value of the third loss function, and conversely, the larger the difference, the larger the value. Therefore, in the present disclosure, the third training may be performed on the text classification model according to the value of the third loss function, so as to minimize that value.
It should be noted that the above example only takes minimizing the value of the third loss function as the termination condition of the third training. In practical applications, other termination conditions may also be set; for example, the termination condition may be that the number of training iterations reaches a set threshold, or that the training duration exceeds a set duration threshold, which is not limited by the present disclosure.
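A sketch of this full-text variant of the third training, assuming a PyTorch-style model that outputs per-position character logits (all names are hypothetical):

```python
import torch.nn.functional as F

def third_training_step_full(model, optimizer, masked_ids, original_ids):
    """Third training, full-text variant: predict every character of the
    original sample text from the masked input; the third loss value grows
    with the difference between predicted text and sample text."""
    logits = model(masked_ids)        # assumed shape: (batch, seq_len, vocab_size)
    third_loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                 original_ids.reshape(-1))
    optimizer.zero_grad()
    third_loss.backward()
    optimizer.step()
    return third_loss.item()
```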
In a possible implementation manner of the embodiment of the present disclosure, a text classification model may be further used to predict mask characters in the masked sample text to obtain at least one predicted character, so that a third training may be performed on the text classification model according to a difference between the at least one sample character and the at least one predicted character.
Therefore, the third training of the text classification model can be realized according to different modes, and the flexibility and the applicability of the method can be improved.
In the embodiment of the present disclosure, the masked sample text may be input into the text classification model, and the text classification model predicts the mask characters in the masked sample text to obtain at least one predicted character. That is, in the present disclosure, the text classification model may predict only the characters that are masked (Mask) out, similar to a cloze (fill-in-the-blank) task.
The number of predicted characters is the same as the number of masked sample characters.
Thus, in the present disclosure, the text classification model may be subjected to a third training, i.e., model parameters in the text classification model may be adjusted, according to a difference between the at least one sample character and the at least one predicted character.
As an example, a fourth loss function may be generated according to the difference between the at least one sample character and the at least one predicted character, where the value of the fourth loss function is positively correlated with the difference: the smaller the difference, the smaller the value of the fourth loss function, and conversely, the larger the difference, the larger the value. Therefore, in the present disclosure, the third training may be performed on the text classification model according to the value of the fourth loss function, so as to minimize that value.
It should be noted that the above example only takes minimizing the value of the fourth loss function as the termination condition of the third training. In practical applications, other termination conditions may also be set; for example, the termination condition may be that the number of training iterations reaches a set threshold, or that the training duration exceeds a set duration threshold, which is not limited by the present disclosure.
Still taking the above example, "Won a prize in today's high" may be input into the text classification model, and the model predicts the masked character "race". If the predicted character output by the text classification model is "race", the model prediction is accurate, and there is no need to adjust the model parameters; if the predicted character output by the model is any other character, it is determined that the model prediction is wrong, and the model parameters may be adjusted, that is, the third training may be performed on the model.
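A sketch of this cloze variant, which computes the fourth loss only at the masked positions (the tensor shapes and names are assumptions of this sketch):

```python
import torch
import torch.nn.functional as F

def third_training_step_masked(model, optimizer, masked_ids, original_ids, mask_positions):
    """Third training, cloze variant: the fourth loss is computed only at the
    masked positions, comparing each predicted character with the sample character."""
    logits = model(masked_ids)                          # (batch, seq_len, vocab_size)
    batch_idx = torch.arange(masked_ids.size(0)).unsqueeze(1)
    pred = logits[batch_idx, mask_positions]            # logits at the masked slots
    target = original_ids[batch_idx, mask_positions]    # the original sample characters
    fourth_loss = F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))
    optimizer.zero_grad()
    fourth_loss.backward()
    optimizer.step()
    return fourth_loss.item()
```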
It can be understood that, in addition to masking the sample texts and pre-training the text classification model on the masked sample texts, the text classification model may also be pre-trained in an auto-regressive manner, that is, the sample text is continued or expanded based on its existing context, which is not limited in this disclosure.
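For completeness, a sketch of this auto-regressive alternative, in which each character is predicted from the characters before it (again a hypothetical, minimal step):

```python
import torch.nn.functional as F

def autoregressive_pretrain_step(model, optimizer, token_ids):
    """Auto-regressive alternative: each character is predicted from the
    characters before it, i.e. the sample text is expanded from its own prefix."""
    logits = model(token_ids[:, :-1])   # predict the next character at every step
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           token_ids[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```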
Step 705, clustering the plurality of sample texts to obtain at least one target cluster.
Step 706, generating a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, wherein the cluster label is used for indicating the cluster type to which each sample text belongs.
The execution process of steps 705 to 706 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
And 707, performing first class prediction on each sample text by using the third trained text classification model to obtain a prediction label corresponding to each sample text.
In the embodiment of the present disclosure, the third trained text classification model may be used to perform the first class prediction on each sample text, so as to obtain a prediction label corresponding to each sample text.
Therefore, before the first training is performed on the text classification model, the third training is performed on the text classification model first, which can improve the expression and performance of the model and thus the training effect of the model.
Step 708, performing a first training on the text classification model according to the prediction labels and the cluster labels corresponding to the sample texts.
The execution process of step 708 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
According to the training method of the text classification model of this embodiment, for any sample text in the plurality of sample texts, at least one sample character in the sample text is masked to obtain a masked sample text; character prediction is performed on the masked sample text by adopting the text classification model to obtain a predicted text; the third training is performed on the text classification model according to the difference between the sample text and the corresponding predicted text; and the first class prediction is performed on each sample text by adopting the third-trained text classification model to obtain the prediction label corresponding to each sample text. Therefore, by masking the sample texts and pre-training the text classification model on the masked sample texts, the expression and performance of the model can be further improved, and the prediction effect of the model is further improved.
As an example, unlike the existing two-stage training method of fig. 1 (i.e., a pre-training stage followed by a fine-tuning (finetune) stage), in the present disclosure an intermediate stage, namely the clustering stage, may be added between the pre-training stage and the fine-tuning stage. As shown in fig. 8, the present disclosure may train the text classification model with a three-stage method (i.e., a pre-training stage, a clustering stage and a fine-tuning stage), where the pre-training stage performs the third training on the text classification model, the clustering stage performs the first training, and the fine-tuning stage performs the second training.
In the clustering stage, the task scene or task category does not need to be considered, that is, no task-specific design is needed, which makes it a universal, simple and effective training manner. Through the first training, the model can learn the correspondence between a text and its cluster label, namely the cluster category to which the text belongs, so that the model learns stronger semantic features, and the interference caused by outliers can be ignored.
After the model has learned the cluster category to which a text belongs, the fine-tuning (finetune) stage only relies on the semantic feature capability already learned by the model itself. Therefore, in the real training of the fine-tuning stage, the cluster label classification layer of the clustering stage shown in fig. 6 is no longer needed; only a new classification layer (i.e., the real label classification layer in fig. 6) needs to be learned from the real labels labeled on the sample texts.
In conclusion, training the text classification model with the three-stage method has the following advantages: no data enhancement of the sample texts is needed, no additional external data is used, and no noise is introduced; when only a small amount of sample texts is adopted for the second training, the text classification effect of the model can still be improved; and no special design is needed for scenes, data or categories, so the manner is simple, convenient and highly universal.
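Putting the three stages together, a hypothetical orchestration could look as follows, reusing the step functions sketched earlier; the cluster-label step reuses the cross-entropy step with cluster labels in place of real labels, and head switching (fig. 6) is omitted for brevity:

```python
def train_three_stage(model, optimizer, pretrain_batches, cluster_batches, finetune_batches):
    """Three training stages in order: pre-training (third training),
    clustering stage (first training) and finetune stage (second training)."""
    for masked_ids, original_ids in pretrain_batches:       # pre-training stage
        third_training_step_full(model, optimizer, masked_ids, original_ids)
    for input_ids, cluster_labels in cluster_batches:       # clustering stage
        second_training_step(model, optimizer, input_ids, cluster_labels)
    for input_ids, real_labels in finetune_batches:         # finetune stage
        second_training_step(model, optimizer, input_ids, real_labels)
    return model
```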
In the embodiments corresponding to the training method of the text classification model, the disclosure further provides an application method of the text classification model, that is, a text classification method.
Fig. 9 is a schematic flowchart of a text classification method according to a sixth embodiment of the present disclosure.
As shown in fig. 9, the text classification method may include the steps of:
step 901, obtaining a text to be classified.
In the embodiment of the present disclosure, the text to be classified may be obtained from an existing test set, collected online (for example, by using a web crawler technology), collected offline, or input by a user, and the like, which is not limited in the embodiment of the present disclosure.
And 902, classifying the texts to be classified by adopting the trained text classification model to obtain the classification labels of the texts to be classified.
The text classification model can be obtained by training by adopting any method embodiment.
In the embodiment of the present disclosure, the text to be classified may be input into the trained text classification model, and the text classification model classifies the text to be classified to obtain the classification label of the text to be classified output by the text classification model.
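A minimal inference sketch (the tokenizer is a hypothetical callable; the trained model is assumed to return logits over the classification labels):

```python
import torch

def classify_text(model, tokenizer, text_to_classify):
    """Apply the trained text classification model to a text to be classified
    and return the index of its classification label."""
    model.eval()
    with torch.no_grad():
        token_ids = tokenizer(text_to_classify)   # assumed to yield a (1, seq_len) tensor
        logits = model(token_ids)                 # (1, num_labels)
        return int(logits.argmax(dim=-1))
```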
The text classification method of the embodiment of the disclosure comprises the steps of obtaining a text to be classified; and classifying the texts to be classified by adopting the trained text classification model to obtain the classification labels of the texts to be classified. Therefore, the text to be classified is classified based on the deep learning technology, and the accuracy and reliability of the classification result can be improved.
Corresponding to the training method of the text classification model provided in the embodiments of fig. 2 to 7, the present disclosure also provides a training device of the text classification model, and since the training device of the text classification model provided in the embodiments of the present disclosure corresponds to the training method of the text classification model provided in the embodiments of fig. 2 to 7, the implementation manner of the training method of the text classification model is also applicable to the training device of the text classification model provided in the embodiments of the present disclosure, and will not be described in detail in the embodiments of the present disclosure.
Fig. 10 is a schematic structural diagram of a training apparatus for a text classification model according to a seventh embodiment of the present disclosure.
As shown in fig. 10, the training apparatus 1000 for the text classification model may include: the system comprises an acquisition module 1001, a clustering module 1002, a generation module 1003, a first prediction module 1004 and a first training module 1005.
The obtaining module 1001 is configured to obtain a plurality of sample texts.
The clustering module 1002 is configured to cluster the plurality of sample texts to obtain at least one target cluster.
The generating module 1003 is configured to generate a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, where the cluster label is used to indicate a cluster type to which each sample text belongs.
The first prediction module 1004 is configured to perform first class prediction on each sample text by using a text classification model to obtain a prediction tag corresponding to each sample text.
The first training module 1005 is configured to perform first training on the text classification model according to the prediction labels and the cluster labels corresponding to the sample texts.
In a possible implementation manner of the embodiment of the present disclosure, the clustering module 1002 is specifically configured to:
obtaining similarity among a plurality of sample texts; and clustering the plurality of sample texts according to the similarity among the plurality of sample texts to obtain at least one target cluster.
In a possible implementation manner of the embodiment of the present disclosure, the clustering module 1002 is specifically configured to: clustering the plurality of sample texts according to the set number of the first clusters by adopting a first clustering algorithm according to the similarity among the plurality of sample texts to obtain initial clusters of the number of the first clusters; determining the distance between each initial cluster; and under the condition that the distance between the initial clusters is smaller than the set inter-cluster distance threshold, clustering the plurality of sample texts by adopting a second clustering algorithm according to the similarity between the plurality of sample texts according to the inter-cluster distance threshold so as to obtain at least one target cluster.
In a possible implementation manner of the embodiment of the present disclosure, the clustering module 1002 is specifically configured to: clustering the plurality of sample texts according to the similarity among the plurality of sample texts by adopting a first clustering algorithm according to the set number of the first clusters to obtain initial clusters of the first cluster number; determining the distance between each initial cluster; under the condition that the distance between the initial clusters is smaller than a set distance threshold value between clusters, adjusting the number of the first clusters; and clustering the plurality of sample texts again by adopting a first clustering algorithm according to the adjusted first cluster number and the inter-cluster distance threshold value to obtain the target cluster of the adjusted first cluster number.
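The two clustering strategies handled by the clustering module can be sketched as follows; KMeans and agglomerative clustering are illustrative stand-ins for the first and second clustering algorithms, and dense feature vectors are assumed:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def cluster_with_distance_check(features, first_cluster_num, inter_cluster_threshold):
    """Run the first clustering algorithm with the set first cluster number;
    if two initial cluster centers fall closer than the inter-cluster distance
    threshold, re-cluster with a second, distance-driven algorithm."""
    kmeans = KMeans(n_clusters=first_cluster_num, n_init=10).fit(features)
    centers = kmeans.cluster_centers_
    pairwise = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    off_diagonal = pairwise[~np.eye(len(centers), dtype=bool)]
    if off_diagonal.min() >= inter_cluster_threshold:
        return kmeans.labels_  # initial clusters are far enough apart: keep them
    # Second strategy per the module above: apply a distance-driven algorithm.
    # (The alternative would instead reduce first_cluster_num and re-run KMeans.)
    second = AgglomerativeClustering(n_clusters=None,
                                     distance_threshold=inter_cluster_threshold)
    return second.fit_predict(features)
```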
In a possible implementation manner of the embodiment of the present disclosure, the training apparatus 1000 of the text classification model may further include:
and the second prediction module is used for performing second class prediction on each sample text by adopting the text classification model after the first training to obtain a classification label.
And the second training module is used for carrying out second training on the text classification model after the first training according to the difference between the classification label corresponding to each sample text and the labeled real label.
In a possible implementation manner of the embodiment of the present disclosure, the training apparatus 1000 of the text classification model may further include:
the first mask module is used for performing mask operation on at least one sample character in any sample text in the multiple sample texts to obtain a masked sample text.
And the third prediction module is used for performing character prediction on the masked sample text by adopting a text classification model to obtain a predicted text.
And the third training module is used for carrying out third training on the text classification model according to the difference between any sample text and the corresponding predicted text.
In a possible implementation manner of the embodiment of the present disclosure, the training apparatus 1000 of the text classification model may further include:
and the second mask module is used for masking at least one sample character in any sample text in the multiple sample texts to obtain a masked sample text.
And the fourth prediction module is used for predicting the mask characters in the masked sample text by adopting the text classification model so as to obtain at least one predicted character.
And the fourth training module is used for carrying out third training on the text classification model according to the difference between the at least one sample character and the at least one predicted character.
In a possible implementation manner of the embodiment of the present disclosure, the first prediction module 1004 is specifically configured to: and performing first-class prediction on each sample text by adopting the text classification model subjected to third training to obtain a prediction label corresponding to each sample text.
The training device for the text classification model of the embodiment of the disclosure obtains at least one target cluster by obtaining a plurality of sample texts and clustering the plurality of sample texts; generating a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, wherein the cluster label is used for indicating the cluster type to which each sample text belongs; performing first-class prediction on each sample text by adopting a text classification model to obtain a prediction label corresponding to each sample text; and performing first training on the text classification model according to the prediction label and the cluster label corresponding to each sample text. According to the method and the device, the significant semantic features in the sample texts can be captured by clustering, the cluster labels corresponding to the sample texts are generated by clustering the sample texts, and the text classification model is pre-trained on the basis of the cluster labels, so that the text classification model can effectively learn the significant semantic information in the sample texts before real training, and the model expression and performance can be improved when a small amount of sample texts are utilized to perform real training on the text classification model.
Corresponding to the text classification method provided in the embodiment of fig. 9, the present disclosure also provides a text classification device, and since the text classification device provided in the embodiment of the present disclosure corresponds to the text classification method provided in the embodiment of fig. 9, the implementation manner of the text classification method is also applicable to the text classification device provided in the embodiment of the present disclosure, and is not described in detail in the embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of a text classification apparatus according to an eighth embodiment of the present disclosure.
As shown in fig. 11, the text classification apparatus 1100 may include: an acquisition module 1101 and a classification module 1102.
The obtaining module 1101 is configured to obtain a text to be classified.
The classification module 1102 is configured to classify the text to be classified by using the text classification model trained by the training apparatus of the text classification model shown in fig. 10, so as to obtain a classification label of the text to be classified.
The text classification device of the embodiment of the disclosure acquires the text to be classified; and classifying the texts to be classified by adopting the trained text classification model to obtain the classification labels of the texts to be classified. Therefore, the text to be classified is classified based on the deep learning technology, and the accuracy and reliability of the classification result can be improved.
To implement the above embodiments, the present disclosure also provides an electronic device, which may include at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform a training method or a text classification method of a text classification model according to any of the above embodiments of the disclosure.
In order to achieve the above embodiments, the present disclosure also provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute a training method or a text classification method of a text classification model proposed in any one of the above embodiments of the present disclosure.
In order to implement the above embodiments, the present disclosure further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the training method or the text classification method of the text classification model proposed in any of the above embodiments of the present disclosure.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 12 shows a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the electronic apparatus 1200 includes a computing unit 1201 which can perform various appropriate actions and processes in accordance with a computer program stored in a ROM (Read-Only Memory) 1202 or a computer program loaded from a storage unit 1208 into a RAM (Random Access Memory) 1203. In the RAM 1203, various programs and data necessary for the operation of the electronic apparatus 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An I/O (Input/Output) interface 1205 is also connected to the bus 1204.
Various components in the electronic device 1200 are connected to the I/O interface 1205, including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the electronic device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing Unit 1201 include, but are not limited to, a CPU (Central Processing Unit), a GPU (graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing Units running machine learning model algorithms, a DSP (Digital Signal Processor), and any suitable Processor, controller, microcontroller, and the like. The calculation unit 1201 performs various methods and processes described above, such as the above-described training method of the text classification model or the text classification method. For example, in some embodiments, the above-described training method of the text classification model or the text classification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the training method of the text classification model or the text classification method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured in any other suitable way (e.g., by means of firmware) to perform the above-described training method of the text classification model or the text classification method.
Various implementations of the systems and techniques described here above may be realized in digital electronic circuitry, Integrated circuitry, FPGAs (Field Programmable Gate arrays), ASICs (Application-Specific Integrated circuits), ASSPs (Application Specific Standard products), SOCs (System On Chip, System On a Chip), CPLDs (Complex Programmable Logic devices), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an EPROM (Electrically Programmable Read-Only-Memory) or flash Memory, an optical fiber, a CD-ROM (Compact Disc Read-Only-Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a Display device (e.g., a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: LAN (Local Area Network), WAN (Wide Area Network), internet, and blockchain Network.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of large management difficulty and weak service extensibility in a conventional physical host and VPS service (Virtual Private Server). The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be noted that artificial intelligence is the discipline of studying how to make a computer simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage and big data processing; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning, big data processing technology, knowledge graph technology, and the like.
According to the technical scheme of the embodiment of the disclosure, at least one target cluster is obtained by obtaining a plurality of sample texts and clustering the sample texts; generating a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, wherein the cluster label is used for indicating the cluster type to which each sample text belongs; performing first-class prediction on each sample text by adopting a text classification model to obtain a prediction label corresponding to each sample text; and performing first training on the text classification model according to the prediction label and the cluster label corresponding to each sample text. According to the method and the device, the significant semantic features in the sample texts can be captured by clustering, the cluster labels corresponding to the sample texts are generated by clustering the sample texts, and the text classification model is pre-trained on the basis of the cluster labels, so that the text classification model can effectively learn the significant semantic information in the sample texts before real training, and the model expression and performance can be improved when a small amount of sample texts are utilized to perform real training on the text classification model.
It should be understood that various forms of the flows shown above, reordering, adding or deleting steps, may be used. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions proposed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A method of training a text classification model, the method comprising:
obtaining a plurality of sample texts, and clustering the sample texts to obtain at least one target cluster;
generating a cluster label corresponding to each sample text according to the target cluster to which each sample text belongs, wherein the cluster label is used for indicating the cluster category to which the sample text belongs;
performing first-class prediction on each sample text by adopting a text classification model to obtain a prediction label corresponding to each sample text;
and performing first training on the text classification model according to the prediction label and the cluster label corresponding to each sample text.
2. The method of claim 1, wherein said clustering said plurality of sample texts to obtain at least one target cluster comprises:
obtaining the similarity among the sample texts;
and clustering the sample texts according to the similarity among the sample texts to obtain at least one target cluster.
3. The method of claim 2, wherein said clustering said plurality of sample texts according to similarities therebetween to obtain at least one target cluster comprises:
clustering the plurality of sample texts according to the similarity among the plurality of sample texts by adopting a first clustering algorithm according to the set number of first clusters to obtain initial clusters of the first cluster number;
determining a distance between each of the initial clusters;
and under the condition that the distance between the initial clusters is smaller than a set inter-cluster distance threshold, clustering the plurality of sample texts by adopting a second clustering algorithm according to the similarity between the plurality of sample texts according to the inter-cluster distance threshold so as to obtain at least one target cluster.
4. The method of claim 2, wherein said clustering said plurality of sample texts according to similarities therebetween to obtain at least one target cluster comprises:
clustering the plurality of sample texts according to the set number of first clusters by adopting a first clustering algorithm according to the similarity among the plurality of sample texts to obtain initial clusters of the number of the first clusters;
determining a distance between each of the initial clusters;
under the condition that the distance between the initial clusters is smaller than a set distance threshold value between clusters, adjusting the number of the first clusters;
and clustering the plurality of sample texts again by adopting the first clustering algorithm according to the adjusted first cluster number and the inter-cluster distance threshold value to obtain the target cluster of the adjusted first cluster number.
5. The method of claim 1, wherein the method further comprises:
performing second class prediction on each sample text by adopting the text classification model after the first training to obtain a classification label;
and performing second training on the text classification model after the first training according to the difference between the classification label corresponding to each sample text and the labeled real label.
6. The method of claim 1, wherein after the obtaining a plurality of sample texts, the method further comprises:
for any sample text in the multiple sample texts, masking at least one sample character in the any sample text to obtain a masked sample text;
character prediction is carried out on the sample text after the mask by adopting the text classification model so as to obtain a predicted text;
and performing third training on the text classification model according to the difference between any sample text and the corresponding predicted text.
7. The method of claim 1, wherein after the obtaining a plurality of sample texts, the method further comprises:
for any sample text in the multiple sample texts, masking at least one sample character in the any sample text to obtain a masked sample text;
predicting mask characters in the masked sample text by adopting the text classification model to obtain at least one predicted character;
performing a third training of the text classification model based on a difference between the at least one sample character and the at least one predicted character.
8. The method according to claim 6 or 7, wherein the performing a first class prediction on each sample text by using a text classification model to obtain a prediction label corresponding to each sample text comprises:
and performing first-class prediction on each sample text by adopting a third-trained text classification model to obtain a prediction label corresponding to each sample text.
9. A method of text classification, the method comprising:
acquiring a text to be classified;
classifying the text to be classified by adopting the text classification model trained by the method according to any one of claims 1-8 to obtain the classification label of the text to be classified.
10. An apparatus for training a text classification model, the apparatus comprising:
the acquisition module is used for acquiring a plurality of sample texts;
the clustering module is used for clustering the sample texts to obtain at least one target cluster;
the generating module is used for generating cluster labels corresponding to the sample texts according to target clusters to which the sample texts belong, wherein the cluster labels are used for indicating cluster categories to which the sample texts belong;
the first prediction module is used for performing first-class prediction on each sample text by adopting a text classification model to obtain a prediction label corresponding to each sample text;
and the first training module is used for performing first training on the text classification model according to the prediction labels and the cluster labels corresponding to the sample texts.
11. The apparatus according to claim 10, wherein the clustering module is specifically configured to:
obtaining the similarity among the sample texts;
and clustering the sample texts according to the similarity among the sample texts to obtain at least one target cluster.
12. The apparatus according to claim 11, wherein the clustering module is specifically configured to:
clustering the plurality of sample texts according to the set number of first clusters by adopting a first clustering algorithm according to the similarity among the plurality of sample texts to obtain initial clusters of the number of the first clusters;
determining a distance between each of the initial clusters;
and under the condition that the distance between the initial clusters is smaller than a set inter-cluster distance threshold, clustering the plurality of sample texts by adopting a second clustering algorithm according to the similarity between the plurality of sample texts according to the inter-cluster distance threshold so as to obtain at least one target cluster.
13. The apparatus according to claim 11, wherein the clustering module is specifically configured to:
clustering the plurality of sample texts according to the set number of first clusters by adopting a first clustering algorithm according to the similarity among the plurality of sample texts to obtain initial clusters of the number of the first clusters;
determining a distance between each of the initial clusters;
under the condition that the distance between the initial clusters is smaller than a set distance threshold value between clusters, adjusting the number of the first clusters;
and clustering the plurality of sample texts again by adopting the first clustering algorithm according to the adjusted first cluster number and the inter-cluster distance threshold value to obtain the target cluster of the adjusted first cluster number.
14. The apparatus of claim 10, wherein the apparatus further comprises:
the second prediction module is used for performing second class prediction on each sample text by adopting the text classification model after the first training to obtain a classification label;
and the second training module is used for carrying out second training on the text classification model after the first training according to the difference between the classification label corresponding to each sample text and the labeled real label.
15. The apparatus of claim 10, wherein the apparatus further comprises:
a first masking module, configured to mask, for any sample text in the multiple sample texts, at least one sample character in the any sample text, so as to obtain a masked sample text;
the third prediction module is used for performing character prediction on the masked sample text by adopting the text classification model to obtain a predicted text;
and the third training module is used for carrying out third training on the text classification model according to the difference between any sample text and the corresponding predicted text.
16. The apparatus of claim 10, wherein the apparatus further comprises:
a second masking module, configured to mask, for any sample text in the multiple sample texts, at least one sample character in the any sample text, so as to obtain a masked sample text;
the fourth prediction module is used for predicting mask characters in the masked sample text by adopting the text classification model so as to obtain at least one predicted character;
and the fourth training module is used for carrying out third training on the text classification model according to the difference between the at least one sample character and the at least one predicted character.
17. The apparatus according to claim 15 or 16, wherein the first prediction module is specifically configured to:
and performing first-class prediction on each sample text by adopting a third-trained text classification model to obtain a prediction label corresponding to each sample text.
18. An apparatus for text classification, the apparatus comprising:
the acquisition module is used for acquiring texts to be classified;
a classification module, configured to classify the text to be classified by using the text classification model trained by the apparatus according to any one of claims 10-17, to obtain a classification label of the text to be classified.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8 or, alternatively, to perform the method of claim 9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8 or to perform the method of claim 9.
21. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1-8 or carries out the steps of the method according to claim 9.
CN202210503601.9A 2022-05-09 2022-05-09 Training method, device, equipment and medium of text classification model and text classification method, device and equipment Pending CN114741517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210503601.9A CN114741517A (en) 2022-05-09 2022-05-09 Training method, device, equipment and medium of text classification model and text classification method, device and equipment


Publications (1)

Publication Number Publication Date
CN114741517A true CN114741517A (en) 2022-07-12

Family

ID=82285444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210503601.9A Pending CN114741517A (en) 2022-05-09 2022-05-09 Training method, device, equipment and medium of text classification model and text classification method, device and equipment

Country Status (1)

Country Link
CN (1) CN114741517A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116011351A (en) * 2023-03-28 2023-04-25 中国石油大学(华东) Oil well reasonable sinking degree determining method based on clustering algorithm and Widedeep network
CN116049412A (en) * 2023-03-31 2023-05-02 腾讯科技(深圳)有限公司 Text classification method, model training method, device and electronic equipment
CN116756293A (en) * 2023-08-11 2023-09-15 之江实验室 Model training method and device, storage medium and electronic equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination