CN117493557A - Training method and device for text classification model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117493557A
Authority
CN
China
Prior art keywords: text, initial, value, values, label
Prior art date
Legal status
Pending
Application number
CN202310679685.6A
Other languages
Chinese (zh)
Inventor
范智超
蒋宁
陆全
夏粉
吴海英
Current Assignee
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN202310679685.6A
Publication of CN117493557A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/353: Clustering; Classification into predefined classes
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Classification techniques
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present disclosure provides a training method and device for a text classification model, an electronic device, and a storage medium. The method includes: acquiring sample data and labeled text category labels corresponding to the sample data; inputting the sample data and the labeled text category labels into an initial text classification model for text category prediction processing to obtain a plurality of predicted text category labels corresponding to the sample data; for the plurality of predicted text category labels, obtaining a plurality of first initial loss values by taking, as a first initial loss value, the error value between each predicted text category label and the standard text category label having a corresponding relationship with it; obtaining a target loss value according to the plurality of first initial loss values; and adjusting parameters of the initial text classification model according to the target loss value to obtain the text classification model. According to the embodiments of the present disclosure, a converged text classification model can be trained quickly, and the accuracy of the classification results of the trained text classification model can be improved.

Description

Training method and device for text classification model, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a training method for a text classification model, a text classification method and device, an electronic device, and a storage medium.
Background
At present, when a text classification model is trained, it is usually trained to classify a text sample into a single text category. In such training, only one text category label needs to be set in the annotation of each text sample; a loss value is calculated based on the error between the predicted text category label and the text category label in the annotation, and the model is then optimized according to the loss value.
However, in a hierarchical multi-label text classification task, the text classification model is required to simultaneously predict a plurality of text category labels that have a hierarchical structure. Because hierarchically structured text category labels introduce hierarchical relations among the text categories of a text sample and greatly increase computational complexity, training such a model with the method used for single-category classification makes model learning difficult, leads to inaccurate prediction results, and may even leave the model unable to make predictions.
Disclosure of Invention
The disclosure provides a training method and device for a text classification model, electronic equipment and a storage medium.
In a first aspect, the present disclosure provides a training method of a text classification model, the training method of the text classification model including:
acquiring sample data and labeled text category labels corresponding to the sample data; wherein the labeled text category labels are a plurality of standard text category labels, the plurality of standard text category labels satisfy a preset hierarchical relationship, each hierarchy includes one text category label, and two text category labels of adjacent hierarchies have an inclusion relationship;
inputting the sample data and the labeled text category labels corresponding to the sample data into an initial text classification model for text category prediction processing to obtain a plurality of predicted text category labels corresponding to the sample data; wherein the preset hierarchical relationship is satisfied among the plurality of predicted text category labels;
for the plurality of predicted text category labels, obtaining a plurality of first initial loss values by taking, as a first initial loss value, the error value between each predicted text category label and the standard text category label having a corresponding relationship with it; wherein the corresponding relationship means that the label level of the standard text category label and the label level of the predicted text category label belong to the same level, and each first initial loss value is used for representing the prediction loss of the initial text classification model at the label level corresponding to that first initial loss value;
obtaining a target loss value according to the plurality of first initial loss values;
and adjusting parameters of the initial text classification model according to the target loss value to obtain a text classification model.
In a second aspect, the present disclosure provides a text classification method comprising:
acquiring target text data to be classified;
inputting the target text data into a text classification model to perform text category prediction processing to obtain a plurality of target text category labels corresponding to the target text data, wherein the text classification model is obtained according to the training method of the text classification model of the first aspect.
In a third aspect, the present disclosure provides a training apparatus for a text classification model, the training apparatus for a text classification model including:
a sample data acquisition unit, configured to acquire sample data and labeled text category labels corresponding to the sample data; wherein the labeled text category labels are a plurality of standard text category labels, the plurality of standard text category labels satisfy a preset hierarchical relationship, each hierarchy includes one text category label, and two text category labels of adjacent hierarchies have an inclusion relationship;
a prediction unit, configured to input the sample data and the labeled text category labels corresponding to the sample data into an initial text classification model for text category prediction processing to obtain a plurality of predicted text category labels corresponding to the sample data; wherein the preset hierarchical relationship is satisfied among the plurality of predicted text category labels;
a first initial loss value obtaining unit, configured to obtain, for the plurality of predicted text category labels, a plurality of first initial loss values by taking, as a first initial loss value, the error value between each predicted text category label and the standard text category label having a corresponding relationship with it; wherein the corresponding relationship means that the label level of the standard text category label and the label level of the predicted text category label belong to the same level, and each first initial loss value is used for representing the prediction loss of the initial text classification model at the label level corresponding to that first initial loss value;
a target loss value obtaining unit, configured to obtain a target loss value according to the plurality of first initial loss values;
and an adjusting unit, configured to adjust the parameters of the initial text classification model according to the target loss value to obtain a text classification model.
In a fourth aspect, the present disclosure provides a text classification apparatus comprising:
the text data acquisition unit is used for acquiring target text data to be classified;
the classification unit is used for inputting the target text data into a text classification model to conduct text category prediction processing to obtain a plurality of target text category labels corresponding to the target text data, wherein the text classification model is obtained according to the training method of the text classification model of the first aspect.
In a fifth aspect, the present disclosure provides an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, the one or more computer programs being executable by the at least one processor to enable the at least one processor to perform the training method of the text classification model of the first aspect or the text classification method of the second aspect described above.
In a sixth aspect, the present disclosure provides a computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the training method of the text classification model of the first aspect or the text classification method of the second aspect described above.
In the embodiments provided by the present disclosure, in the process of training a text classification model, sample data and the labeled text category labels corresponding to the sample data are input into an initial text classification model for text category prediction processing, so as to obtain a plurality of predicted text category labels that correspond to the sample data and satisfy a preset hierarchical relationship. When the model loss is calculated, a plurality of first initial loss values are obtained by taking, as a first initial loss value, the error value between each predicted text category label and the standard text category label having a corresponding relationship with it; a target loss value is obtained based on the plurality of first initial loss values, and the parameters of the initial text classification model are then adjusted according to the target loss value to obtain the text classification model.
Because the standard text category labels having a corresponding relationship with each predicted text category label serve as the labeled text category labels of the sample data, and the label level of each standard text category label belongs to the same level as the label level of the corresponding predicted text category label, the prediction loss of the initial text classification model at each label level can be obtained separately based on the plurality of first initial loss values. The target loss value obtained from these first initial loss values can therefore represent the optimization direction of the model at each label level in the training rounds after the current one. By adjusting the parameters of the initial text classification model based on this target loss value, a converged text classification model can be obtained through rapid training; at the same time, because each label level can be optimized in a targeted manner during training, the accuracy of the classification results of the trained text classification model can be further improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure. The above and other features and advantages will become more readily apparent to those skilled in the art by describing in detail exemplary embodiments with reference to the attached drawings, in which:
FIG. 1 is a flowchart of a training method for a text classification model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a framework for training a text classification model provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart for obtaining a target loss value provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart for obtaining real-time attenuation coefficients provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of a text classification method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an application scenario provided in an embodiment of the present disclosure;
FIG. 7 is a block diagram of a training device for text classification models provided by embodiments of the present disclosure;
FIG. 8 is a block diagram of a text classification device provided by an embodiment of the present disclosure;
FIG. 9 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
For a better understanding of the technical solutions of the present disclosure, exemplary embodiments of the present disclosure will be described below with reference to the accompanying drawings, in which various details of the embodiments of the present disclosure are included to facilitate understanding, and they should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the related art, a text classification model usually predicts a single text category for a text. For example, in the field of e-commerce, for the text "Hello, my xx-brand water dispenser's instant heating function does not work well during use; I operated it according to the product manual, so it should not be a quality problem", an existing text classification model may output a single text category such as "after-sales class".
For a text classification model that predicts a single text category, only a single text category label needs to be set in the annotation of each text sample during training; the model parameters are optimized based on the prediction loss value between the predicted text category label of the text sample and the text category label in the annotation, and the text classification model can be obtained by such training.
However, a text typically does not belong to only a single text category; that is, a piece of text usually carries multiple text categories. For example, the above text may simultaneously correspond to three hierarchically structured text category labels: "after-sales class", "after-sales consultation", and "product quality problem". Therefore, for a hierarchical multi-label text classification task, in order to train a text classification model that predicts a plurality of text categories of a text, a plurality of text category labels with a hierarchical structure need to be set in the annotation of each text sample when constructing the training set. For the above text, for example, the annotation "after-sales class - after-sales consultation - product quality problem", comprising three hierarchies in which a text category label at a lower hierarchy is constrained by the text category label at the higher hierarchy, needs to be set. Because the hierarchical structure introduces hierarchical relationships among the category labels and adds computational complexity, training a text classification model for the hierarchical multi-label text classification task on such a training set usually makes model learning difficult, leads to inaccurate prediction results, and may even leave the model unable to make predictions.
Specifically, the training set usually has a long-tail problem: most text category labels correspond to relatively few text samples, some text category labels are associated with only a handful of samples, and some have no associated samples at all. This is especially true for text category labels at the bottom of the hierarchy, near the leaf nodes, which usually have very few associated text samples. As a result, when the text classification model is trained with such a training set, it is difficult for the model to learn correct classification information, and the model may even fail to converge, so the plurality of hierarchically structured text category labels of a text cannot be predicted accurately.
In the related art, to address this problem in the hierarchical multi-label text classification task, one training method adopted for the text classification model to be trained is as follows:
1. converting the text and its labels into word embedding vectors and label embedding vectors, respectively;
2. performing preliminary feature extraction on the input word embeddings with a Bi-GRU (Bidirectional GRU) model, and modeling the hierarchical label structure with a graph convolutional neural network to generate label representations that contain label correlations;
3. using several convolutional neural networks with different convolution kernel sizes to extract local features of different granularities from the Bi-GRU output, applying max pooling to each and concatenating the results into a text feature, then further extracting features from this text feature with attention based on the label representations; meanwhile, using a self-attention mechanism to extract global features from the Bi-GRU output;
4. adaptively fusing the text features based on the label representations and the text features based on the self-attention representation to obtain a mixed-attention text representation;
5. extracting inter-label information through a relational network and obtaining the final classification result through a multi-layer perceptron;
6. adjusting the parameters of the whole model based on the Cross Entropy loss between the classification result and the text sample annotation to obtain a converged text classification model.
In this method, in order to enable the model to predict the multiple text categories to which a text belongs, a Bi-GRU model is used for text feature extraction, a graph convolutional neural network is used to generate label representations, and the features of the text and the labels are finally fused and ranked by relevance to predict the multiple text categories. The emphasis is on the design of the model structure so that the model can accurately learn the relations between the various labels and the text samples. For the loss value during training, however, only the cross entropy between the predicted classification result and the real result is calculated; the loss at each level is not computed separately, and the training has no particular emphasis on any level. This method therefore cannot effectively solve the inaccuracy of model predictions caused by the long-tail problem in the training set.
In addition, to address these problems, the related art also includes another training method for the text classification model. Specifically: graph node embedding is performed with an attention mechanism on an undirected graph of the text and its keywords, and updated text node features are output; the updated text node features are input into a pre-trained deep neural network model with multiple outputs to obtain an overall global label and local labels; the local labels and the global label are combined through an attention mechanism, the combined labels are processed to obtain the final label classification result, and a loss value is calculated based on the error value between the label classification result and the text annotation to tune the model.
In this method, the emphasis of training is likewise on a complex model structure that allows the model to accurately predict the multiple text categories to which a text belongs. When the loss value is calculated during training, only the overall loss of the model is computed; the loss at each level receives no attention, and the training direction for different levels is not optimized in subsequent rounds based on per-level losses. This method therefore also cannot effectively solve the inaccuracy of model predictions caused by the long-tail problem in the training set.
To this end, the embodiments of the present disclosure provide a training method for a text classification model: sample data and the labeled text category labels corresponding to the sample data are input into an initial text classification model for text category prediction processing to obtain a plurality of predicted text category labels that correspond to the sample data and satisfy a preset hierarchical relationship; when the model loss is calculated, a plurality of first initial loss values are obtained by taking, as a first initial loss value, the error value between each predicted text category label and the standard text category label having a corresponding relationship with it; a target loss value is obtained based on the plurality of first initial loss values, and the parameters of the initial text classification model are then adjusted according to the target loss value to obtain the text classification model.
Because the standard text category labels having a corresponding relationship with each predicted text category label serve as the labeled text category labels of the sample data, and the label level of each standard text category label belongs to the same level as the label level of the corresponding predicted text category label, the prediction loss of the initial text classification model at each label level can be obtained separately based on the plurality of first initial loss values. The target loss value obtained from these first initial loss values can represent the optimization direction of the model at each label level in the training rounds after the current one, so adjusting the parameters of the initial text classification model based on this target loss value allows a converged text classification model to be obtained through rapid training; at the same time, because targeted optimization is carried out at each label level during training, the accuracy of the classification results of the trained text classification model can be further improved.
Referring to fig. 1 and fig. 2, a flowchart of a training method for a text classification model according to an embodiment of the disclosure and a schematic frame diagram for training a text classification model according to an embodiment of the disclosure are shown, and the training method for a text classification model according to an embodiment of the disclosure is described below with reference to fig. 1 and fig. 2. It should be noted that the method may be applied to an electronic device, which may be a terminal device or may be a server, and is not limited herein.
As shown in fig. 1, the training method of the text classification model provided in the embodiment of the present disclosure may include the following steps S101 to S105.
Step S101, acquiring sample data and labeled text category labels corresponding to the sample data; the labeled text category labels are a plurality of standard text category labels, and the plurality of standard text category labels satisfy a preset hierarchical relationship, namely: each hierarchy includes one text category label, and there is an inclusion relationship between the two text category labels of adjacent hierarchies.
In the embodiment of the present disclosure, the sample data may be any text data in a training set, which is not limited herein, wherein the training set includes a plurality of pieces of sample data for training the initial text classification model.
The labeled text category labels of the sample data, that is, the annotation of the sample data, may be a plurality of standard text category labels. A standard text category label is information that truly represents the text category of the sample data at the corresponding level. The plurality of standard text category labels satisfy a preset hierarchical relationship; specifically, each hierarchy includes only one text category label, and two text category labels of adjacent hierarchies have an inclusion relationship. In other words, the plurality of standard text category labels are hierarchical multi-labels in one-to-one correspondence with the plurality of hierarchies, and the two standard text category labels of adjacent hierarchies have an inclusion relationship.
For example, for the text "Hello, my xx-brand water dispenser's instant heating function does not work well during use; I operated it according to the product manual, so it should not be a quality problem", the text may correspond to three levels of standard text category labels as shown in FIG. 2: the standard text category label at level 1 (L1) may be "after-sales class", the standard text category label at level 2 (L2) may be "after-sales consultation", and the standard text category label at level 3 (L3) may be "product quality problem". Of course, three levels are used here only as an example; in actual implementation the number of levels is not particularly limited and may be at least one.
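As a purely illustrative sketch (the data layout below is an assumption, not something prescribed by the disclosure), one piece of sample data and its hierarchical annotation could be organized in Python as one standard text category label per level, ordered from level 1 down to level 3:

```python
# Hypothetical representation of one piece of sample data with its labeled text category labels:
# one standard text category label per level, ordered from level 1 (top) to level 3 (leaf),
# where each label is contained in the label of the level above it.
sample = {
    "text": ("Hello, my xx-brand water dispenser's instant heating function does not work well "
             "during use; I operated it according to the product manual, so it should not be a "
             "quality problem."),
    "standard_labels": ["after-sales class", "after-sales consultation", "product quality problem"],
}
```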
Step S102, inputting sample data and labeled text category labels corresponding to the sample data into an initial text classification model for text category prediction processing to obtain a plurality of predicted text category labels corresponding to the sample data; the plurality of predictive text category labels meet a preset hierarchical relationship.
The initial text classification model may be a neural network model for predicting a plurality of text class labels to which the text belongs, and the plurality of text class labels satisfy a preset hierarchical relationship, that is, the initial text classification model may be used for predicting a hierarchical multi-label of the text for performing a hierarchical multi-label text classification task. In the embodiment of the present disclosure, the model structure of the initial text classification model is not particularly limited, and the initial text classification model may be a network model of any structure for predicting a plurality of text category labels to which a text belongs.
The plurality of predictive text category labels corresponding to the sample data refer to classification results obtained by performing text classification processing on the sample data based on the initial text classification model in each training round (Epoch) in the process of training the initial text classification model.
It may be understood that, in the embodiments of the present disclosure, the number of levels among the plurality of standard text category labels in the labeled text category labels of the sample data is the same as the number of levels among the plurality of predicted text category labels obtained by performing text category prediction processing on the sample data. For example, if the number of levels among the standard text category labels is 3, the number of levels among the predicted text category labels corresponding to the sample data is also 3.
For example, for the above text, after it is input into the initial text classification model, the obtained plurality of predicted text category labels may be the classification result "after-sales class - after-sales consultation - product use problem" shown in FIG. 2, that is, the predicted text category label at level 1 is "after-sales class", the predicted text category label at level 2 is "after-sales consultation", and the predicted text category label at level 3 is "product use problem".
Step S103, for the plurality of predicted text category labels, obtaining a plurality of first initial loss values by taking, as the first initial loss values, the error values between each predicted text category label and the standard text category label having a corresponding relationship with it; the corresponding relationship means that the label level of the standard text category label and the label level of the predicted text category label belong to the same level, and each first initial loss value is used for representing the prediction loss of the initial text classification model at the label level corresponding to that first initial loss value.
Still taking the above text as an example, as shown in FIG. 2, when the annotation of the text is the plurality of standard text category labels "after-sales class - after-sales consultation - product quality problem" and the prediction result is the plurality of predicted text category labels "after-sales class - after-sales consultation - product use problem", the standard text category label having a corresponding relationship with each predicted text category label may be as shown in Table 1 below:
Label hierarchy | Predicted text category label | Standard text category label with corresponding relationship
Level 1 | After-sales class | After-sales class
Level 2 | After-sales consultation | After-sales consultation
Level 3 | Product use problem | Product quality problem
TABLE 1
The predicted loss of the model at each label level, i.e., the first initial loss value in the embodiments of the present disclosure, may be an error value between the predicted text class label obtained by prediction and the standard text class label with a correspondence.
For example, the first initial loss value of the model at level 1 may be obtained by calculating the error value between the predicted text category label "after-sales class" and the corresponding standard text category label "after-sales class"; the first initial loss value of the model at level 2 may be obtained by calculating the error value between the predicted text category label "after-sales consultation" and the corresponding standard text category label "after-sales consultation"; and the first initial loss value of the model at level 3 may be obtained by calculating the error value between the predicted text category label "product use problem" and the corresponding standard text category label "product quality problem".
In the embodiments of the present disclosure, when the error value between each predicted text category label and the standard text category label having a corresponding relationship with it is calculated, it is considered that labels in a hierarchical multi-label classification task are not mutually exclusive. The error between each predicted text category label and its corresponding standard text category label is therefore converted into a value between 0 and 1 with a sigmoid activation function, and the first initial loss value is then calculated with a binary cross-entropy loss (BCE loss) function.
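A minimal sketch of this per-level loss computation is given below, assuming a PyTorch-style setup in which the model produces one group of logits per label level; the function name and tensor shapes are illustrative assumptions, not the patent's actual implementation.

```python
import torch
import torch.nn.functional as F

def per_level_initial_losses(level_logits, level_targets):
    # level_logits[k]:  raw model scores for label level k, shape (batch, num_labels_k)
    # level_targets[k]: multi-hot standard text category labels for level k, same shape
    losses = []
    for logits, targets in zip(level_logits, level_targets):
        # sigmoid maps each score into (0, 1); labels at one level are not mutually exclusive
        probs = torch.sigmoid(logits)
        # binary cross-entropy between predicted probabilities and the standard labels
        losses.append(F.binary_cross_entropy(probs, targets.float()))
    return losses  # one first initial loss value per label level
```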
Step S104, obtaining a target loss value according to the plurality of first initial loss values.
After obtaining the plurality of first initial loss values for representing the predicted loss of the model at each label level during the training based on the above step S103, a target loss value may be obtained based on the plurality of first initial loss values as shown in fig. 2, for example, the target loss value may be obtained by summing the plurality of first initial loss values.
And step S105, adjusting parameters of the initial text classification model according to the target loss value to obtain the text classification model.
With continued reference to FIG. 2, after the target loss value is obtained in step S104, and because the target loss value is derived from the prediction loss of the model at each label level, the initial text classification model can be subjected to parameter adjustment based on the target loss value, that is, its parameters can be adjusted and optimized. Through continuous iterative optimization, a text classification model meeting a preset convergence condition is obtained. The preset convergence condition can be set as required: for example, the training round may reach a preset number of rounds, or the obtained target loss value may fall below a preset loss threshold; it is not limited here.
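The overall loop might then look like the following sketch, which sums the per-level first initial loss values into the target loss value and stops on either of the two example convergence conditions mentioned above; the helper from the previous sketch is reused, and all names and the concrete threshold are assumptions.

```python
def train(model, optimizer, data_loader, max_epochs=1000, loss_threshold=1e-3):
    target_loss = None
    for epoch in range(max_epochs):
        for sample_batch, level_targets in data_loader:
            level_logits = model(sample_batch)   # one group of logits per label level
            losses = per_level_initial_losses(level_logits, level_targets)
            target_loss = sum(losses)            # target loss value from the first initial loss values
            optimizer.zero_grad()
            target_loss.backward()               # adjust parameters of the initial text classification model
            optimizer.step()
        # preset convergence condition: loss below a threshold (the round limit is handled by the loop bound)
        if target_loss is not None and target_loss.item() < loss_threshold:
            break
    return model
```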
Thus, based on the plurality of first initial loss values, the prediction loss of the initial text classification model at each label level can be obtained separately, and the target loss value obtained from these first initial loss values can represent the optimization direction of the model at each label level in subsequent training rounds. By adjusting the parameters of the initial text classification model based on this target loss value, a converged text classification model can be obtained through rapid training; at the same time, because each label level can be optimized in a targeted manner during training, the accuracy of the classification results of the trained text classification model can be improved.
In some embodiments, obtaining the target loss value according to the plurality of first initial loss values in step S104 includes: respectively correcting the plurality of first initial loss values to obtain a plurality of first corrected loss values; and obtaining the target loss value according to the plurality of first corrected loss values.
In some embodiments, after obtaining the plurality of first initial loss values representing the predicted loss of the model at each label level through step S103, a target loss value representing the overall predicted loss of the model may be obtained by simply summing the plurality of first initial loss values as described above.
However, in a hierarchical multi-label text classification task the training set has a long-tail problem: under such an unbalanced training set, head samples are numerous while the loss values produced by the small number of tail samples are often small. In other words, text category labels at higher levels correspond to many text samples, whereas labels at lower levels (for example, text category labels at leaf nodes) have few text samples and therefore usually produce small loss values. This makes it difficult for the model to learn the semantic information of the difficult tail samples, that is, of the labels at lower levels, during training. Therefore, in the embodiments of the present disclosure, in order to steer the optimization direction of the model toward the levels that are harder for the model to recognize, after the plurality of first initial loss values are obtained they can each be corrected; the corrected loss values obtained after the correction processing determine the optimization direction of the model at each level in subsequent training rounds, thereby improving the convergence speed and prediction accuracy of the model.
As shown in fig. 3, in some embodiments, the correction processing for the plurality of first initial loss values described in the embodiments of the present disclosure, to obtain a plurality of first corrected loss values, may include the following steps S301 to S302.
Step S301, a plurality of weight values corresponding to a plurality of first initial loss values in an ith training round are obtained, and/or a plurality of attenuation coefficients corresponding to a plurality of first initial loss values in the ith training round are obtained; wherein, the plurality of weight values are in one-to-one correspondence with the plurality of first initial loss values, and the plurality of attenuation coefficients are in one-to-one correspondence with the plurality of first initial loss values; the multiple weight values and the multiple attenuation coefficients are used for representing the optimization direction of the initial text classification model on the label level corresponding to each weight value and/or each attenuation coefficient in the (i+1) th training round, and i is an integer greater than 0.
In the embodiments of the present disclosure, the label level corresponding to each weight value refers to the label level of the predicted text category label corresponding to the first initial loss value that corresponds to that weight value. For example, for the plurality of predicted text category labels "after-sales class - after-sales consultation - product use problem", the loss value corresponding to the label "after-sales class" at level 1 is loss1, the loss value corresponding to the label "after-sales consultation" at level 2 is loss2, and the loss value corresponding to the label "product use problem" at level 3 is loss3; if the weight value corresponding to loss1 is w1, the weight value corresponding to loss2 is w2, and the weight value corresponding to loss3 is w3, then the label level corresponding to w1 is level 1, the label level corresponding to w2 is level 2, and the label level corresponding to w3 is level 3.
Similarly, in the embodiment of the present disclosure, the label level corresponding to each attenuation coefficient refers to a label level of a predicted text class label corresponding to a first initial loss value corresponding to one attenuation coefficient.
In actual implementation, the plurality of weight values may be represented in the form of a weight matrix, or the plurality of attenuation coefficients may be represented in matrix form, which is not limited here.
In step S302, correction processing is performed on the plurality of first initial loss values according to the plurality of weight values and/or the plurality of attenuation coefficients, so as to obtain a plurality of first corrected loss values.
In the embodiment of the present disclosure, the correcting process is performed on the plurality of first initial loss values respectively, which may be by obtaining a weight value corresponding to each first initial loss value, so as to correct the plurality of first initial loss values according to the obtained plurality of weight values, to obtain a plurality of first corrected loss values; and/or determining an attenuation coefficient corresponding to each first initial loss value, so as to correct the plurality of first initial loss values according to the plurality of attenuation coefficients, thereby obtaining a plurality of first corrected loss values.
That is, in the embodiment of the present disclosure, the performing correction processing on the plurality of first initial loss values according to the plurality of weight values and/or the plurality of attenuation coefficients to obtain a plurality of first corrected loss values may be: and multiplying the plurality of first initial loss values by corresponding weight values in the plurality of weight values and/or corresponding attenuation coefficients in the plurality of attenuation coefficients respectively to obtain a plurality of first correction loss values.
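A small sketch of this correction step follows; it simply multiplies each first initial loss value by its level's weight value and/or attenuation coefficient, and the function name is an assumption.

```python
def corrected_losses(initial_losses, weights=None, decay_coeffs=None):
    corrected = []
    for k, loss in enumerate(initial_losses):
        value = loss
        if weights is not None:
            value = value * weights[k]       # multiply by the weight value of this label level
        if decay_coeffs is not None:
            value = value * decay_coeffs[k]  # multiply by the attenuation coefficient of this label level
        corrected.append(value)
    return corrected                         # first corrected loss values
```

The target loss value can then be obtained, for example, by summing the returned first corrected loss values.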
Specifically, for a hierarchical multi-label text classification task, a long-tail problem usually exists in the training set, and solving it manually would require a large amount of work to construct the training set, which is time-consuming and labor-intensive. Instead, during training, a weighted-average idea is applied to the plurality of first initial loss values calculated in the i-th training round: if the predicted loss value corresponding to a certain label level is small, the model's prediction at that label level is relatively accurate, so in the (i+1)-th training round the optimization effort of the model at that label level can be reduced; if the predicted loss value corresponding to a certain label level is large, the model's prediction at that label level has a larger error and needs further optimization, so in the (i+1)-th training round the model parameters can be adjusted such that the optimization direction of the model leans toward that label level. Making the model focus on the label levels with larger predicted loss values improves the training speed and prediction accuracy of the model.
In addition, as the number of training rounds increases, the model may already have converged at some label levels. If those label levels continue to be optimized in subsequent training rounds, overfitting may occur at them because of noise in the sample data of the training set. For this reason, in the embodiments of the present disclosure, whether the model has finished optimizing at a label level can further be determined by observing how the predicted loss value of the model at that label level decreases over the training rounds. If the model has finished optimizing at a label level, that is, if the predicted loss value at that label level no longer decreases or decreases only slightly, a smaller attenuation coefficient can be set for the predicted loss value at that label level, so that its proportion in the calculated target loss value is as small as possible. This reduces the degree to which that label level is fitted in subsequent rounds and thereby prevents overfitting at that label level.
After the target loss value is obtained through the processing calculation, parameters of the model can be adjusted based on the target loss value, so that the optimization direction of the model on each label level in the subsequent training round is adjusted, and the model training speed and the accuracy of the prediction result are improved.
It should be noted that, in some embodiments, the obtaining the plurality of weight values corresponding to the plurality of first initial loss values in the ith training round in the step S301 may be: acquiring label levels of the predictive text class labels corresponding to the first initial loss values in a plurality of predictive text class labels; and inquiring preset weight values corresponding to the first initial loss values in preset mapping data according to the label levels corresponding to the first initial loss values to obtain a plurality of weight values, wherein the preset mapping data is used for reflecting the corresponding relation between the levels and the preset weight values.
The preset mapping data may be preset weight values corresponding to each of the preset levels in the multi-level labels. Specifically, considering that the long tail problem in the training set is generally that text samples corresponding to the tags with higher levels are more and text samples corresponding to the tags with lower levels are less, in order to promote optimization of the model on the levels to which the tags with lower levels belong, a weight value with a larger value may be preset for the initial loss value of the tag with lower levels, and a weight value with a smaller value may be preset for the initial loss value of the tag with higher levels.
For example, mapping data {(level 1, w1), (level 2, w2), (level 3, w3)} may be set in advance, where w1 + w2 + w3 = 1 and w1 < w2 < w3. In this way, during model training, the weight value corresponding to each first initial loss value can be obtained by determining the label level corresponding to that first initial loss value and then querying the preset mapping data according to the label level.
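A possible form of such preset mapping data is sketched below; the concrete weight numbers are hypothetical and only respect the stated constraints w1 < w2 < w3 and w1 + w2 + w3 = 1.

```python
PRESET_LEVEL_WEIGHTS = {1: 0.2, 2: 0.3, 3: 0.5}   # hypothetical preset mapping data

def lookup_weights(label_levels):
    # label_levels[k] is the label level of the predicted text category label
    # corresponding to the k-th first initial loss value
    return [PRESET_LEVEL_WEIGHTS[level] for level in label_levels]
```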
That is, in some embodiments, a weight value may be preset for the initial loss value of each label level according to the height of that label level, so as to adjust the optimization emphasis, that is, the degree of attention, the model gives to each label level during training. However, even with the long-tail problem accounted for in this way, when there are many label levels the labels at lower levels may have few or even no samples, so training the model with preset weight values may still yield inaccurate prediction results. Therefore, in some embodiments of the present disclosure, after the predicted loss values of the model for each label level in the current i-th training round, that is, the plurality of first initial loss values, are obtained, the weight value of each label level can be calculated dynamically based on these first initial loss values.
Specifically, in this embodiment, the acquiring a plurality of weight values corresponding to a plurality of first initial loss values in the ith training round in step S301 may be: obtaining the sum of a plurality of first initial loss values in the ith training round; calculating the ratio between each first initial loss value and the sum value respectively; and calculating the weight value of the corresponding first initial loss value according to the ratio corresponding to each first initial loss value to obtain a plurality of weight values.
That is, in the embodiments of the present disclosure, after the predicted loss value of the model at each label level is calculated in the current i-th training round, that is, after the plurality of first initial loss values are obtained, the weight value corresponding to each first initial loss value can be calculated by obtaining the sum of the plurality of first initial loss values and using the ratio of each first initial loss value to that sum: a larger initial loss value indicates that the recognition capability of the model at the corresponding label level is poor, and a smaller initial loss value indicates that it is relatively strong.
For example, in the case where the text corresponds to three label levels, if the predicted loss values (initial loss values) of the model at level 1, level 2, and level 3 are loss1, loss2, and loss3 respectively, the dynamically calculated weight value corresponding to the initial loss value loss1 at level 1 may be loss1/(loss1+loss2+loss3), the weight value corresponding to the initial loss value loss2 at level 2 may be loss2/(loss1+loss2+loss3), and the weight value corresponding to the initial loss value loss3 at level 3 may be loss3/(loss1+loss2+loss3). In this way, the weight value of a label level with a large predicted loss value is relatively large, which increases the gradient optimization of the loss at that level, accelerates the learning of the model on that label level, and improves the recognition capability of the model for the label semantic information at that label level.
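The dynamic weighting described above amounts to normalizing each level's first initial loss value by the sum over all levels; a short sketch (with an assumed fallback for the all-zero case) is:

```python
def dynamic_weights(initial_losses):
    total = sum(float(loss) for loss in initial_losses)
    if total == 0.0:
        return [1.0 / len(initial_losses)] * len(initial_losses)  # degenerate case: equal weights
    return [float(loss) / total for loss in initial_losses]       # larger loss -> larger weight
```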
In the above example, the ratio of each initial loss value to the sum value is directly used as the weight value of the corresponding initial loss value, and in actual implementation, the weight value may be obtained by further calculating based on the ratio and other coefficients, which is not particularly limited herein.
Thus, with the method provided by the embodiments of the present disclosure, the weight value corresponding to the predicted loss value at each level is calculated dynamically from the predicted loss values of the labels at each level during training, so that the model can dynamically adjust its optimization direction for each label level. Through continuous iterative training the model can fit the predicted loss value at each label level to a small value as soon as possible, improving both the overall convergence speed of the model and the accuracy of the prediction results.
Referring to fig. 4, a flowchart for acquiring real-time attenuation coefficients is provided in an embodiment of the present disclosure. As shown in fig. 4, in some embodiments, the step S301 of acquiring a plurality of attenuation coefficients corresponding to a plurality of first initial loss values in the ith training round includes the following steps S401 to S403.
Step S401, obtaining a loss value to be processed from the plurality of first initial loss values, and obtaining a real-time attenuation coefficient corresponding to the loss value to be processed, where the loss value to be processed is any first initial loss value among the plurality of first initial loss values that has not been subjected to correction processing.
Step S402, when there is a first initial loss value among the plurality of first initial loss values that has not been subjected to correction processing, acquiring that first initial loss value as a new loss value to be processed, and re-executing the step of acquiring the real-time attenuation coefficient corresponding to the new loss value to be processed.
Step S403, when there is no first initial loss value among the plurality of first initial loss values that has not been subjected to correction processing, taking the obtained plurality of real-time attenuation coefficients as the plurality of attenuation coefficients corresponding to the plurality of first initial loss values.
As shown in fig. 4, in the step S401, the obtaining the real-time attenuation coefficient corresponding to the loss value to be processed may include the following steps S4011 to S4013.
Step S4011, obtaining target label levels of predicted text class labels corresponding to the to-be-processed loss values in a plurality of predicted text class labels, and obtaining preset training round thresholds corresponding to the target label levels; the preset training round threshold is used for representing training rounds corresponding to the initial text classification model under the condition that the real-time attenuation coefficient corresponding to the to-be-processed loss value is acquired.
The preset training round threshold may be a training round threshold set in advance for the labels at each level. During model training, when the current i-th training round is greater than or equal to the preset training round threshold corresponding to a label level, the real-time attenuation coefficient corresponding to that label level can be obtained based on that threshold. The preset training round threshold may be, for example, 1000. It should be noted that, in actual implementation, the preset training round threshold of each label level may be the same value or different values. For example, because the labels at higher levels generally have more samples, the model learns those levels more easily and converges on them faster, so a smaller training round threshold can be set for the higher-level labels; the labels at lower levels often have fewer samples, so the model may need more training rounds to converge on those levels, and a larger training round threshold can be set for them. For example, a preset training round threshold of 500 can be set for level 1 and a preset training round threshold of 1000 for level 3.
Step S4012, obtaining a historical initial loss value corresponding to the target label level in the ith training round when the ith training round is greater than or equal to a preset training round threshold.
The historical initial loss value refers to the predicted loss value of the model in the (i-1)th training round. For example, for level 1, where the preset training round threshold is 1000 and the current training round is the 1000th round, the historical initial loss value for that level may be the predicted loss value for level 1 in the 999th training round.
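A possible way to keep the historical initial loss values available is simple per-level bookkeeping, as in the following Python sketch; the variable names are assumptions for illustration.

```python
# Per-level history of predicted loss values: after round i-1, loss_history[level]
# holds the historical initial loss value needed in round i.
loss_history = {}  # label level -> predicted loss value of the previous round

def update_loss_history(label_level, current_loss):
    previous_loss = loss_history.get(label_level)  # historical initial loss value, if any
    loss_history[label_level] = current_loss       # becomes the history for the next round
    return previous_loss
```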
Step S4013, determining a real-time attenuation coefficient corresponding to the to-be-processed loss value according to the historical initial loss value and the to-be-processed loss value.
After the historical initial loss value is obtained, whether the predicted loss value of the model on the target label level is still decreasing may be determined based on the difference between the historical initial loss value and the to-be-processed loss value. If there is no decrease, or only a small decrease, it indicates that the model may have converged on the target label level, and excessive learning in subsequent training rounds is not required, so as to avoid over-fitting on that label level. That is, in some embodiments, determining the real-time attenuation coefficient corresponding to the to-be-processed loss value according to the historical initial loss value and the to-be-processed loss value may include: acquiring the absolute value of the difference between the historical initial loss value and the to-be-processed loss value; setting the real-time attenuation coefficient to a first preset value when the absolute value of the difference is smaller than a preset threshold; and setting the real-time attenuation coefficient to a second preset value when the absolute value of the difference is greater than or equal to the preset threshold.
In practical implementation, the preset threshold may be set as required, for example, may be 0, or may be other values close to 0, which is not limited herein.
The first preset value may be, for example, 0.1, so that when it is determined that the predicted loss value of the model on the target label level is no longer decreasing, the proportion of the predicted loss value on that label level is reduced and over-fitting of the model on that label level is avoided.
The second preset value may be, for example, 1, which is not limited herein. That is, if it is determined that the predicted loss value of the model on the target label level is still decreasing, it indicates that the model has not yet converged on that label level and needs to continue to be optimized in subsequent training rounds to improve its prediction accuracy on that label level.
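Putting steps S4012 and S4013 together, the real-time attenuation coefficient may be determined as in the following Python sketch; the parameter names and default values are assumptions consistent with the examples above (the preset threshold may be 0 or a value close to 0).

```python
def realtime_attenuation_coefficient(historical_loss, current_loss,
                                     preset_threshold=1e-4,    # assumed small value close to 0
                                     first_preset_value=0.1,   # used when the loss stops decreasing
                                     second_preset_value=1.0): # used when the loss still decreases
    # compare the change of the per-level loss against the preset threshold (step S4013)
    if abs(historical_loss - current_loss) < preset_threshold:
        # the loss on this label level no longer decreases -> damp its contribution
        return first_preset_value
    return second_preset_value
```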
It can be seen that, in the embodiment of the present disclosure, in order to avoid over-fitting on individual label levels during training, a preset training round threshold is set for each label level. In the model training process, when the current ith training round is greater than or equal to the preset training round threshold corresponding to a label level, whether the predicted loss value on that label level is still decreasing is observed; if it is no longer decreasing, a smaller attenuation coefficient is set for that predicted loss value, so as to reduce the attention paid by the model to that label level in subsequent training rounds and make the model focus on the label levels that are still difficult to identify. This not only prevents the model from over-fitting on the corresponding label level, but also improves the overall convergence speed of the model and the accuracy of the prediction result by continuously adjusting the optimization direction on each label level.
In correspondence to the foregoing embodiments, the embodiments of the present disclosure further provide a text classification method, please refer to fig. 5, which is a flowchart of a text classification method provided in the embodiments of the present disclosure. The method may be applied to an electronic device, which may be a server, or may also be a terminal device, which is not particularly limited herein.
As shown in fig. 5, the text classification method provided by the embodiment of the present disclosure may include the following steps S501 to S502.
Step S501, obtaining target text data to be classified.
The target text data may be any text on which text classification is to be performed. It may be understood that, in the embodiment of the present disclosure, performing text classification on the target text data means determining the plurality of text category labels to which the target text data belongs, that is, obtaining the hierarchical multi-label corresponding to the target text data.
Step S502, inputting the target text data into a text classification model for text category prediction processing to obtain a plurality of target text category labels corresponding to the target text data, where the text classification model is obtained according to the training method of the text classification model described above.
Therefore, according to the text classification method provided by the embodiment of the present disclosure, in the process of training the text classification model, the overall target loss value of the model is calculated based on the predicted loss value of each label level during training, so that the optimization direction of the model for each label level in subsequent training rounds can be determined based on the target loss value and the prediction accuracy of the model can be improved. Accordingly, the plurality of target text category labels of the target text data predicted by the trained text classification model are more accurate.
In practical implementation, the text classification method provided by the embodiment of the present disclosure can be applied to an e-commerce scenario and used to classify user call texts in that scenario.
In this application scenario, when classifying the user call text, for example, the user call text may be classified into hierarchical multi-labels of a 3-layer structure as shown in table 2 below.
TABLE 2
In practical implementation, after a voice call between user 1 and user 2, an electronic device implementing the text classification method, for example the mobile phone used by user 1, obtains the call voice data with the authorization of user 1, performs speech recognition processing on it to obtain a call text, and then inputs the call text into the text classification model for text classification processing to obtain a plurality of text category labels corresponding to the call text; for example, three hierarchically organized text category labels such as "after-sales - after-sales consultation - product quality problem" may be obtained.
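As an illustration, the end-to-end flow in this scenario could look like the Python sketch below; the speech recognizer and classifier objects, together with their method names, are assumptions and not part of the disclosure.

```python
# Hypothetical call-text classification pipeline for the e-commerce scenario.
def classify_call(call_audio, asr_model, text_classifier):
    call_text = asr_model.transcribe(call_audio)  # speech recognition on the authorized call audio
    labels = text_classifier.predict(call_text)   # hierarchical multi-label text classification
    return labels
# Example result: ["after-sales", "after-sales consultation", "product quality problem"]
```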
It should be noted that table 2 above is only a schematic illustration. In practical implementation, when the text classification method provided in the embodiment of the present disclosure is applied to an e-commerce scenario, the hierarchical multi-labels into which the text is to be classified may be set as required, which is not particularly limited herein.
In addition, it can be appreciated that, in practical implementation, the text classification method may be applied to other scenarios, for example, to financial scenarios, and for ease of understanding, please refer to fig. 6, which is a schematic diagram of an application scenario provided in an embodiment of the present disclosure. As shown in fig. 6, the text classification method provided in the embodiment of the present disclosure may also be applied to a financial scenario, and may be used to classify a user call text in the financial scenario.
In this application scenario, when classifying the user call text, for example, the user call text may be classified into a hierarchical multi-tag of a 3-layer structure as shown in table 3 below.
TABLE 3
As shown in fig. 6, in actual implementation, after user 3 and user 4 make a voice call, an electronic device implementing the text classification method according to the embodiments of the present disclosure, for example the mobile phone used by user 3 shown in fig. 6, obtains the call voice data with the authorization of user 3, performs speech recognition processing on the call voice data to obtain a call text, and inputs the call text into the text classification model for text classification processing to obtain a plurality of text category labels corresponding to the call text; for example, three hierarchically organized text category labels such as "fraud-fraud overproduction-telecommunication fraud" may be obtained.
In addition, as shown in fig. 6, after obtaining the text category labels, the electronic device may further perform risk discrimination processing according to the text category labels, and send risk prompt information to the user according to the discrimination result, for example, if the text category labels indicate that the voice call may involve "telecom fraud", push corresponding prompt information and the text category labels corresponding to the call text to the user, so as to prompt the user in time, and avoid property loss of the user.
In fig. 6, the text classification method is described by taking as an example the case in which it is implemented on the electronic device of only one of the two parties to the call, that is, the electronic device used by user 3. In practical implementation, the method may also be applied to the electronic devices used by both parties to the call; for example, in addition to the electronic device used by user 3 in fig. 6, the electronic device used by user 4 may likewise implement the text classification method provided in the embodiment of the present disclosure, which is not particularly limited herein.
It will be appreciated that, in actual implementation, the call text may be obtained through other ways, for example, the call text may be directly sent by a third party application program to an electronic device for implementing the text classification method according to the embodiments of the present disclosure, so that the electronic device performs text classification processing on the call text based on a built-in text classification model, which is not limited herein. Of course, in actual implementation, the text classification method provided in the embodiment of the present disclosure may also be applied to other scenes as required, for example, the text classification method may also be used to perform text classification processing on an electronic book in an online electronic book platform, which is not limited in particular herein.
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from their principles and logic, and such combinations are not described in detail in the present disclosure for brevity. It will also be appreciated by those skilled in the art that, in the above methods of the embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure further provides a training device for a text classification model, a text classification device, an electronic device, and a computer readable storage medium, all of which may be used to implement any of the training methods for a text classification model or any of the text classification methods provided in the present disclosure; for the corresponding technical solutions and descriptions, reference may be made to the corresponding descriptions of the method parts, which are not repeated here.
Fig. 7 is a block diagram of a training device for a text classification model according to an embodiment of the disclosure.
Referring to fig. 7, an embodiment of the present disclosure provides a training apparatus for a text classification model, the training apparatus 700 for a text classification model including: a sample data acquisition unit 701, a prediction unit 702, a first initial loss value acquisition unit 703, a target loss value acquisition unit 704, and an adjustment unit 705.
The sample data obtaining unit 701 is configured to obtain sample data and a labeled text category label corresponding to the sample data; the labeling text category labels are a plurality of standard text category labels, a preset hierarchical relationship is met among the standard text category labels, the preset hierarchical relationship is that each hierarchical layer comprises one text category label, and two text category labels of adjacent hierarchical layers are containing relationships.
The prediction unit 702 is configured to input sample data and labeled text category labels corresponding to the sample data into an initial text classification model to perform text category prediction processing, so as to obtain a plurality of predicted text category labels corresponding to the sample data; the plurality of predictive text category labels meet a preset hierarchical relationship.
The first initial loss value obtaining unit 703 is configured to obtain, for a plurality of predicted text category labels, a plurality of first initial loss values by obtaining, as a first initial loss value, an error value between each predicted text category label and a standard text category label having a corresponding relationship; the corresponding relation is that the label level of the standard text type label and the label level of the predicted text type label belong to the same level, and the first initial loss value is used for representing the predicted loss of the initial text classification model on the label level corresponding to the first initial loss value.
The target loss value obtaining unit 704 is configured to obtain a target loss value according to the plurality of first initial loss values.
The adjusting unit 705 is configured to adjust parameters of the initial text classification model according to the target loss value, so as to obtain a text classification model.
In some embodiments, the target loss value obtaining unit 704 may be configured to, when obtaining the target loss value according to the plurality of first initial loss values: respectively correcting the plurality of first initial loss values to obtain a plurality of first corrected loss values; and obtaining a target loss value according to the plurality of first correction loss values.
In some embodiments, the target loss value obtaining unit 704 may be configured to, when performing correction processing on the plurality of first initial loss values respectively to obtain a plurality of first corrected loss values: acquire a plurality of weight values corresponding to the plurality of first initial loss values in the ith training round, and/or acquire a plurality of attenuation coefficients corresponding to the plurality of first initial loss values in the ith training round, where the plurality of weight values are in one-to-one correspondence with the plurality of first initial loss values, the plurality of attenuation coefficients are in one-to-one correspondence with the plurality of first initial loss values, and the plurality of weight values and the plurality of attenuation coefficients are used for representing the optimization direction of the initial text classification model, in the (i+1)th training round, on the label level corresponding to each weight value and each attenuation coefficient, i being an integer greater than 0; and perform correction processing on the plurality of first initial loss values respectively according to the plurality of weight values and/or the plurality of attenuation coefficients, to obtain the plurality of first corrected loss values.
In some embodiments, the target loss value obtaining unit 704 may be configured to, when performing correction processing on the plurality of first initial loss values according to the plurality of weight values and/or the plurality of attenuation coefficients, obtain a plurality of first corrected loss values: and multiplying the plurality of first initial loss values by corresponding weight values in the plurality of weight values and/or corresponding attenuation coefficients in the plurality of attenuation coefficients respectively to obtain a plurality of first correction loss values.
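As a minimal Python sketch of this correction step (function and variable names are illustrative), each first initial loss value is scaled by its corresponding weight value and/or attenuation coefficient:

```python
def correct_losses(initial_losses, weights=None, attenuations=None):
    # when weight values or attenuation coefficients are not used, they default to 1.0
    n = len(initial_losses)
    weights = weights if weights is not None else [1.0] * n
    attenuations = attenuations if attenuations is not None else [1.0] * n
    # element-wise multiplication yields the first corrected loss values
    return [l * w * a for l, w, a in zip(initial_losses, weights, attenuations)]
```

The target loss value can then be obtained from the corrected loss values, for example by summing them, although the specific aggregation is not fixed in this sketch.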
In some embodiments, the target loss value obtaining unit 704 may be configured to, when obtaining a plurality of weight values corresponding to a plurality of first initial loss values in an ith training round: obtaining the sum of a plurality of first initial loss values in the ith training round; calculating the ratio between each first initial loss value and the sum value respectively; and calculating the weight value of the corresponding first initial loss value according to the ratio corresponding to each first initial loss value to obtain a plurality of weight values.
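The ratio-based weight computation may be sketched as follows in Python; using the ratio itself as the weight value is one simple choice and is an assumption, since the exact function of the ratio is not spelled out here.

```python
def loss_ratio_weights(initial_losses):
    # sum of the first initial loss values in the i-th training round
    total = sum(initial_losses)
    if total == 0:
        # degenerate case: all losses are zero, fall back to equal weights
        return [1.0 / len(initial_losses)] * len(initial_losses)
    # ratio of each first initial loss value to the sum, used as its weight value
    return [loss / total for loss in initial_losses]
```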
In some embodiments, the target loss value obtaining unit 704 may be configured to, when obtaining a plurality of weight values corresponding to a plurality of first initial loss values in an ith training round: acquiring label levels of the predictive text class labels corresponding to the first initial loss values in a plurality of predictive text class labels; and inquiring preset weight values corresponding to the first initial loss values in preset mapping data according to the label levels corresponding to the first initial loss values to obtain a plurality of weight values, wherein the preset mapping data is used for reflecting the corresponding relation between the levels and the preset weight values.
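The preset-mapping alternative may be sketched as a simple per-level lookup; the mapping values below are purely illustrative assumptions.

```python
# Preset mapping data: label level -> preset weight value (values are examples only).
PRESET_WEIGHT_MAP = {1: 0.2, 2: 0.3, 3: 0.5}

def mapped_weights(label_levels):
    # look up the preset weight value for the label level of each first initial loss value
    return [PRESET_WEIGHT_MAP[level] for level in label_levels]
```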
In some embodiments, the target loss value obtaining unit 704 may be configured to, when obtaining a plurality of attenuation coefficients corresponding to a plurality of first initial loss values in an ith training round: acquiring a to-be-processed loss value from a plurality of first initial loss values, and acquiring a real-time attenuation coefficient corresponding to the to-be-processed loss value, wherein the to-be-processed loss value is a first initial loss value which is not subjected to correction processing in any one of the plurality of first initial loss values; when a first initial loss value which is not subjected to correction processing exists in the plurality of first initial loss values, acquiring the first initial loss value which is not subjected to correction processing as a new loss value to be processed, and re-executing the step of acquiring a real-time attenuation coefficient corresponding to the new loss value to be processed; when there is no first initial loss value that is not subjected to the correction processing among the plurality of first initial loss values, the obtained plurality of real-time attenuation coefficients are used as a plurality of attenuation coefficients corresponding to the plurality of first initial loss values.
In some embodiments, the target loss value obtaining unit 704 may be configured to, when obtaining the real-time attenuation coefficient corresponding to the to-be-processed loss value: acquire the target label level, among the plurality of predicted text category labels, of the predicted text category label corresponding to the to-be-processed loss value, and acquire the preset training round threshold corresponding to the target label level, where the preset training round threshold is used for limiting the training round at which the initial text classification model acquires the real-time attenuation coefficient corresponding to the to-be-processed loss value; acquire, in a case where the ith training round is greater than or equal to the preset training round threshold, the historical initial loss value corresponding to the target label level in the ith training round; and determine the real-time attenuation coefficient corresponding to the to-be-processed loss value according to the historical initial loss value and the to-be-processed loss value.
In some embodiments, the target loss value obtaining unit 704 may be configured to, when determining the real-time attenuation coefficient corresponding to the loss value to be processed according to the historical initial loss value and the loss value to be processed: acquiring an absolute value of a difference value between a historical initial loss value and a loss value to be processed; setting the real-time attenuation coefficient as a first preset value under the condition that the absolute value of the difference value is smaller than a preset threshold value; and setting the real-time attenuation coefficient as a second preset value under the condition that the absolute value of the difference value is larger than or equal to a preset threshold value.
Therefore, in the target loss value obtaining unit, based on the plurality of first initial loss values, the predicted loss of the initial text classification model when predicting the text category label on each label level can be obtained respectively, and the target loss value obtained from the plurality of first initial loss values can represent the optimization direction of the model on each label level in subsequent training rounds. The adjusting unit can then adjust the parameters of the initial text classification model based on the target loss value, so that a converged text classification model can be trained quickly. Meanwhile, because each label level can be optimized in a targeted manner during training, the device can further improve the accuracy of the classification results of the trained text classification model.
The respective modules in the training device for the text classification model described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor of the electronic device in the form of hardware, or may be stored in a memory of the electronic device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
Fig. 8 is a block diagram of a text classification apparatus according to an embodiment of the disclosure.
Referring to fig. 8, an embodiment of the present disclosure provides a text classification apparatus 800 including: a text data acquisition unit 801 and a classification unit 802.
The text data obtaining unit 801 is configured to obtain target text data to be classified.
The classifying unit 802 is configured to input the target text data into a text classification model for text category prediction processing, and obtain a plurality of target text category labels corresponding to the target text data, where the text classification model is obtained according to a training method of the text classification model.
According to the text classification device provided by the embodiment of the present disclosure, in the process of training the text classification model, the overall target loss value of the model is calculated based on the predicted loss value of each label level during training, so that the optimization direction of the model for each label level in subsequent training rounds can be determined based on the target loss value, and the prediction accuracy of the model can be improved.
The respective modules in the text classification device described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor of the electronic device in the form of hardware, or may be stored in a memory of the electronic device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
Fig. 9 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Referring to fig. 9, an embodiment of the present disclosure provides an electronic device 900 including: at least one processor 901; at least one memory 902; and one or more I/O interfaces 903 connected between the processor 901 and the memory 902. The memory 902 stores one or more computer programs executable by the at least one processor 901, and the one or more computer programs are executed by the at least one processor 901 to enable the at least one processor 901 to perform the above-described training method of the text classification model or the above-described text classification method.
The disclosed embodiments also provide a computer readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the above-described training method of the text classification model or the above-described text classification method. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, where the computer readable code, when executed in a processor of an electronic device, causes the processor of the electronic device to perform the above-described training method of the text classification model or the above-described text classification method.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), Static Random Access Memory (SRAM), flash memory or other memory technology, portable Compact Disc Read Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and may include any information delivery media.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of the computer readable program instructions, which electronic circuitry can execute the computer readable program instructions.
The computer program product described herein may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable text classification model training apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable text classification model training apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a training apparatus of a programmable text classification model, and/or other devices to function in a particular manner, such that the computer-readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable apparatus for a text classification model, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus for a text classification model, or other device to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus for a text classification model, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (13)

1. A method for training a text classification model, comprising:
acquiring sample data and a labeling text category label corresponding to the sample data; the labeling text category label is a plurality of standard text category labels, a preset hierarchical relationship is satisfied among the plurality of standard text category labels, the preset hierarchical relationship being that each hierarchical layer comprises one text category label and two text category labels of adjacent hierarchical layers are in a containing relationship;
inputting the sample data and the labeling text category label corresponding to the sample data into an initial text classification model to perform text category prediction processing, so as to obtain a plurality of predicted text category labels corresponding to the sample data; wherein the preset hierarchical relationship is satisfied among the plurality of predicted text category labels;
aiming at the plurality of predicted text category labels, obtaining a plurality of first initial loss values by acquiring error values between each predicted text category label and a standard text category label with a corresponding relation as the first initial loss values; the corresponding relation is that a label level of a standard text class label and a label level of a predicted text class label belong to the same level, and the first initial loss value is used for representing predicted loss of the initial text classification model on the label level corresponding to the first initial loss value;
obtaining a target loss value according to the plurality of first initial loss values;
and adjusting parameters of the initial text classification model according to the target loss value to obtain a text classification model.
2. The method of claim 1, wherein the deriving a target loss value from the plurality of first initial loss values comprises:
Respectively correcting the plurality of first initial loss values to obtain a plurality of first corrected loss values;
and obtaining the target loss value according to the plurality of first correction loss values.
3. The method of claim 2, wherein the performing correction processing on the plurality of first initial loss values to obtain a plurality of first corrected loss values includes:
acquiring a plurality of weight values corresponding to the plurality of first initial loss values in the ith training round, and/or acquiring a plurality of attenuation coefficients corresponding to the plurality of first initial loss values in the ith training round;
the plurality of weight values are in one-to-one correspondence with the plurality of first initial loss values, and the plurality of attenuation coefficients are in one-to-one correspondence with the plurality of first initial loss values; the weight values and the attenuation coefficients are used for representing the optimization direction of the initial text classification model on a label level corresponding to each weight value and each attenuation coefficient in the (i+1) th training round, wherein i is an integer greater than 0;
and respectively correcting the plurality of first initial loss values according to the plurality of weight values and/or the plurality of attenuation coefficients to obtain a plurality of first corrected loss values.
4. A method according to claim 3, wherein the correcting the plurality of first initial loss values according to the plurality of weight values and/or the plurality of attenuation coefficients, respectively, to obtain a plurality of first corrected loss values, includes:
and multiplying the plurality of first initial loss values by corresponding weight values in the plurality of weight values and/or corresponding attenuation coefficients in the plurality of attenuation coefficients respectively to obtain a plurality of first correction loss values.
5. A method according to claim 3, wherein the obtaining a plurality of weight values corresponding to the plurality of first initial loss values in the ith training round comprises:
obtaining a sum of the plurality of first initial loss values in an ith training round;
calculating the ratio between each first initial loss value and the sum value respectively;
and calculating the weight value of the corresponding first initial loss value according to the ratio corresponding to each first initial loss value to obtain the plurality of weight values.
6. A method according to claim 3, wherein the obtaining a plurality of weight values corresponding to the plurality of first initial loss values in the ith training round comprises:
acquiring label levels of the predictive text class labels corresponding to the first initial loss values in the predictive text class labels;
and inquiring preset weight values corresponding to the first initial loss values in preset mapping data according to the label levels corresponding to the first initial loss values to obtain the plurality of weight values, wherein the preset mapping data is used for reflecting the corresponding relation between the levels and the preset weight values.
7. A method according to claim 3, wherein said obtaining a plurality of attenuation coefficients corresponding to said plurality of first initial loss values in an ith training round comprises:
acquiring a to-be-processed loss value from the plurality of first initial loss values, and acquiring a real-time attenuation coefficient corresponding to the to-be-processed loss value, wherein the to-be-processed loss value is a first initial loss value which is not subjected to the correction processing in any one of the plurality of first initial loss values;
acquiring a first initial loss value which is not subjected to the correction processing as a new loss value to be processed when the first initial loss value which is not subjected to the correction processing exists in the plurality of first initial loss values, and re-executing the step of acquiring a real-time attenuation coefficient corresponding to the new loss value to be processed;
and when there is no first initial loss value which is not subjected to the correction processing among the plurality of first initial loss values, using the obtained plurality of real-time attenuation coefficients as a plurality of attenuation coefficients corresponding to the plurality of first initial loss values.
8. The method of claim 7, wherein the obtaining the real-time attenuation coefficient corresponding to the loss value to be processed comprises:
acquiring target label levels of the predicted text class labels corresponding to the to-be-processed loss values in the predicted text class labels, and acquiring preset training round thresholds corresponding to the target label levels; the preset training round threshold is used for limiting training rounds corresponding to the initial text classification model under the condition that the real-time attenuation coefficient corresponding to the to-be-processed loss value is acquired;
acquiring a historical initial loss value corresponding to the target label level in the ith training round under the condition that the ith training round is larger than or equal to the preset training round threshold;
and determining a real-time attenuation coefficient corresponding to the to-be-processed loss value according to the historical initial loss value and the to-be-processed loss value.
9. The method of claim 8, wherein the determining the real-time attenuation coefficient corresponding to the to-be-processed loss value according to the historical initial loss value and the to-be-processed loss value comprises:
acquiring an absolute value of a difference value between the historical initial loss value and the loss value to be processed;
setting the real-time attenuation coefficient to be a first preset value under the condition that the absolute value of the difference value is smaller than a preset threshold value;
and setting the real-time attenuation coefficient to be a second preset value under the condition that the absolute value of the difference value is larger than or equal to the preset threshold value.
10. A method of text classification, comprising:
acquiring target text data to be classified;
inputting the target text data into a text classification model to perform text category prediction processing to obtain a plurality of target text category labels corresponding to the target text data, wherein the text classification model is obtained according to the training method of the text classification model of any one of claims 1-9.
11. A text classification device, comprising:
the text data acquisition unit is used for acquiring target text data to be classified;
The classification unit is used for inputting the target text data into a text classification model to perform text category prediction processing to obtain a plurality of target text category labels corresponding to the target text data, wherein the text classification model is obtained according to the training method of the text classification model of any one of claims 1-9.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
13. A computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method according to any of claims 1-10.
CN202310679685.6A 2023-06-08 2023-06-08 Training method and device for text classification model, electronic equipment and storage medium Pending CN117493557A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310679685.6A CN117493557A (en) 2023-06-08 2023-06-08 Training method and device for text classification model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310679685.6A CN117493557A (en) 2023-06-08 2023-06-08 Training method and device for text classification model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117493557A true CN117493557A (en) 2024-02-02

Family

ID=89677029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310679685.6A Pending CN117493557A (en) 2023-06-08 2023-06-08 Training method and device for text classification model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117493557A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination