CN116932759A - Text classification model obtaining method and device, storage medium and electronic equipment
- Publication number
- CN116932759A (application number CN202310932443.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- text classification
- loss function
- network
- classification network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/35 — Information retrieval of unstructured textual data; clustering; classification
- G06F16/3329 — Natural language query formulation or dialogue systems
- G06F18/2415 — Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Learning methods
Abstract
The text classification model obtaining method and device, storage medium, and electronic device can be applied to the field of artificial intelligence or the field of finance. Based on the idea of model compression, two text classification networks with different numbers of network layers are trained simultaneously on a question text data set comprising a plurality of question texts carrying question category labels, so that the label distribution data set output by the text classification network with fewer network layers approaches that of the text classification network with more network layers. The two text classification networks are then treated as a whole that plays a game against a classifier, which improves the classification precision of the text classification model built on the network with fewer layers and enables accurate classification of customer questions.
Description
Technical Field
The disclosure relates to the field of artificial intelligence, and in particular relates to a method and a device for obtaining a text classification model, a storage medium and electronic equipment.
Background
With the rapid increase in the number of bank customers, the automated question-answering systems of banks have to respond to a wide variety of customer questions.
Actual customer consultation scenarios involve multi-class text classification, and traditional deep network models are highly complex: training a text classification model requires large amounts of computing resources and long training times, which makes model deployment and model iteration on ordinary devices difficult.
To solve the above problems, the industry currently tends to construct a small text classification model for text classification based on the idea of model compression. However, a small text classification model obtained through model compression generally suffers from low classification precision, so customer questions cannot be classified accurately and the customer's experience of the bank's question-answering service deteriorates.
Therefore, how to improve the classification precision of a text classification model obtained through model compression is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above problems, the present disclosure provides a method, an apparatus, a storage medium, and an electronic device for obtaining a text classification model, which overcome or at least partially solve the above problems, and the technical solutions are as follows:
a text classification model acquisition method, comprising:
obtaining a question text data set, wherein the question text data set comprises a plurality of question texts carrying question category labels;
Obtaining a constructed first text classification network and a constructed second text classification network, wherein the number of network layers of the first text classification network is greater than that of the second text classification network;
inputting each question text in the question text data set into the first text classification network and the second text classification network to respectively obtain hidden layer input data and label distribution data sets of the first text classification network and the second text classification network;
obtaining a model compression loss function by utilizing hidden layer input data and a label distribution data set of the first text classification network and the second text classification network;
obtaining a classifier loss function based on the question text dataset using the tag distribution datasets of the first text classification network and the second text classification network;
utilizing the model compression loss function and the classifier loss function to obtain a game loss function;
and adjusting network parameters of the second text classification network based on the game loss function to obtain a target text classification model.
Optionally, the hidden layer input data includes mean square error loss layer input data, relative entropy loss layer input data, and cross entropy loss layer input data, and the obtaining a model compression loss function by using the hidden layer input data and the label distribution data set of the first text classification network and the second text classification network includes:
Obtaining a mean square error loss function between the first text classification network and the second text classification network by utilizing the mean square error loss layer input data of the first text classification network and the second text classification network;
obtaining a relative entropy loss function between the first text classification network and the second text classification network using the relative entropy loss layer input data of the first text classification network and the second text classification network;
obtaining a first cross entropy loss function between the first text classification network and the second text classification network using the cross entropy loss layer input data of the first text classification network and the second text classification network;
obtaining a second cross entropy loss function between the first text classification network and the second text classification network using the tag distribution dataset of the first text classification network and the second text classification network;
obtaining a third cross entropy loss function between the second text classification network and the problem category label set by using a label distribution data set of the second text classification network and the problem category label set, wherein the problem category label set comprises problem category labels corresponding to each problem text in the problem text data set;
And obtaining a model compression loss function by using the mean square error loss function, the relative entropy loss function, the first cross entropy loss function, the second cross entropy loss function and the third cross entropy loss function.
Optionally, the obtaining a model compression loss function using the mean square error loss function, the relative entropy loss function, the first cross entropy loss function, the second cross entropy loss function, and the third cross entropy loss function includes:
according to the formula:
L_MC = λ(L_CE(P_tea, P_stu) + L_CE(P_data, P_stu)) + (1 − λ)(L_MSE(P_tea1, P_stu1) + L_KL(P_tea2, P_stu2) + L_CE(P_tea3, P_stu3))
calculating a model compression loss function, wherein L_MC is the model compression loss function; λ is a weight factor; L_MSE(P_tea1, P_stu1) is the mean square error loss function, P_tea1 is the mean square error loss layer input data of the first text classification network, and P_stu1 is the mean square error loss layer input data of the second text classification network; L_KL(P_tea2, P_stu2) is the relative entropy loss function, P_tea2 is the relative entropy loss layer input data of the first text classification network, and P_stu2 is the relative entropy loss layer input data of the second text classification network; L_CE(P_tea3, P_stu3) is the first cross entropy loss function, P_tea3 is the cross entropy loss layer input data of the first text classification network, and P_stu3 is the cross entropy loss layer input data of the second text classification network; L_CE(P_tea, P_stu) is the second cross entropy loss function, P_tea is the label distribution data set of the first text classification network, and P_stu is the label distribution data set of the second text classification network; L_CE(P_data, P_stu) is the third cross entropy loss function, and P_data is the problem category label set.
Optionally, before the adjusting the network parameters of the second text classification network based on the game loss function, the method further includes:
updating the network parameters of the second text classification network based on the mean square error loss function, the relative entropy loss function, and the first cross entropy loss function.
Optionally, the obtaining, based on the question text data set, a classifier loss function using tag distribution data sets of the first text classification network and the second text classification network includes:
for any of the question text in the question text dataset: determining first tag data and second tag data corresponding to the question text in tag distribution data sets of the first text classification network and the second text classification network respectively; combining the first tag data and the question text into a first text tag data pair, and adding a first soft tag to the first text tag data pair; combining the second tag data and the question text into a second text tag data pair, and adding a second soft tag to the second text tag data pair;
Based on the first text label data pair and the second text label data pair corresponding to each question text, obtaining a network classification true probability result;
inputting the first text label data pair and the second text label data pair corresponding to each question text into a classifier to obtain a network classification prediction probability result output by the classifier;
and obtaining a classifier loss function based on the network classification true probability result and the network classification prediction probability result.
Optionally, the obtaining a classifier loss function based on the network classification true probability result and the network classification predicted probability result includes:
according to the formula:
L_D = −Σ_{j=1..m} Σ_{i∈{0,1}} p(X_ij) · log p̂(X_ij)
calculating a classifier loss function, wherein L_D is the classifier loss function; j represents the number of the question text in the question text data set; m represents the number of question texts in the question text data set; i denotes a soft label, being the first soft label when i = 0 and the second soft label when i = 1; p(X_ij) represents the network classification true probability corresponding to the j-th question text under soft label i in the network classification true probability result; and p̂(X_ij) represents the network classification prediction probability corresponding to the j-th question text under soft label i in the network classification prediction probability result.
Optionally, the obtaining a game loss function by using the model compression loss function and the classifier loss function includes:
according to the formula:
L_G = min max [μ·L_MC + (1 − μ)·L_D]
calculating a game loss function, wherein L_G is the game loss function; μ is the game trend parameter; L_MC is the model compression loss function; and L_D is the classifier loss function.
A text classification model obtaining apparatus comprising: a question text data set obtaining unit, a text classification network obtaining unit, a data obtaining unit, a model compression loss function obtaining unit, a classifier loss function obtaining unit, a game loss function obtaining unit and a target text classification model obtaining unit,
the question text data set obtaining unit is used for obtaining a question text data set, wherein the question text data set comprises a plurality of question texts carrying question category labels;
the text classification network obtaining unit is used for obtaining a constructed first text classification network and a constructed second text classification network, wherein the number of network layers of the first text classification network is greater than that of the second text classification network;
The data obtaining unit is configured to input each of the question texts in the question text data set into the first text classification network and the second text classification network, and obtain hidden layer input data and a tag distribution data set of the first text classification network and the second text classification network respectively;
the model compression loss function obtaining unit is used for obtaining a model compression loss function by utilizing hidden layer input data and tag distribution data sets of the first text classification network and the second text classification network;
the classifier loss function obtaining unit is used for obtaining a classifier loss function by using the label distribution data sets of the first text classification network and the second text classification network based on the problem text data set;
the game loss function obtaining unit is used for obtaining a game loss function by utilizing the model compression loss function and the classifier loss function;
the target text classification model obtaining unit is used for adjusting network parameters of the second text classification network based on the game loss function to obtain a target text classification model.
A computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the text classification model obtaining method of any of the above.
An electronic device comprising at least one processor, at least one memory, and a bus connected to the processor; the processor and the memory communicate with each other through the bus; and the processor is configured to invoke program instructions in the memory to perform the text classification model obtaining method of any of the above.
By means of the above technical solution, the text classification model obtaining method, device, storage medium, and electronic device provided by the present disclosure can be applied to the field of artificial intelligence or the field of finance. The method obtains a question text data set comprising a plurality of question texts carrying question category labels; obtains a constructed first text classification network and a constructed second text classification network, the number of network layers of the first text classification network being greater than that of the second text classification network; inputs each question text in the question text data set into the first text classification network and the second text classification network to obtain the hidden layer input data and label distribution data sets of the two networks; obtains a model compression loss function from the hidden layer input data and label distribution data sets of the two networks; obtains a classifier loss function from the label distribution data sets of the two networks based on the question text data set; obtains a game loss function from the model compression loss function and the classifier loss function; and adjusts the network parameters of the second text classification network based on the game loss function to obtain a target text classification model. Based on the idea of model compression, two text classification networks with different numbers of network layers are trained simultaneously on the question text data set, so that the label distribution data set output by the network with fewer layers approaches that of the network with more layers; the two networks are then treated as a whole that plays a game against a classifier, which improves the classification precision of the text classification model built on the network with fewer layers and enables accurate classification of customer questions.
The foregoing is merely an overview of the technical solutions of the present disclosure, which can be implemented according to the contents of the specification. To make the technical means of the present disclosure clearer, and to make the above and other objects, features, and advantages of the present disclosure easier to understand, specific embodiments of the present disclosure are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the disclosure. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flow diagram of one implementation of a text classification model acquisition method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a specific implementation of step S400 in a text classification model obtaining method according to an embodiment of the disclosure;
FIG. 3 is a schematic flow chart of a specific implementation of step S500 in a text classification model obtaining method according to an embodiment of the disclosure;
FIG. 4 illustrates a schematic diagram of a text multi-classification framework provided by embodiments of the present disclosure;
Fig. 5 is a schematic structural diagram of a text classification model obtaining apparatus according to an embodiment of the present disclosure;
fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in fig. 1, a flowchart of one implementation of a text classification model obtaining method provided by an embodiment of the disclosure may include:
s100, obtaining a question text data set, wherein the question text data set comprises a plurality of question texts carrying question category labels.
Wherein the question text data set is a set of a plurality of question texts. A question text is the text data of a question posed by a questioner. The question category label marks the real question category to which the question text belongs, i.e., the question category into which the user expects the corresponding question text to be classified. In the embodiment of the disclosure, a corresponding question category label may be added in advance to each question text in the question text data set. For example, assume the question text is "I would like to ask what the specific work content of the post includes?"; the question category label added to this question text may be "work content".
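By way of non-limiting illustration (not part of the original disclosure), the following Python snippet sketches one possible in-memory form of such a question text data set; the example texts, labels, and field names are invented for illustration only:

```python
# Hypothetical illustration of a question text data set: each question text
# carries a question category label marking the true category it belongs to.
question_text_dataset = [
    {"text": "I would like to ask what the specific work content of the post includes?",
     "label": "work content"},
    {"text": "How do I reset the login password of mobile banking?",
     "label": "account security"},
    {"text": "What is the current interest rate for a three-year fixed deposit?",
     "label": "deposit products"},
]

# Map the question category labels to integer ids for later training.
label_set = sorted({item["label"] for item in question_text_dataset})
label_to_id = {label: i for i, label in enumerate(label_set)}
print(label_to_id)
```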
And S200, obtaining the constructed first text classification network and the second text classification network, wherein the network layer number of the first text classification network is more than that of the second text classification network.
In general, the more network layers a network has, the more features at different levels it can extract, so a text classification network with more layers achieves higher classification precision, but its deployment and iteration become correspondingly much more difficult. Therefore, based on the idea of model compression, the embodiment of the disclosure trains a first text classification network with a larger number of network layers on a server, so that its classification precision is as high as possible, and trains a second text classification network with a smaller number of network layers on the terminal that executes the actual question text classification task. The entire network training process is optimized by making the learning result of the second text classification network approximate that of the first text classification network.
Alternatively, the first text classification network may be a Resnet101 neural network, and the second text classification network may be a Resnet18 neural network.
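For illustration, a minimal PyTorch-style sketch of constructing two text classification networks whose only difference is the number of network layers is given below. It is not the patented implementation; the toy architecture (embedding plus stacked linear layers), the layer counts, and all hyperparameter values are assumptions, and the Resnet101/Resnet18 choices mentioned above are not reproduced here:

```python
import torch
import torch.nn as nn

class TextClassificationNetwork(nn.Module):
    """Toy text classification network with a configurable number of hidden layers."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        layers, in_dim = [], embed_dim
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        self.hidden_layers = nn.Sequential(*layers)
        self.output = nn.Linear(in_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids).mean(dim=1)  # crude bag-of-embeddings pooling
        x = self.hidden_layers(x)
        return self.output(x)                      # logits; softmax gives the label distribution

# The first network has more layers than the second, as required by the method.
first_net = TextClassificationNetwork(30000, 128, 256, num_layers=12, num_classes=10)
second_net = TextClassificationNetwork(30000, 128, 256, num_layers=3, num_classes=10)
```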
S300, inputting each question text in the question text data set into a first text classification network and a second text classification network, and respectively obtaining hidden layer input data and tag distribution data sets of the first text classification network and the second text classification network.
The hidden layer input data is the input data of specific hidden layers in a text classification network, where each specific hidden layer has an associated loss function. It can be understood that the specific hidden layers of the first text classification network have the same layout as those of the second text classification network.
The label distribution data set comprises the label classification probabilities that a text classification network assigns to the question categories of a question text. For example, assume that the question text numbered i in the question text data set is "I would like to ask what the specific work content of the post includes?" and carries the question category label D_i = ['work content': 1]. If this question text is input into the first text classification network and the second text classification network respectively, the label classification probabilities output by the first text classification network for this question text may be T_i = ['work content': 0.7, 'post': 0.2, 'specific': 0.1], and the label classification probabilities output by the second text classification network may be S_i = ['post': 0.4, 'work': 0.3, 'content': 0.3].
Specifically, in the embodiment of the disclosure, each question text in the question text data set is input in turn into both the first text classification network and the second text classification network, and the label distribution data sets output by the two networks are obtained after the question text has passed through the several hidden layers of each network. During this pass through the hidden layers, the input data of the specific hidden layers can be collected as hidden layer input data for the subsequent model compression.
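The following self-contained sketch illustrates one possible way to collect the input data of specific hidden layers during a forward pass, using PyTorch forward hooks; the stand-in network, the choice of watched layers, and the tensor shapes are assumptions made for illustration only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for one of the two text classification networks.
net = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),            # output layer -> logits over question categories
)

captured = {}

def save_input(name):
    def hook(module, inputs, output):
        captured[name] = inputs[0]  # record the *input* fed into this hidden layer
    return hook

# Register hooks on the specific hidden layers whose inputs are needed later.
handles = [
    net[0].register_forward_hook(save_input("mse_loss_layer_input")),
    net[2].register_forward_hook(save_input("kl_loss_layer_input")),
    net[4].register_forward_hook(save_input("ce_loss_layer_input")),
]

features = torch.randn(4, 128)                          # toy batch of 4 question text vectors
label_distribution = F.softmax(net(features), dim=-1)   # one entry of the label distribution data set
for h in handles:
    h.remove()
```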
S400, utilizing hidden layer input data and label distribution data sets of the first text classification network and the second text classification network to obtain a model compression loss function.
Specifically, the embodiment of the disclosure may calculate loss functions between the two text classification networks based on the hidden layer input data, combine them in a weighted proportion with the loss functions calculated from the label distribution data sets of the two networks, and thus obtain the model compression loss function.
S500, based on the problem text data set, using the label distribution data sets of the first text classification network and the second text classification network to obtain a classifier loss function.
The embodiment of the disclosure can treat the label distribution data sets of the first text classification network and the second text classification network as a unified whole, and use a classifier to predict whether a given label classification probability for the question text data set was produced by the first text classification network or the second text classification network.
Specifically, for any question text in the question text data set, the embodiment of the disclosure may combine the question text with the label distribution probability obtained after the question text is input into the first or the second text classification network, input the resulting data pair into a classifier, and have the classifier predict whether the label distribution probability in the data pair was generated by the first text classification network or the second text classification network. The classifier loss function is then obtained from the network classification true probability result of the data pairs corresponding to each question text in the question text data set and the network classification prediction probability result output by the classifier.
S600, obtaining a game loss function by using the model compression loss function and the classifier loss function.
The game loss function is composed of a model compression loss function and a classifier loss function.
In the embodiment of the disclosure, the first text classification network and the second text classification network can be used together as a generator that plays a maximum-minimum game against the classifier through the model compression loss function and the classifier loss function. This drives the first and second text classification networks to output increasingly similar label distribution data sets, while the parameters of the classifier are updated and its classification ability improves, so that the classification abilities of both sides are refined as the generator and the classifier balance each other.
And S700, adjusting network parameters of the second text classification network based on the game loss function to obtain a target text classification model.
Specifically, the embodiment of the disclosure may compare the numerical deviation between the game loss function and a preset expected game parameter value, and adjust the network parameters of the second text classification network with the goal of reducing this deviation, that is, correspondingly increase or decrease specific values of the network parameters of the second text classification network. It can be understood that the preset expected game parameter value can be set according to actual requirements.
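As a hedged illustration of this parameter adjustment idea (not the patented procedure), the sketch below shrinks the deviation between a placeholder game loss and a preset expected value by gradient descent on a stand-in second network; the expected value, the learning rate, and the placeholder loss are all assumptions:

```python
import torch
import torch.nn as nn

second_net = nn.Linear(16, 4)                    # stand-in for the second text classification network
optimizer = torch.optim.SGD(second_net.parameters(), lr=1e-3)
expected_game_loss = torch.tensor(0.1)           # preset expected game parameter value

def game_loss(model):
    # Placeholder: in the actual method this would be L_G built from L_MC and L_D.
    logits = model(torch.randn(8, 16))
    return logits.pow(2).mean()

for _ in range(10):
    deviation = (game_loss(second_net) - expected_game_loss).abs()
    optimizer.zero_grad()
    deviation.backward()                         # adjust parameters so the deviation shrinks
    optimizer.step()
```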
The text classification model obtaining method provided by the present disclosure can be applied to the field of artificial intelligence or the field of finance. The method obtains a question text data set comprising a plurality of question texts carrying question category labels; obtains a constructed first text classification network and a constructed second text classification network, the number of network layers of the first text classification network being greater than that of the second text classification network; inputs each question text in the question text data set into the first text classification network and the second text classification network to obtain the hidden layer input data and label distribution data sets of the two networks; obtains a model compression loss function from the hidden layer input data and label distribution data sets of the two networks; obtains a classifier loss function from the label distribution data sets of the two networks based on the question text data set; obtains a game loss function from the model compression loss function and the classifier loss function; and adjusts the network parameters of the second text classification network based on the game loss function to obtain a target text classification model. Based on the idea of model compression, two text classification networks with different numbers of network layers are trained simultaneously on the question text data set, so that the label distribution data set output by the network with fewer layers approaches that of the network with more layers; the two networks are then treated as a whole that plays a game against a classifier, which improves the classification precision of the text classification model built on the network with fewer layers and enables accurate classification of customer questions.
In order to make the network output of the second text classification network as close as possible to that of the first text classification network, embodiments of the present disclosure may associate different loss functions with multiple specific hidden layers in the text classification networks.
Alternatively, the hidden layer associated with the mean square error loss function (Mean Squared Error Loss Function, MSE) in both the first text classification network and the second text classification network is referred to as the mean square error loss layer. A hidden layer having a relative entropy loss function (Relative Entropy Loss Function, also referred to as KL divergence loss function) associated in both the first text classification network and the second text classification network is referred to as a relative entropy loss layer. The hidden layer associated with the cross entropy loss function (Cross Entropy Loss Function) in both the first text classification network and the second text classification network is referred to as a cross entropy loss layer.
Alternatively, the layout structure of the specific hidden layers in the first text classification network and the second text classification network may be a mean square error loss layer, a relative entropy loss layer, and a cross entropy loss layer in this order.
Optionally, the hidden layer input data includes mean square error loss layer input data, relative entropy loss layer input data, and cross entropy loss layer input data.
Optionally, based on the method shown in fig. 1, as shown in fig. 2, in the text classification model obtaining method provided in the embodiment of the present disclosure, the step S400 may include:
s410, obtaining a mean square error loss function between the first text classification network and the second text classification network by utilizing the mean square error loss layer input data of the first text classification network and the second text classification network.
Specifically, in the embodiment of the present disclosure, the input data of the mean square error loss layer of the first text classification network and the second text classification network may be substituted into the calculation formula of the mean square error loss function, so as to calculate the mean square error loss function between the first text classification network and the second text classification network.
S420, utilizing the relative entropy loss layer input data of the first text classification network and the second text classification network to obtain a relative entropy loss function between the first text classification network and the second text classification network.
Specifically, the embodiment of the disclosure may substitute the input data of the relative entropy loss layers of the first text classification network and the second text classification network into a calculation formula of the relative entropy loss function, and calculate the relative entropy loss function between the first text classification network and the second text classification network.
S430, obtaining a first cross entropy loss function between the first text classification network and the second text classification network by using the cross entropy loss layer input data of the first text classification network and the second text classification network.
Specifically, in the embodiment of the present disclosure, the cross entropy loss layer input data of the first text classification network and the second text classification network may be substituted into a calculation formula of the cross entropy loss function, so as to calculate a first cross entropy loss function between the first text classification network and the second text classification network.
S440, obtaining a second cross entropy loss function between the first text classification network and the second text classification network by using the label distribution data sets of the first text classification network and the second text classification network.
Specifically, in the embodiment of the present disclosure, the tag distribution data sets of the first text classification network and the second text classification network may be substituted into a calculation formula of the cross entropy loss function, so as to calculate a second cross entropy loss function between the first text classification network and the second text classification network.
S450, obtaining a third cross entropy loss function between the second text classification network and the question category label set by using the label distribution data set of the second text classification network and the question category label set, wherein the question category label set comprises question category labels corresponding to all the question texts in the question text data set.
Specifically, the embodiment of the disclosure may substitute the label distribution data set and the problem category label set of the second text classification network into a calculation formula of the cross entropy loss function, and calculate a third cross entropy loss function between the second text classification network and the problem category label set.
S460, obtaining a model compression loss function by using the mean square error loss function, the relative entropy loss function, the first cross entropy loss function, the second cross entropy loss function and the third cross entropy loss function.
Specifically, the embodiment of the disclosure may weight the mean square error loss function, the relative entropy loss function, the first cross entropy loss function, the second cross entropy loss function, and the third cross entropy loss function according to a proportion, to obtain a model compression loss function.
In the embodiment of the disclosure, the mean square error loss function, the relative entropy loss function, and the cross entropy loss function are each associated with hidden layers in the first and second text classification networks. Using the hidden layer input data and the output label distribution data sets generated from the question texts, the behavior of the two networks can be compared and learned layer by layer, the difference between their label distribution data sets can be accurately estimated, and the network output of the second text classification network can be driven as close as possible to that of the first, thereby improving the classification precision of the second text classification network.
Alternatively, embodiments of the present disclosure may, according to the formula:
L_MC = λ(L_CE(P_tea, P_stu) + L_CE(P_data, P_stu)) + (1 − λ)(L_MSE(P_tea1, P_stu1) + L_KL(P_tea2, P_stu2) + L_CE(P_tea3, P_stu3))
calculate the model compression loss function, wherein L_MC is the model compression loss function; λ is a weight factor; L_MSE(P_tea1, P_stu1) is the mean square error loss function, P_tea1 is the mean square error loss layer input data of the first text classification network, and P_stu1 is the mean square error loss layer input data of the second text classification network; L_KL(P_tea2, P_stu2) is the relative entropy loss function, P_tea2 is the relative entropy loss layer input data of the first text classification network, and P_stu2 is the relative entropy loss layer input data of the second text classification network; L_CE(P_tea3, P_stu3) is the first cross entropy loss function, P_tea3 is the cross entropy loss layer input data of the first text classification network, and P_stu3 is the cross entropy loss layer input data of the second text classification network; L_CE(P_tea, P_stu) is the second cross entropy loss function, P_tea is the label distribution data set of the first text classification network, and P_stu is the label distribution data set of the second text classification network; L_CE(P_data, P_stu) is the third cross entropy loss function, and P_data is the problem category label set.
According to the embodiment of the disclosure, corresponding weighted summation calculation is performed on the mean square error loss function, the relative entropy loss function, the first cross entropy loss function, the second cross entropy loss function and the third cross entropy loss function according to the proportion, so that an accurate model compression loss function can be obtained.
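A minimal sketch of this weighted combination is given below, assuming PyTorch tensors for the hidden layer inputs and label distributions; the exact reductions (batch means), the numerical-stability epsilon, and the use of softmax on the hidden cross entropy inputs are assumptions, not specified by the disclosure:

```python
import torch
import torch.nn.functional as F

def model_compression_loss(lam,
                           p_tea, p_stu,       # label distributions (softmax outputs) of the two networks
                           p_data,             # true question category labels as class indices
                           p_tea1, p_stu1,     # mean square error loss layer inputs
                           p_tea2, p_stu2,     # relative entropy loss layer inputs
                           p_tea3, p_stu3):    # cross entropy loss layer inputs
    eps = 1e-12
    # Second cross entropy loss: between the two networks' label distributions.
    l_ce_tea_stu = -(p_tea * torch.log(p_stu + eps)).sum(dim=-1).mean()
    # Third cross entropy loss: between the true labels and the second network's distribution.
    l_ce_data_stu = F.nll_loss(torch.log(p_stu + eps), p_data)
    # Hidden layer terms: MSE, KL divergence, and the first cross entropy loss.
    l_mse = F.mse_loss(p_stu1, p_tea1)
    l_kl = F.kl_div(torch.log_softmax(p_stu2, dim=-1),
                    torch.softmax(p_tea2, dim=-1), reduction="batchmean")
    l_ce_hidden = -(torch.softmax(p_tea3, dim=-1)
                    * torch.log_softmax(p_stu3, dim=-1)).sum(dim=-1).mean()
    return lam * (l_ce_tea_stu + l_ce_data_stu) + (1 - lam) * (l_mse + l_kl + l_ce_hidden)
```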
Optionally, before step S700, embodiments of the present disclosure may further update network parameters of the second text classification network based on the mean square error loss function, the relative entropy loss function, and the first cross entropy loss function.
The embodiment of the disclosure may take minimizing the mean square error loss function, the relative entropy loss function, and the first cross entropy loss function as a training target and adjust the network parameters of the second text classification network accordingly, so that after the parameter update the network output of the second text classification network tends further toward that of the first text classification network, improving the classification precision of the second text classification network.
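A short sketch of this preliminary update is shown below, with placeholder tensors standing in for the captured hidden layer inputs; the shapes and the softmax treatment of the cross entropy inputs are assumptions:

```python
import torch
import torch.nn.functional as F

# Placeholders for the captured hidden layer inputs of the two networks.
stu1, tea1 = torch.randn(4, 256, requires_grad=True), torch.randn(4, 256)
stu2, tea2 = torch.randn(4, 256, requires_grad=True), torch.randn(4, 256)
stu3, tea3 = torch.randn(4, 256, requires_grad=True), torch.randn(4, 256)

# Minimise L_MSE + L_KL + the first cross entropy loss before the game-based adjustment.
alignment_loss = (F.mse_loss(stu1, tea1)
                  + F.kl_div(torch.log_softmax(stu2, dim=-1),
                             torch.softmax(tea2, dim=-1), reduction="batchmean")
                  - (torch.softmax(tea3, dim=-1)
                     * torch.log_softmax(stu3, dim=-1)).sum(dim=-1).mean())
alignment_loss.backward()   # gradients would then drive an optimizer step on the second network
```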
The embodiment of the disclosure can generate original tag data based on the tag distribution data set output by the first text classification network and the second text classification network and the corresponding problem text in the problem text data set, and then input the tag data into a classifier to judge whether the tag data corresponds to the first text classification network or the second text classification network.
Optionally, based on the method shown in fig. 1, as shown in fig. 3, in the text classification model obtaining method provided in the embodiment of the present disclosure, the step S500 may include:
S510, for any question text in the question text data set: first tag data and second tag data corresponding to the question text in the tag distribution data sets of the first text classification network and the second text classification network are determined, respectively.
S520, combining the first tag data and the question text into a first text tag data pair, and adding a first soft tag to the first text tag data pair.
S530, combining the second tag data and the problem text into a second text tag data pair, and adding a second soft tag to the second text tag data pair.
Assume that the question text data set is X = {X_1, X_2, ..., X_m}, the label distribution data set output by the first text classification network is T = {T_1, T_2, ..., T_m}, and the label distribution data set output by the second text classification network is S = {S_1, S_2, ..., S_m}. Each X_i is combined with T_i and with S_i into text label data pairs of the form (X_i, T_i) and (X_i, S_i), and a corresponding soft label R_i (for T_i or S_i) is added to each text label data pair, so that the classifier can predict whether the text label data pair was generated by the first text classification network or the second text classification network.
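The pairing step can be illustrated with the short sketch below; the example probabilities are invented, and the convention that soft label 0 marks pairs from the first network while 1 marks pairs from the second follows the loss formula given later:

```python
import torch

question_texts = ["question A", "question B"]                          # placeholder texts X_i
first_net_tags = torch.tensor([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])      # T_i from the first network
second_net_tags = torch.tensor([[0.4, 0.3, 0.3], [0.5, 0.4, 0.1]])     # S_i from the second network

pairs = []
for text, t_i, s_i in zip(question_texts, first_net_tags, second_net_tags):
    pairs.append({"pair": (text, t_i), "soft_label": 0})   # first text label data pair
    pairs.append({"pair": (text, s_i), "soft_label": 1})   # second text label data pair

# The soft labels directly give the network classification true probability result.
true_labels = torch.tensor([p["soft_label"] for p in pairs], dtype=torch.float32)
```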
S540, based on the first text label data pair and the second text label data pair corresponding to each question text, obtaining a network classification true probability result.
It can be understood that based on the soft tag carried by the text tag data pair, the network classification real probability of the text tag data pair can be obtained, and then the network classification real probability of each text tag data pair is used as a network classification real probability result.
S550, inputting the first text label data pair and the second text label data pair corresponding to each question text into the classifier to obtain a network classification prediction probability result output by the classifier.
Wherein the classifier is a TextRNN model that solves the text classification problem using a recurrent neural network (Recurrent Neural Network, RNN).
Embodiments of the present disclosure may use the word embedding layer (Word Embedding) in the classifier to obtain the word embedding of each question text X_i in a text label data pair, and then output, through a recurrent neural network connected to a softmax activation function, the network classification prediction probability for that text label data pair, thereby obtaining the network classification prediction probability result.
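A minimal TextRNN-style classifier sketch is given below. The disclosure only specifies a word embedding layer, a recurrent neural network, and a softmax activation; the GRU cell, the way the tag distribution is concatenated to the text representation, and all sizes are assumptions added to make the example runnable:

```python
import torch
import torch.nn as nn

class TextRNNClassifier(nn.Module):
    """Predicts whether a text label data pair came from the first or the second network."""

    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=128, num_tags=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim + num_tags, 2)   # two classes: first vs second network

    def forward(self, token_ids, tag_distribution):
        _, last_hidden = self.rnn(self.embedding(token_ids))
        features = torch.cat([last_hidden.squeeze(0), tag_distribution], dim=-1)
        return torch.softmax(self.head(features), dim=-1)  # network classification prediction probability

classifier = TextRNNClassifier()
probs = classifier(torch.randint(0, 30000, (4, 16)), torch.rand(4, 3))
```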
S560, obtaining a classifier loss function based on the network classification true probability result and the network classification prediction probability result.
According to the embodiment of the disclosure, the network classification true probability result and the network classification prediction probability result can be substituted into a calculation formula of the cross entropy loss function, so that the classifier loss function is calculated.
Based on the idea of a classification problem, the embodiment of the disclosure combines question texts and label data into corresponding text label data pairs, predicts the network classification of each pair through the classifier, and calculates an accurate classifier loss function by combining this with the network classification true probabilities of the pairs, thereby effectively evaluating the difference between the true data label distribution and the label distribution data set output by the second text classification network.
It will be appreciated that the TextRNN model will adjust its own network parameters appropriately, i.e. perform parameter updating of the classifier, according to the accuracy of each round of prediction.
Alternatively, embodiments of the present disclosure may, according to the formula:
L_D = −Σ_{j=1..m} Σ_{i∈{0,1}} p(X_ij) · log p̂(X_ij)
calculate the classifier loss function, wherein L_D is the classifier loss function; j represents the number of the question text in the question text data set; m represents the number of question texts in the question text data set; i denotes a soft label, being the first soft label when i = 0 and the second soft label when i = 1; p(X_ij) represents the network classification true probability corresponding to the j-th question text under soft label i in the network classification true probability result; and p̂(X_ij) represents the network classification prediction probability corresponding to the j-th question text under soft label i in the network classification prediction probability result.
According to the embodiment of the disclosure, the accurate classifier loss function can be obtained by calculating the cross entropy of the network classification real probability result and the network classification prediction probability result, so that the difference between the real data label distribution and the label distribution data set output by the second text classification network is effectively evaluated, and an accurate reference basis is provided for the subsequent network parameter adjustment of the second text classification network.
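The cross entropy computation can be sketched as follows, assuming the true and predicted probabilities are arranged as (m, 2) tensors indexed by the soft label; the averaging over the m question texts and the epsilon term are assumptions about details the formula leaves implicit:

```python
import torch

def classifier_loss(true_probs, pred_probs):
    """Cross entropy between p(X_ij) and the predicted probabilities, averaged over the m texts."""
    eps = 1e-12
    return -(true_probs * torch.log(pred_probs + eps)).sum(dim=-1).mean()

# Toy usage: three question texts, columns indexed by the soft label i (0 or 1).
true_p = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
pred_p = torch.tensor([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
print(classifier_loss(true_p, pred_p))
```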
Alternatively, embodiments of the present disclosure may, according to the formula:
L_G = min max [μ·L_MC + (1 − μ)·L_D]
calculate the game loss function, wherein L_G is the game loss function; μ is the game trend parameter; L_MC is the model compression loss function; and L_D is the classifier loss function.
Wherein the larger the gaming trend parameter, the more likely the gaming process will be to bring the data distribution (i.e., soft labels) generated by the second text classification network closer to the data distribution generated by the first text classification network.
According to the embodiment of the disclosure, the first text classification network and the second text classification network are taken as the whole generator, and the training result of the first text classification network can be learned by the second text classification network more accurately through the game process between the generator and the classifier, so that the classification precision of the second text classification network is improved.
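The max-min game can be summarised with the sketch below; the loss values, the game trend parameter μ, and the comment-level training schedule are placeholders rather than the patented training procedure:

```python
import torch

mu = 0.7                            # game trend parameter
l_mc = torch.tensor(0.9)            # placeholder model compression loss L_MC
l_d = torch.tensor(0.4)             # placeholder classifier loss L_D
l_g = mu * l_mc + (1 - mu) * l_d    # game loss L_G

# Alternating min-max schedule (pseudo-structure):
#   generator step : update the second text classification network to minimise l_g
#   classifier step: update the classifier to maximise l_g (i.e. minimise -l_g)
```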
To facilitate understanding of the flow of the text classification model obtaining method provided in the embodiment of the present disclosure, reference is made to fig. 4, a schematic diagram of the text multi-classification framework provided by the embodiment of the present disclosure. Using the idea of model compression, a first text classification network with a larger number of network layers is established as a large network and a second text classification network with a smaller number of network layers is established as a small network, and the model compression loss function is calculated from the loss functions computed at the corresponding hidden layers of each text classification network, so that the outputs of the large network and the small network become as similar as possible during training, thereby improving the classification effect of the small network. Through the idea of the game, the game process between the large and small networks on one side and the classifier on the other allows the small network to learn the training result of the large network more accurately, so that the classification precision of the small network on question texts is improved. Meanwhile, the small network can be deployed on an ordinary terminal and trained and iterated there, which greatly improves its training and classification speed and effectively reduces memory consumption, so that an automated question-answering system carrying the small network can be deployed at scale in practical applications and better serve bank customers.
Although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
Corresponding to the above method embodiment, the embodiment of the present disclosure further provides a text classification model obtaining device, where the structure of the text classification model obtaining device is shown in fig. 5, and the text classification model obtaining device may include: a question text data set obtaining unit 100, a text classification network obtaining unit 200, a data obtaining unit 300, a model compression loss function obtaining unit 400, a classifier loss function obtaining unit 500, a game loss function obtaining unit 600, and a target text classification model obtaining unit 700.
A question text data set obtaining unit 100, configured to obtain a question text data set, where the question text data set includes a plurality of question texts carrying question category labels.
The text classification network obtaining unit 200 is configured to obtain a constructed first text classification network and a constructed second text classification network, where the number of network layers of the first text classification network is greater than the number of network layers of the second text classification network.
The data obtaining unit 300 is configured to input each question text in the question text data set into the first text classification network and the second text classification network, and obtain hidden layer input data and tag distribution data sets of the first text classification network and the second text classification network, respectively.
A model compression loss function obtaining unit 400, configured to obtain a model compression loss function by using hidden layer input data and tag distribution data sets of the first text classification network and the second text classification network.
The classifier loss function obtaining unit 500 is configured to obtain a classifier loss function based on the question text data set using the tag distribution data sets of the first text classification network and the second text classification network.
A game loss function obtaining unit 600, configured to obtain a game loss function by using the model compression loss function and the classifier loss function.
The target text classification model obtaining unit 700 is configured to adjust network parameters of the second text classification network based on the game loss function, so as to obtain a target text classification model.
Optionally, the hidden layer input data includes mean square error loss layer input data, relative entropy loss layer input data, and cross entropy loss layer input data.
Alternatively, the model compression loss function obtaining unit 400 may include: the method comprises a mean square error loss function obtaining subunit, a relative entropy loss function obtaining subunit, a first cross entropy loss function obtaining subunit, a second cross entropy loss function obtaining subunit, a third cross entropy loss function obtaining subunit and a model compression loss function obtaining subunit.
And the mean square error loss function obtaining subunit is used for obtaining the mean square error loss function between the first text classification network and the second text classification network by utilizing the mean square error loss layer input data of the first text classification network and the second text classification network.
A relative entropy loss function obtaining subunit, configured to obtain a relative entropy loss function between the first text classification network and the second text classification network by using the relative entropy loss layer input data of the first text classification network and the second text classification network.
A first cross entropy loss function obtaining subunit configured to obtain a first cross entropy loss function between the first text classification network and the second text classification network using cross entropy loss layer input data of the first text classification network and the second text classification network.
A second cross entropy loss function obtaining subunit for obtaining a second cross entropy loss function between the first text classification network and the second text classification network using the tag distribution data sets of the first text classification network and the second text classification network.
The third cross entropy loss function obtaining subunit is configured to obtain a third cross entropy loss function between the second text classification network and the question category label set by using the label distribution data set of the second text classification network and the question category label set, where the question category label set includes question category labels corresponding to each question text in the question text data set.
The model compression loss function obtaining subunit is configured to obtain a model compression loss function by using the mean square error loss function, the relative entropy loss function, the first cross entropy loss function, the second cross entropy loss function, and the third cross entropy loss function.
Alternatively, the model compression loss function obtaining subunit may be specifically configured to, according to the formula:
L_MC = λ(L_CE(P_tea, P_stu) + L_CE(P_data, P_stu)) + (1 - λ)(L_MSE(P_tea1, P_stu1) + L_KL(P_tea2, P_stu2) + L_CE(P_tea3, P_stu3)),
calculate a model compression loss function, wherein L_MC is the model compression loss function; λ is a weight factor; L_MSE(P_tea1, P_stu1) is the mean square error loss function, P_tea1 is the mean square error loss layer input data of the first text classification network, and P_stu1 is the mean square error loss layer input data of the second text classification network; L_KL(P_tea2, P_stu2) is the relative entropy loss function, P_tea2 is the relative entropy loss layer input data of the first text classification network, and P_stu2 is the relative entropy loss layer input data of the second text classification network; L_CE(P_tea3, P_stu3) is the first cross entropy loss function, P_tea3 is the cross entropy loss layer input data of the first text classification network, and P_stu3 is the cross entropy loss layer input data of the second text classification network; L_CE(P_tea, P_stu) is the second cross entropy loss function, P_tea is the tag distribution data set of the first text classification network, and P_stu is the tag distribution data set of the second text classification network; L_CE(P_data, P_stu) is the third cross entropy loss function, and P_data is the question category label set.
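As a hedged illustration only, the model compression loss above could be computed as follows in PyTorch, assuming the paired loss-layer inputs are tensors of matching shape and the question category labels P_data are class indices; how the loss layers are attached to the two networks is not prescribed here, and the value of the weight factor is an assumption.

```python
import torch
import torch.nn.functional as F

def model_compression_loss(p_tea1, p_stu1,   # mean square error loss layer input data
                           p_tea2, p_stu2,   # relative entropy loss layer input data
                           p_tea3, p_stu3,   # cross entropy loss layer input data
                           p_tea, p_stu,     # tag distribution data sets (probabilities)
                           p_data,           # question category labels as class indices
                           lam=0.5):         # weight factor λ (value is an assumption)
    l_mse = F.mse_loss(p_stu1, p_tea1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    l_kl = F.kl_div(torch.log_softmax(p_stu2, dim=-1),
                    torch.softmax(p_tea2, dim=-1), reduction="batchmean")
    soft_ce = lambda target, pred: -(target * torch.log(pred.clamp_min(1e-12))).sum(-1).mean()
    l_ce1 = soft_ce(torch.softmax(p_tea3, dim=-1), torch.softmax(p_stu3, dim=-1))
    l_ce2 = soft_ce(p_tea, p_stu)                                   # teacher vs. student distributions
    l_ce3 = F.nll_loss(torch.log(p_stu.clamp_min(1e-12)), p_data)   # student vs. hard labels
    return lam * (l_ce2 + l_ce3) + (1 - lam) * (l_mse + l_kl + l_ce1)
```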
Optionally, the text classification model obtaining device may further include a network parameter updating unit.
The network parameter updating unit is configured to update the network parameters of the second text classification network based on the mean square error loss function, the relative entropy loss function, and the first cross entropy loss function before the target text classification model obtaining unit 700 adjusts the network parameters of the second text classification network based on the game loss function to obtain the target text classification model.
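A brief sketch of this pre-game update is given below, assuming the three layer-wise losses have already been computed as tensors and that an optimizer (for example, Adam over the second network's parameters) has been created; the optimizer choice and the equal weighting of the three losses are assumptions of the sketch.

```python
def pre_update_student(student_optimizer, l_mse, l_kl, l_ce1):
    # Mean square error + relative entropy + first cross entropy, applied before the game stage.
    loss = l_mse + l_kl + l_ce1
    student_optimizer.zero_grad()
    loss.backward()
    student_optimizer.step()
```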
Alternatively, the classifier loss function obtaining unit 500 may include: a tag data determining subunit, a first text tag data pair combining subunit, a second text tag data pair combining subunit, a true probability result obtaining subunit, a prediction probability result obtaining subunit, and a classifier loss function obtaining subunit.
A tag data determining subunit, configured to, for any one question text in the question text data set: determine first tag data and second tag data corresponding to the question text in the tag distribution data sets of the first text classification network and the second text classification network, respectively.
And the first text label data pair combining subunit is used for combining the first label data and the question text into a first text label data pair and adding a first soft label to the first text label data pair.
And a second text label data pair combining subunit configured to combine the second label data and the question text into a second text label data pair, and add a second soft label to the second text label data pair.
The true probability result obtaining subunit is configured to obtain a network classification true probability result based on the first text label data pair and the second text label data pair corresponding to each question text.
The prediction probability result obtaining subunit is used for inputting the first text label data pair and the second text label data pair corresponding to each question text into the classifier to obtain a network classification prediction probability result output by the classifier.
The classifier loss function obtaining subunit is used for obtaining a classifier loss function based on the network classification true probability result and the network classification prediction probability result.
Optionally, the classifier loss function obtaining subunit may be specifically configured to calculate a classifier loss function L_D, wherein j denotes the serial number of a question text in the question text data set; m denotes the number of question texts in the question text data set; i denotes a soft label, being the first soft label when i = 0 and the second soft label when i = 1; p(X_ij) denotes, in the network classification true probability result, the network classification true probability corresponding to the j-th question text under soft label i; and p̂(X_ij) denotes, in the network classification prediction probability result, the network classification prediction probability corresponding to the j-th question text under soft label i.
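The original formula for L_D is given as an image and is not reproduced in this text; the sketch below therefore assumes a standard cross-entropy form over the quantities defined above, with p(X_ij) the true probability and p̂(X_ij) the predicted probability. The discriminator architecture, and which network's pairs receive soft label 0 versus 1, are likewise assumptions.

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """Illustrative classifier over (text representation, tag distribution) pairs."""
    def __init__(self, text_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(text_dim + num_classes, 64),
                                 nn.ReLU(),
                                 nn.Linear(64, 2))   # two outputs: first / second soft label

    def forward(self, text_repr, label_dist):
        return torch.softmax(self.net(torch.cat([text_repr, label_dist], dim=-1)), dim=-1)

def classifier_loss(p_true, p_pred, eps=1e-12):
    """Assumed cross-entropy form of L_D: p_true and p_pred have shape (m, 2),
    indexed by question text j and soft label i."""
    return -(p_true * torch.log(p_pred.clamp_min(eps))).sum(dim=-1).mean()
```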
Alternatively, the game loss function obtaining unit 600 may be specifically configured to, according to the formula:
L_G = min max(μL_MC + (1 - μ)L_D),
calculate a game loss function, wherein L_G is the game loss function; μ is a game trend parameter; L_MC is the model compression loss function; L_D is the classifier loss function.
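One plausible way to realize this min-max objective is an alternating update: the classifier is updated to maximize μL_MC + (1 - μ)L_D, then the second network is updated to minimize it. In the sketch below, compute_l_mc and compute_l_d are callables that recompute the two losses on the current batch; the alternating schedule, the optimizers, and detaching L_MC in the classifier step (it does not depend on the classifier's parameters) are assumptions rather than requirements of the disclosure.

```python
def game_step(student_opt, classifier_opt, compute_l_mc, compute_l_d, mu=0.5):
    # Maximization step for the classifier: gradient ascent via the negated objective.
    classifier_opt.zero_grad()
    l_game = mu * compute_l_mc().detach() + (1 - mu) * compute_l_d()
    (-l_game).backward()
    classifier_opt.step()

    # Minimization step for the second (student) text classification network.
    student_opt.zero_grad()
    l_game = mu * compute_l_mc() + (1 - mu) * compute_l_d()
    l_game.backward()
    student_opt.step()
    return l_game.item()
```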
The text classification model obtaining device provided by the disclosure can be applied to the field of artificial intelligence or the field of finance. The device obtains a question text data set, wherein the question text data set comprises a plurality of question texts carrying question category labels; obtains a constructed first text classification network and a constructed second text classification network, wherein the number of network layers of the first text classification network is greater than that of the second text classification network; inputs each question text in the question text data set into the first text classification network and the second text classification network to respectively obtain hidden layer input data and label distribution data sets of the first text classification network and the second text classification network; obtains a model compression loss function by utilizing the hidden layer input data and the label distribution data sets of the first text classification network and the second text classification network; based on the question text data set, obtains a classifier loss function by using the label distribution data sets of the first text classification network and the second text classification network; obtains a game loss function by using the model compression loss function and the classifier loss function; and adjusts network parameters of the second text classification network based on the game loss function to obtain a target text classification model. The method and the device are based on a model compression idea: two text classification networks with different numbers of network layers are trained simultaneously by utilizing a question text data set comprising a plurality of question texts carrying question category labels, so that the label distribution data set output by the text classification network with fewer network layers can approach that of the text classification network with more network layers, and the two text classification networks are treated as a whole that plays a game against a classifier, thereby improving the classification precision of a text classification model constructed on the basis of the text classification network with fewer network layers and realizing accurate classification of customer questions.
The specific manner in which the individual units perform the operations in relation to the apparatus of the above embodiments has been described in detail in relation to the embodiments of the method and will not be described in detail here.
The text classification model obtaining device includes a processor and a memory, the above-described question text data set obtaining unit 100, the text classification network obtaining unit 200, the data obtaining unit 300, the model compression loss function obtaining unit 400, the classifier loss function obtaining unit 500, the game loss function obtaining unit 600, the target text classification model obtaining unit 700, and the like are stored in the memory as program units, and the above-described program units stored in the memory are executed by the processor to realize the corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters and based on the model compression idea, two text classification networks with different numbers of network layers are trained simultaneously by utilizing a question text data set comprising a plurality of question texts carrying question category labels, so that the label distribution data set output by the text classification network with fewer network layers can approach that of the text classification network with more network layers, and the two text classification networks are treated as a whole that plays a game against a classifier, thereby improving the classification precision of a text classification model constructed based on the text classification network with fewer network layers and realizing accurate classification of customer questions.
The embodiment of the present disclosure provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the text classification model obtaining method.
The embodiment of the disclosure provides a processor for running a program, wherein the program runs to execute the text classification model obtaining method.
As shown in fig. 6, an embodiment of the present disclosure provides an electronic device 1000. The electronic device 1000 comprises at least one processor 1001, at least one memory 1002, and a bus 1003 connected to the processor 1001; the processor 1001 and the memory 1002 communicate with each other through the bus 1003; the processor 1001 is configured to call the program instructions in the memory 1002 to perform the text classification model obtaining method described above. The electronic device herein may be a server, a PC, a PAD, a mobile phone, or the like.
The present disclosure also provides a computer program product which, when executed on an electronic device, is adapted to execute a program initialized with the steps of the text classification model obtaining method.
It should be noted that the method, the device, the storage medium and the electronic equipment for obtaining the text classification model provided by the present disclosure may be used in the artificial intelligence field or the financial field. The foregoing is merely an example, and the application fields of the method, the apparatus, the storage medium and the electronic device for obtaining the text classification model provided in the present disclosure are not limited.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, the electronic device includes one or more processors (CPUs), memory, and a bus. The electronic device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory, such as random access memory (RAM), and/or nonvolatile memory, such as read only memory (ROM) or flash memory (flash RAM), among other forms in computer readable media, and the memory includes at least one memory chip. Memory is an example of a computer-readable medium.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
In the description of the present disclosure, it should be understood that the orientations or positional relationships indicated by the terms "upper", "lower", "front", "rear", "left", "right", etc., are based on the orientations or positional relationships shown in the drawings, are merely for convenience of describing the present disclosure and simplifying the description, and do not indicate or imply that the referenced positions or elements must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limitations of the present disclosure.
It is noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not only include those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the present disclosure. Various modifications and variations of this disclosure will be apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present disclosure, are intended to be included within the scope of the claims of the present disclosure.
Claims (10)
1. A method for obtaining a text classification model, comprising:
obtaining a question text data set, wherein the question text data set comprises a plurality of question texts carrying question category labels;
obtaining a constructed first text classification network and a constructed second text classification network, wherein the number of network layers of the first text classification network is greater than that of the second text classification network;
inputting each question text in the question text data set into the first text classification network and the second text classification network to respectively obtain hidden layer input data and label distribution data sets of the first text classification network and the second text classification network;
obtaining a model compression loss function by utilizing hidden layer input data and a label distribution data set of the first text classification network and the second text classification network;
Obtaining a classifier loss function based on the question text dataset using the tag distribution datasets of the first text classification network and the second text classification network;
utilizing the model compression loss function and the classifier loss function to obtain a game loss function;
and adjusting network parameters of the second text classification network based on the game loss function to obtain a target text classification model.
2. The method of claim 1, wherein the hidden layer input data comprises mean square error loss layer input data, relative entropy loss layer input data, and cross entropy loss layer input data, wherein the obtaining a model compression loss function using the hidden layer input data and the tag distribution data set of the first text classification network and the second text classification network comprises:
obtaining a mean square error loss function between the first text classification network and the second text classification network by utilizing the mean square error loss layer input data of the first text classification network and the second text classification network;
obtaining a relative entropy loss function between the first text classification network and the second text classification network using the relative entropy loss layer input data of the first text classification network and the second text classification network;
Obtaining a first cross entropy loss function between the first text classification network and the second text classification network using the cross entropy loss layer input data of the first text classification network and the second text classification network;
obtaining a second cross entropy loss function between the first text classification network and the second text classification network using the tag distribution dataset of the first text classification network and the second text classification network;
obtaining a third cross entropy loss function between the second text classification network and the question category label set by using the label distribution data set of the second text classification network and the question category label set, wherein the question category label set comprises question category labels corresponding to each question text in the question text data set;
and obtaining a model compression loss function by using the mean square error loss function, the relative entropy loss function, the first cross entropy loss function, the second cross entropy loss function and the third cross entropy loss function.
3. The method of claim 2, wherein said obtaining a model compression loss function using said mean square error loss function, said relative entropy loss function, said first cross entropy loss function, said second cross entropy loss function, and said third cross entropy loss function, comprises:
According to the formula:
L_MC = λ(L_CE(P_tea, P_stu) + L_CE(P_data, P_stu)) + (1 - λ)(L_MSE(P_tea1, P_stu1) + L_KL(P_tea2, P_stu2) + L_CE(P_tea3, P_stu3))
calculating a model compression loss function, wherein L_MC is the model compression loss function; λ is a weight factor; L_MSE(P_tea1, P_stu1) is the mean square error loss function, P_tea1 is the mean square error loss layer input data of the first text classification network, and P_stu1 is the mean square error loss layer input data of the second text classification network; L_KL(P_tea2, P_stu2) is the relative entropy loss function, P_tea2 is the relative entropy loss layer input data of the first text classification network, and P_stu2 is the relative entropy loss layer input data of the second text classification network; L_CE(P_tea3, P_stu3) is the first cross entropy loss function, P_tea3 is the cross entropy loss layer input data of the first text classification network, and P_stu3 is the cross entropy loss layer input data of the second text classification network; L_CE(P_tea, P_stu) is the second cross entropy loss function, P_tea is the label distribution data set of the first text classification network, and P_stu is the label distribution data set of the second text classification network; L_CE(P_data, P_stu) is the third cross entropy loss function, and P_data is the question category label set.
4. The method of claim 2, wherein prior to said adjusting network parameters of said second text classification network based on said gaming loss function to obtain a target text classification model, said method further comprises:
Updating the network parameters of the second text classification network based on the mean square error loss function, the relative entropy loss function, and the first cross entropy loss function.
5. The method of claim 1, wherein the obtaining a classifier loss function based on the question text dataset using tag distribution datasets of the first text classification network and the second text classification network comprises:
for any one of the question texts in the question text data set: determining first tag data and second tag data corresponding to the question text in tag distribution data sets of the first text classification network and the second text classification network respectively; combining the first tag data and the question text into a first text tag data pair, and adding a first soft tag to the first text tag data pair; combining the second tag data and the question text into a second text tag data pair, and adding a second soft tag to the second text tag data pair;
based on the first text label data pair and the second text label data pair corresponding to each question text, obtaining a network classification true probability result;
Inputting the first text label data pair and the second text label data pair corresponding to each question text into a classifier to obtain a network classification prediction probability result output by the classifier;
and obtaining a classifier loss function based on the network classification true probability result and the network classification prediction probability result.
6. The method of claim 5, wherein the obtaining a classifier loss function based on the network classification true probability result and the network classification predicted probability result comprises:
according to the formula:
calculating a classifier loss function, wherein L_D is the classifier loss function; j denotes the serial number of a question text in the question text data set; m denotes the number of question texts in the question text data set; i denotes a soft label, being the first soft label when i = 0 and the second soft label when i = 1; p(X_ij) denotes, in the network classification true probability result, the network classification true probability corresponding to the j-th question text under soft label i; and p̂(X_ij) denotes, in the network classification prediction probability result, the network classification prediction probability corresponding to the j-th question text under soft label i.
7. The method of claim 1, wherein said utilizing said model compression loss function and said classifier loss function to obtain a gaming loss function comprises:
according to the formula:
L_G = min max(μL_MC + (1 - μ)L_D)
calculating a game loss function, wherein L_G is the game loss function; μ is a game trend parameter; L_MC is the model compression loss function; L_D is the classifier loss function.
8. A text classification model obtaining apparatus, characterized by comprising: a question text data set obtaining unit, a text classification network obtaining unit, a data obtaining unit, a model compression loss function obtaining unit, a classifier loss function obtaining unit, a game loss function obtaining unit and a target text classification model obtaining unit,
the question text data set obtaining unit is used for obtaining a question text data set, wherein the question text data set comprises a plurality of question texts carrying question category labels;
the text classification network obtaining unit is used for obtaining a constructed first text classification network and a constructed second text classification network, wherein the number of network layers of the first text classification network is greater than that of the second text classification network;
The data obtaining unit is configured to input each of the question texts in the question text data set into the first text classification network and the second text classification network, and obtain hidden layer input data and a tag distribution data set of the first text classification network and the second text classification network respectively;
the model compression loss function obtaining unit is used for obtaining a model compression loss function by utilizing hidden layer input data and tag distribution data sets of the first text classification network and the second text classification network;
the classifier loss function obtaining unit is used for obtaining a classifier loss function by using the label distribution data sets of the first text classification network and the second text classification network based on the problem text data set;
the game loss function obtaining unit is used for obtaining a game loss function by utilizing the model compression loss function and the classifier loss function;
the target text classification model obtaining unit is used for adjusting network parameters of the second text classification network based on the game loss function to obtain a target text classification model.
9. A computer-readable storage medium having a program stored thereon, which when executed by a processor implements the text classification model obtaining method according to any one of claims 1 to 7.
10. An electronic device comprising at least one processor, and at least one memory, bus connected to the processor; the processor and the memory complete communication with each other through the bus; the processor is configured to invoke program instructions in the memory to perform the text classification model acquisition method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310932443.3A CN116932759A (en) | 2023-07-27 | 2023-07-27 | Text classification model obtaining method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310932443.3A CN116932759A (en) | 2023-07-27 | 2023-07-27 | Text classification model obtaining method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116932759A true CN116932759A (en) | 2023-10-24 |
Family
ID=88377025
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310932443.3A Pending CN116932759A (en) | 2023-07-27 | 2023-07-27 | Text classification model obtaining method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116932759A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||