CN115994225B - Text classification method and device, storage medium and electronic equipment

Info

Publication number
CN115994225B
Authority
CN
China
Prior art keywords
sample
preset
target
candidate
parameter vector
Prior art date
Legal status
Active
Application number
CN202310273838.7A
Other languages
Chinese (zh)
Other versions
CN115994225A (en)
Inventor
苏海波
李霖枫
杜晓梦
刘译璟
Current Assignee
Beijing Percent Technology Group Co ltd
Original Assignee
Beijing Percent Technology Group Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Percent Technology Group Co ltd
Priority to CN202310273838.7A
Publication of CN115994225A
Application granted
Publication of CN115994225B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure relates to a text classification method and device, a storage medium and electronic equipment, in the technical field of computers. The method comprises the following steps: acquiring a target text; obtaining target input data according to the target text and a target classification template, wherein the target classification template comprises a target parameter vector and a target natural language template, the target parameter vector is obtained by training a first preset network model according to first training sample data, the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model; and inputting the target input data into a preset target text classification model to obtain the target text category output by the target text classification model, wherein the target text classification model is obtained by training a second preset network model according to second training sample data, the second training sample data is sample data of unlabeled categories, and the second preset network model comprises the target parameter vector and the preset classification model.

Description

Text classification method and device, storage medium and electronic equipment
Technical Field
The disclosure relates to the field of computer technology, and in particular to a text classification method and device, a storage medium and electronic equipment.
Background
Aiming at the multi-classification problem, the existing typical small-sample learning methods include the natural language template method PET (Pattern-Exploiting Training) and the parameter vector template method P-Tuning, both of which train a corresponding model on a labeled sample data set. The natural language template method PET requires manual template construction, and different templates differ considerably in effectiveness, while the template learned by the parameter vector template method P-Tuning lacks interpretability. In addition, both methods perform model training only on the labeled sample data set and cannot make full use of a large amount of unlabeled sample data.
Disclosure of Invention
The disclosure aims to provide a text classification method and device, a storage medium and electronic equipment, so as to improve the accuracy of text classification.
According to a first aspect of embodiments of the present disclosure, there is provided a method of classifying text, the method comprising:
acquiring a target text;
obtaining target input data according to the target text and a target classification template, wherein the target classification template comprises a target parameter vector and a target natural language template, the target parameter vector is obtained by training a first preset network model according to first training sample data, the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model;
inputting the target input data into a preset target text classification model to obtain a target text class output by the target text classification model, wherein the target text classification model is obtained by training a second preset network model according to second training sample data, the second training sample data is sample data of unlabeled classes, and the second preset network model comprises the target parameter vector and the preset classification model.
Optionally, the first training sample data includes at least one preset classification template; the target parameter vector and the target text classification model are determined by:
training the first preset network model according to the first training sample data aiming at each preset classification template to obtain candidate parameter vectors corresponding to the preset classification templates;
training a standby network model corresponding to the candidate parameter vector according to the second training sample data aiming at each candidate parameter vector to obtain a candidate text classification model corresponding to the candidate parameter vector, wherein the standby network model comprises the candidate parameter vector and the preset classification model;
and determining the target parameter vector and the target text classification model from the candidate parameter vector and the candidate text classification model according to a preset verification data set, wherein the preset verification data set comprises a sample verification text and a sample verification category corresponding to the sample verification text.
Optionally, the first training sample data includes first sample input data and a first sample category corresponding to the first sample input data; training the first preset network model according to the first training sample data, and obtaining candidate parameter vectors corresponding to the preset classification templates comprises the following steps:
training the first preset network model according to the first sample input data and the first sample category to obtain the candidate parameter vector.
Optionally, the first sample input data includes the preset classification template and a first sample text, and the preset classification template includes a preset parameter vector and a preset natural language template; training the first preset network model according to the first sample input data and the first sample category, and obtaining the candidate parameter vector includes:
obtaining the first sample input data according to the first sample text and the preset classification template;
and taking the first sample input data as the input of the first preset network model, taking the first sample class as the output of the first preset network model, and training the first preset network model to obtain the candidate parameter vector.
Optionally, the second training sample data includes second sample input data and second sample output data corresponding to the second sample input data; training the standby network model corresponding to the candidate parameter vector according to the second training sample data, and obtaining a candidate text classification model corresponding to the candidate parameter vector includes:
and training the standby network model according to the second sample input data and the second sample output data aiming at each candidate parameter vector to obtain the candidate text classification model.
Optionally, the second sample input data includes a candidate classification template and a second sample text, the candidate classification template including the candidate parameter vector and the preset natural language template; the second sample output data is a text extracted from a preset sample text, and the second sample text is a text obtained after the second sample output data is extracted from the preset sample text; training the standby network model according to the second sample input data and the second sample output data to obtain the candidate text classification model comprises the following steps:
obtaining the second sample input data according to the second sample text and the candidate classification template;
and taking the second sample input data as the input of the standby network model, taking the second sample output data as the output of the standby network model, and training the standby network model to obtain the candidate text classification model.
Optionally, the determining the target parameter vector and the target text classification model from the candidate parameter vector and the candidate text classification model according to a preset verification data set includes:
for each candidate parameter vector, obtaining verification input data according to the sample verification text and the candidate classification template corresponding to the candidate parameter vector;
taking the verification input data as the input of the candidate text classification model corresponding to the candidate parameter vector, and obtaining a target verification category output by the candidate text classification model;
determining the classification accuracy of each candidate network model according to the target verification category and the sample verification category, wherein the candidate network model comprises the candidate parameter vector and the candidate text classification model corresponding to the candidate parameter vector;
and taking the candidate parameter vector in the candidate network model with the highest classification accuracy as the target parameter vector, and taking the candidate text classification model in the candidate network model with the highest classification accuracy as the target text classification model.
According to a second aspect of embodiments of the present disclosure, there is provided a text classification apparatus, the apparatus comprising:
the acquisition module is used for acquiring the target text;
the input module is used for obtaining target input data according to the target text and the target classification template, the target classification template comprises a target parameter vector and a target natural language template, the target parameter vector is obtained by training a first preset network model according to first training sample data, the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model;
the classification module is used for inputting the target input data into a preset target text classification model to obtain a target text class output by the target text classification model, the target text classification model is obtained by training a second preset network model according to second training sample data, the second training sample data is sample data of unlabeled classes, and the second preset network model comprises the target parameter vector and the preset classification model.
Optionally, the first training sample data includes at least one preset classification template; the target parameter vector and the target text classification model are determined by:
training the first preset network model according to the first training sample data aiming at each preset classification template to obtain candidate parameter vectors corresponding to the preset classification templates;
training a standby network model corresponding to the candidate parameter vector according to the second training sample data aiming at each candidate parameter vector to obtain a candidate text classification model corresponding to the candidate parameter vector, wherein the standby network model comprises the candidate parameter vector and the preset classification model;
and determining the target parameter vector and the target text classification model from the candidate parameter vector and the candidate text classification model according to a preset verification data set, wherein the preset verification data set comprises a sample verification text and a sample verification category corresponding to the sample verification text.
Optionally, the first training sample data includes first sample input data and a first sample category corresponding to the first sample input data; training the first preset network model according to the first training sample data, and obtaining candidate parameter vectors corresponding to the preset classification templates comprises the following steps:
training the first preset network model according to the first sample input data and the first sample category to obtain the candidate parameter vector.
Optionally, the first sample input data includes the preset classification template and a first sample text, and the preset classification template includes a preset parameter vector and a preset natural language template; training the first preset network model according to the first sample input data and the first sample category, and obtaining the candidate parameter vector includes:
obtaining the first sample input data according to the first sample text and the preset classification template;
and taking the first sample input data as the input of the first preset network model, taking the first sample class as the output of the first preset network model, and training the first preset network model to obtain the candidate parameter vector.
Optionally, the second training sample data includes second sample input data and second sample output data corresponding to the second sample input data; training the standby network model corresponding to the candidate parameter vector according to the second training sample data, and obtaining a candidate text classification model corresponding to the candidate parameter vector includes:
and training the standby network model according to the second sample input data and the second sample output data aiming at each candidate parameter vector to obtain the candidate text classification model.
Optionally, the second sample input data includes a candidate classification template and a second sample text, the candidate classification template including the candidate parameter vector and the preset natural language template; the second sample output data is a text extracted from a preset sample text, and the second sample text is a text obtained after the second sample output data is extracted from the preset sample text; training the standby network model according to the second sample input data and the second sample output data to obtain the candidate text classification model comprises the following steps:
obtaining the second sample input data according to the second sample text and the candidate classification templates;
and taking the second sample input data as the input of the standby network model, taking the second sample output data as the output of the standby network model, and training the standby network model to obtain the candidate text classification model.
Optionally, the determining the target parameter vector and the target text classification model from the candidate parameter vector and the candidate text classification model according to a preset verification data set includes:
for each candidate parameter vector, obtaining verification input data according to the sample verification text and the candidate classification template corresponding to the candidate parameter vector;
taking the verification input data as the input of the candidate text classification model corresponding to the candidate parameter vector, and obtaining a target verification category output by the candidate text classification model;
determining the classification accuracy of each candidate network model according to the target verification category and the sample verification category, wherein the candidate network model comprises the candidate parameter vector and the candidate text classification model corresponding to the candidate parameter vector;
and taking the candidate parameter vector in the candidate network model with the highest classification accuracy as the target parameter vector, and taking the candidate text classification model in the candidate network model with the highest classification accuracy as the target text classification model.
According to a third aspect of embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method described in the first aspect of the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method in the first aspect of the disclosure.
Through the above technical solution, the present disclosure first obtains a target text, obtains target input data according to the target text and a target classification template comprising a target parameter vector and a target natural language template, and then inputs the target input data into a preset target text classification model to obtain the target text category output by the target text classification model. The target parameter vector is obtained by training a first preset network model according to first training sample data, where the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model. The target text classification model is obtained by training a second preset network model according to second training sample data, where the second training sample data is sample data of unlabeled categories, and the second preset network model comprises the target parameter vector and the preset classification model. In this way, the target text category corresponding to the target text is determined through the pre-trained target parameter vector and target text classification model; the approach fuses the natural language template method and the parameter vector template method, makes full use of a large amount of unlabeled training data, and can obtain more accurate text classification results.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification; they illustrate the disclosure and, together with the description, serve to explain the disclosure without limiting it.
Fig. 1 is a flow chart illustrating a method of classifying text according to an exemplary embodiment.
Fig. 2 is a flow chart illustrating a method of determining a target parameter vector and a target text classification model according to an exemplary embodiment.
Fig. 3 is a flow chart illustrating a method of determining candidate parameter vectors according to an exemplary embodiment.
Fig. 4 is a flow chart illustrating a method of determining a candidate text classification model according to an exemplary embodiment.
Fig. 5 is a flow chart illustrating a method of determining a target parameter vector and a target text classification model according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating a text classification apparatus according to an exemplary embodiment.
Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the disclosure, are not intended to limit the disclosure.
Before introducing the text classification method and device, storage medium and electronic equipment of the present disclosure, an application scenario related to an embodiment of the disclosure is first described. Consider the multi-classification of complaint texts: the input is a user's complaint text, such as "someone parks illegally" or "garbage is visible everywhere in the neighborhood", and the output is the category of the complaint text; there may be many categories, including "illegal parking", "green-area garbage", "street lamp not working", and so on. At present, for this multi-classification problem, the existing typical small-sample learning methods include the natural language template method PET and the parameter vector template method P-Tuning, which train a corresponding model on a labeled sample data set. The natural language template method PET combines a manually constructed natural language template with the MLM (Masked Language Model) head of a BERT (Bidirectional Encoder Representations from Transformers) model, converting the task into a cloze-style fill-in-the-blank problem for small-sample learning; however, the template must be constructed manually, and different templates differ considerably in effectiveness. The parameter vector template method P-Tuning automatically learns the best template by using the representations of unused tokens in the pre-trained model; it learns the template parameters automatically, but the learned template lacks interpretability and cannot be produced from human prior knowledge.
Fig. 1 is a flow chart illustrating a text classification method according to an exemplary embodiment. As shown in Fig. 1, the method may include the following steps.
Step 101, acquiring a target text.
Step 102, obtaining target input data according to the target text and the target classification template. The target classification template comprises a target parameter vector and a target natural language template, the target parameter vector is obtained by training a first preset network model according to first training sample data, the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model.
For example, the target text may first be obtained from a complaint text submitted by a user, and then the target text and the target classification template are concatenated to obtain the target input data. The target classification template may include a target parameter vector and a target natural language template. For example, the target classification template may be "[u1][u2]...[un] This is [M1][M2][M3][M4] information.", where [u1][u2]...[un] is the target parameter vector, "This is [M1][M2][M3][M4] information." is the target natural language template, and [M1][M2][M3][M4] represents the masked (MASK) text. In this way, the parameter vector of the parameter vector template is fused with the natural language template to obtain the target classification template, so that human prior knowledge can be utilized while template parameters are automatically learned from a small amount of labeled data. In some embodiments, the target parameter vector may be obtained by training a first preset network model according to first training sample data, where the first training sample data is sample data marked with categories, and the first preset network model may include a preset parameter vector and a preset classification model; the preset classification model may be, for example, a BERT model.
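As an informal illustration (not part of the patent), the following Python sketch shows this concatenation step under the example template above; the function name build_target_input, the slot counts and the placeholder notation are our own assumptions.

    # Sketch: build target input data by concatenating the target classification
    # template (parameter-vector slots + natural language template) with the text.
    def build_target_input(target_text: str, n_prompt: int = 6, n_mask: int = 4) -> str:
        prompt_slots = "".join(f"[u{j}]" for j in range(1, n_prompt + 1))  # target parameter vector slots
        template = prompt_slots + "这是" + "[MASK]" * n_mask + "信息。"      # "This is [M1]..[M4] information."
        return template + target_text

    # "Someone parks illegally in a certain neighborhood."
    print(build_target_input("某小区有人乱停车。"))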
Step 103, inputting the target input data into a preset target text classification model to obtain the target text category output by the target text classification model. The target text classification model is obtained by training a second preset network model according to second training sample data, the second training sample data is sample data of unlabeled categories, and the second preset network model comprises the target parameter vector and the preset classification model.
For example, after the target input data is obtained, it may be input into the preset target text classification model to obtain the target text category output by that model. In some embodiments, the target text classification model may be obtained by training a second preset network model according to the second training sample data, where the second preset network model may include the target parameter vector and the preset classification model; that is, the second preset network model is the model obtained by replacing the preset parameter vector in the first preset network model with the target parameter vector. The second training sample data can be sample data of unlabeled categories; training on such unlabeled sample data fine-tunes the parameters of the preset classification model, achieving a semi-supervised learning effect.
In summary, the present disclosure first obtains a target text, obtains target input data according to the target text and a target classification template comprising a target parameter vector and a target natural language template, and then inputs the target input data into a preset target text classification model to obtain the target text category output by the target text classification model. The target parameter vector is obtained by training a first preset network model according to first training sample data, where the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model. The target text classification model is obtained by training a second preset network model according to second training sample data, where the second training sample data is sample data of unlabeled categories, and the second preset network model comprises the target parameter vector and the preset classification model. In this way, the target text category corresponding to the target text is determined through the pre-trained target parameter vector and target text classification model; the approach fuses the natural language template method and the parameter vector template method, makes full use of a large amount of unlabeled training data, and can obtain more accurate text classification results.
Fig. 2 is a flowchart illustrating a method of determining the target parameter vector and the target text classification model according to an exemplary embodiment. As shown in Fig. 2, the target parameter vector and the target text classification model are determined through the following steps.
Step 201, training a first preset network model according to the first training sample data for each preset classification template to obtain candidate parameter vectors corresponding to the preset classification templates.
Step 202, training a standby network model corresponding to each candidate parameter vector according to the second training sample data to obtain a candidate text classification model corresponding to the candidate parameter vector, wherein the standby network model comprises the candidate parameter vector and the preset classification model.
Step 203, determining a target parameter vector and a target text classification model from the candidate parameter vectors and the candidate text classification models according to a preset verification data set, wherein the preset verification data set comprises a sample verification text and a sample verification category corresponding to the sample verification text.
By way of example, the first training sample data may include at least one preset classification template, where each preset classification template includes a preset parameter vector and a preset natural language template. It can be understood that a preset classification template is a template obtained by fusing the preset parameter vector with a preset natural language template, and the preset natural language templates of the different preset classification templates may differ. For example, the preset classification templates may include: "[u1][u2]...[un] This is [M1][M2][M3][M4] information.", "[u1][u2]...[un] belongs to the [M1][M2][M3][M4] category." and "[u1][u2]...[un] falls under the [M1][M2][M3][M4] classification.", where [u1][u2]...[un] is the preset parameter vector, and "This is [M1][M2][M3][M4] information.", "belongs to the [M1][M2][M3][M4] category." and "falls under the [M1][M2][M3][M4] classification." are the preset natural language templates.
For each preset classification template, the first preset network model is trained according to the first training sample data to obtain the candidate parameter vector corresponding to that preset classification template. Each candidate parameter vector together with the preset classification model can form a standby network model; when there are P preset classification templates, P candidate parameter vectors can be obtained by training, and correspondingly there are P standby network models Model_1, Model_2, ..., Model_P, which include the second preset network model.
For each candidate parameter vector, the standby network model corresponding to that candidate parameter vector is trained according to the second training sample data to obtain the candidate text classification model corresponding to the candidate parameter vector. Each candidate parameter vector and its corresponding candidate text classification model may form a candidate network model; when there are P standby network models Model_1, Model_2, ..., Model_P, P candidate network models Model^h_1, Model^h_2, ..., Model^h_P can be obtained. Since the first training sample data used to train the first preset network model is data marked with categories, while the second training sample data used to train the standby network models is data of unlabeled categories, the candidate network models produced by training achieve a semi-supervised learning effect, so that the target text classification model obtained by training has higher classification accuracy.
Finally, according to a preset verification data set, the target parameter vector and the target text classification model with the highest classification accuracy are determined from the candidate parameter vectors and the candidate text classification models. The preset verification data set may include a sample verification text and a sample verification category corresponding to the sample verification text. In some embodiments, a target network model with the highest classification accuracy may be determined from the candidate network models according to the preset verification data set; the candidate parameter vector included in the target network model is then taken as the target parameter vector, and the candidate text classification model included in the target network model is taken as the target text classification model.
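Schematically, the whole two-stage procedure described above might be organized as in the following Python sketch, where train_stage1, train_stage2 and accuracy are hypothetical helper names standing in for the training and verification procedures detailed below:

    # Schematic of the search over the P preset classification templates:
    # stage 1 learns a candidate parameter vector per template on labeled data,
    # stage 2 fine-tunes a candidate text classification model on unlabeled data,
    # and the best (u, classifier) pair on the verification set is selected.
    candidates = []
    for template in preset_classification_templates:      # P preset templates
        u = train_stage1(template, labeled_samples)       # candidate parameter vector
        clf = train_stage2(u, unlabeled_samples)          # candidate text classification model
        candidates.append((u, clf))                       # candidate network model
    target_u, target_clf = max(candidates, key=lambda c: accuracy(c, verification_set))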
In one embodiment, one implementation of step 201 may be: training a first preset network model according to the first sample input data and the first sample category to obtain candidate parameter vectors.
Fig. 3 is a flowchart illustrating a method of determining candidate parameter vectors according to an exemplary embodiment, and step 201 may be implemented by the following steps, as shown in fig. 3.
Step 2011, obtaining first sample input data according to the first sample text and a preset classification template.
Step 2012, taking the first sample input data as input of a first preset network model, taking the first sample category as output of the first preset network model, and training the first preset network model to obtain candidate parameter vectors.
For example, the first training sample data may include first sample input data and the first sample category corresponding to the first sample input data, and the first sample input data may include a preset classification template and a first sample text. The preset classification template and the first sample text can be concatenated to obtain the first sample input data. Taking the preset classification template "[u1][u2]...[un] This is [M1][M2][M3][M4] information." and the first sample text "Someone parks illegally in a certain neighborhood." as an example, where [u1][u2]...[un] is the preset parameter vector and "This is [M1][M2][M3][M4] information." is the preset natural language template, the first sample input data is "[u1][u2]...[un] This is [M1][M2][M3][M4] information. Someone parks illegally in a certain neighborhood." The first sample category corresponding to this first sample input data may be "illegal parking", that is, the text corresponding to [M1][M2][M3][M4] is "illegal parking".
The first sample input data is then taken as the input of the first preset network model and the first sample category as the output of the first preset network model, and the first preset network model is trained according to a preset first loss function to obtain the candidate parameter vector. In one embodiment, when training the first preset network model, the PyTorch framework may be employed, using the Adam optimizer with a learning rate of 1e-4. The first training sample data can be loaded and batched according to a preset batch_size, giving M batches in total, where batch_size may be 8, i.e., 8 samples are fed to the training program at a time, and the preset parameter vector may be initialized with random values.
Taking the preset classification model as a BERT model as an example, the preset parameter vector can be processed with an LSTM (Long Short-Term Memory) network to obtain the hidden state vectors h_j corresponding to the preset parameter vector. A batch of first sample input data is then input into the embedding layer of the BERT model and converted into vector form, where each preset parameter token [u_j] is replaced by its hidden state vector h_j. The vectors output by the embedding layer are input into the MLM head of the BERT model to obtain the predicted category of the first sample text. According to the preset first loss function, the loss value corresponding to the batch of data can be obtained, and the updated parameter vector u can be solved for according to PyTorch's Adam optimizer and the learning rate setting, thereby updating the preset parameter vector. The first loss function may, for example, be given by Equation 1.
$$L_1 = -\sum_{(x,y) \in D_{\mathrm{train1}}} \log P\left(y \mid \mathrm{LSTM}(u),\, x\right) \qquad \text{(Equation 1)}$$
where L_1 is the first loss function, u = ([u_1], [u_2], ..., [u_n]) is the preset parameter vector, x is the first sample text, y is the first sample category, D_train1 is the first training sample data, and LSTM(u) are the hidden state vectors h_j obtained after the preset parameter vector is processed by the LSTM.
In other embodiments, the above training step may be repeated with each of the M batches of data as training data, so that the parameter u is updated M times and one epoch is completed, i.e., one full training pass over the first training sample data is performed on the first preset network model. Following the above procedure, training may be performed for epoch_size = 50 epochs, updating the parameter u many times and thereby obtaining the candidate parameter vector.
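To make this first training stage concrete, the following is a minimal PyTorch sketch under stated assumptions: a HuggingFace bert-base-chinese masked language model stands in for the preset classification model; the template string, the 4-character verbalizer "违规停车" ("illegal parking"), N_PROMPT and the helper names are illustrative; and batching and the epoch loop are omitted. It is a sketch of the technique, not the patent's implementation.

    import torch
    import torch.nn as nn
    from transformers import BertForMaskedLM, BertTokenizerFast

    N_PROMPT = 6  # length n of the preset parameter vector [u_1]..[u_n] (assumed)
    tok = BertTokenizerFast.from_pretrained("bert-base-chinese")
    bert = BertForMaskedLM.from_pretrained("bert-base-chinese")
    for p in bert.parameters():          # stage 1: only the parameter vector is trained
        p.requires_grad_(False)

    class PromptEncoder(nn.Module):
        """LSTM over the prompt slots, producing the hidden state vectors h_j = LSTM(u)."""
        def __init__(self, n: int, hidden: int):
            super().__init__()
            self.emb = nn.Embedding(n, hidden)  # random initial values of [u_j]
            self.lstm = nn.LSTM(hidden, hidden // 2, batch_first=True, bidirectional=True)
        def forward(self) -> torch.Tensor:
            h, _ = self.lstm(self.emb.weight.unsqueeze(0))
            return h.squeeze(0)                 # (n, hidden), one h_j per prompt slot

    prompt = PromptEncoder(N_PROMPT, bert.config.hidden_size)
    opt = torch.optim.Adam(prompt.parameters(), lr=1e-4)

    def encode(text: str, category: str | None = None):
        """Tokenize '<prompt slots> 这是[MASK]x4信息。<text>'; label the masks with a 4-char category."""
        ids = tok("这是" + tok.mask_token * 4 + "信息。" + text, return_tensors="pt")["input_ids"][0]
        input_ids = torch.cat([ids[:1], torch.full((N_PROMPT,), tok.pad_token_id), ids[1:]])
        labels = torch.full_like(input_ids, -100)          # -100 positions are ignored by the MLM loss
        if category is not None:
            mask_pos = (input_ids == tok.mask_token_id).nonzero(as_tuple=True)[0]
            labels[mask_pos] = torch.tensor(tok.convert_tokens_to_ids(list(category)))
        return input_ids.unsqueeze(0), labels.unsqueeze(0)

    # One training step on one labeled sample (looped over the M batches in practice):
    input_ids, labels = encode("某小区有人乱停车。", "违规停车")    # complaint text / "illegal parking"
    embeds = bert.get_input_embeddings()(input_ids)
    embeds[0, 1:1 + N_PROMPT] = prompt()                   # replace the [u_j] slots with h_j
    loss = bert(inputs_embeds=embeds, labels=labels).loss  # Equation 1: cross-entropy at the masks
    opt.zero_grad(); loss.backward(); opt.step()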
In another embodiment, one implementation of step 202 may be: training, for each candidate parameter vector, the standby network model according to the second sample input data and the second sample output data to obtain the candidate text classification model.
FIG. 4 is a flowchart illustrating a method of determining a candidate text classification model according to an exemplary embodiment, as shown in FIG. 4, step 202 may be implemented by the following steps.
Step 2021, obtaining second sample input data according to the second sample text and the candidate classification template.
Step 2022, taking the second sample input data as the input of the standby network model and the second sample output data as the output of the standby network model, and training the standby network model to obtain the candidate text classification model.
For example, the second training sample data may include second sample input data and second sample output data corresponding to the second sample input data, where the second sample input data may include a candidate classification template and second sample text, and the candidate classification template may include a candidate parameter vector and a preset natural language template, and it may be understood that the candidate classification template is a template obtained by replacing a preset parameter vector in the preset classification template with the candidate parameter vector.
The candidate classification template and the second sample text can be concatenated to obtain the second sample input data; the second sample input data is then used as the input of the standby network model and the second sample output data as its output, and the standby network model is trained to obtain the candidate text classification model. The second sample output data may be a text span extracted from a preset sample text, and the second sample text may be the text obtained after the second sample output data is extracted from the preset sample text. Taking the preset sample text "A certain neighborhood has an outage and requests restoration as soon as possible." as an example, "as soon as possible" can be extracted as the second sample output data, and "A certain neighborhood has an outage and requests restoration [M1][M2]." is the second sample text.
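A small Python sketch of how one such unlabeled training pair could be built; the assumption that a short span is extracted at a random position and replaced by mask tokens, and the helper name make_stage2_pair, are ours:

    import random

    def make_stage2_pair(text: str, span_len: int = 2):
        """Extract a random span as the second sample output data and mask it in the text."""
        start = random.randrange(0, len(text) - span_len + 1)
        span = text[start:start + span_len]                 # e.g. "尽快" ("as soon as possible")
        masked = text[:start] + "[MASK]" * span_len + text[start + span_len:]
        return masked, span                                 # (second sample text, second sample output data)

    masked, span = make_stage2_pair("某小区停电了，要求尽快恢复。")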
In one embodiment, when training the second preset network model, the PyTorch framework may be employed, using the Adam optimizer with a learning rate of 1e-4. The second training sample data can be loaded and batched according to batch_size, giving N batches in total, where batch_size may be 32, i.e., 32 samples are fed to the training program at a time, and the parameters para^i_bert of the preset classification model are updated through training to obtain the candidate text classification model.
Taking the preset classification model as a BERT model as an example, for each batch of second training sample data, the loss value corresponding to the batch can be obtained according to a preset second loss function, and the updated parameters para^i_bert can be solved for according to PyTorch's Adam optimizer and the learning rate setting, thereby updating the parameters para^i_bert. The second loss function may, for example, be given by Equation 2.
$$L_2 = -\sum_{(z,w) \in D_{\mathrm{train2}}} \log P\left(w \mid \mathrm{LSTM}(u_i),\, z\right) \qquad \text{(Equation 2)}$$
where L_2 is the second loss function, z is the second sample text, w is the second sample output data, D_train2 is the second training sample data, u_i is the candidate parameter vector, and LSTM(u_i) is the vector obtained after the candidate parameter vector is processed by the LSTM.
In other embodiments, the above training step may be repeated with each of the N batches of data as training data, so that the parameters para^i_bert are updated N times and one epoch is completed, i.e., one full training pass over the second training sample data is performed on the standby network model. Following the above procedure, training may be performed for epoch_size = 50 epochs, updating para^i_bert many times, and the final para^i_bert is taken as the parameters of the preset classification model, thereby obtaining the candidate text classification model. In this way, the parameters para^i_bert of the preset classification model are fine-tuned with the unlabeled second training sample data, making full use of a large amount of unlabeled sample data and achieving a semi-supervised learning effect.
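Continuing the earlier sketches (same tok, bert, prompt, N_PROMPT and make_stage2_pair), one stage-2 update could look as follows. The decision to place labels only on the span's [MASK] positions, leaving the template's own masks ignored at -100, is our assumption about the setup, not something the patent states.

    def stage2_encode(masked_text: str, span: str):
        """Candidate template + second sample text; labels only at the span's [MASK] positions."""
        ids = tok("这是" + tok.mask_token * 4 + "信息。" + masked_text, return_tensors="pt")["input_ids"][0]
        input_ids = torch.cat([ids[:1], torch.full((N_PROMPT,), tok.pad_token_id), ids[1:]])
        labels = torch.full_like(input_ids, -100)
        mask_pos = (input_ids == tok.mask_token_id).nonzero(as_tuple=True)[0]
        labels[mask_pos[-len(span):]] = torch.tensor(tok.convert_tokens_to_ids(list(span)))
        return input_ids.unsqueeze(0), labels.unsqueeze(0)

    bert.train()
    for p in bert.parameters():
        p.requires_grad_(True)                    # stage 2: fine-tune para_bert
    opt2 = torch.optim.Adam(bert.parameters(), lr=1e-4)

    masked, span = make_stage2_pair("某小区停电了，要求尽快恢复。")
    input_ids, labels = stage2_encode(masked, span)
    embeds = bert.get_input_embeddings()(input_ids)
    embeds[0, 1:1 + N_PROMPT] = prompt().detach() # candidate parameter vector u_i stays fixed
    loss = bert(inputs_embeds=embeds, labels=labels).loss   # Equation 2
    opt2.zero_grad(); loss.backward(); opt2.step()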
Fig. 5 is a flowchart illustrating a method of determining a target parameter vector and a target text classification model according to an exemplary embodiment, and step 203 may be implemented by the following steps, as shown in fig. 5.
Step 2031, for each candidate parameter vector, obtaining verification input data according to the sample verification text and the candidate classification template corresponding to the candidate parameter vector.
Step 2032, taking the verification input data as the input of the candidate text classification model corresponding to the candidate parameter vector, and obtaining the target verification category output by the candidate text classification model.
Step 2033, determining the classification accuracy of each candidate network model according to the target verification category and the sample verification category, where the candidate network models include candidate parameter vectors and candidate text classification models corresponding to the candidate parameter vectors.
Step 2034, taking the candidate parameter vector in the candidate network model with the highest classification accuracy as the target parameter vector, and taking the candidate text classification model in the candidate network model with the highest classification accuracy as the target text classification model.
For example, after obtaining at least one candidate parameter vector and the candidate text classification model corresponding to each candidate parameter vector, the target parameter vector and the target text classification model with the highest classification accuracy may be determined from them. If there is only one candidate parameter vector and candidate text classification model, they may be taken directly as the target parameter vector and the target text classification model. If there are a plurality of candidate parameter vectors and candidate text classification models, the target parameter vector and the target text classification model can be determined according to the classification accuracy of each candidate network model, where each candidate network model comprises one candidate parameter vector and the candidate text classification model corresponding to that candidate parameter vector.
In some embodiments, for each candidate network model, first, a sample verification text and a candidate classification template corresponding to a candidate parameter vector in the candidate network model may be spliced to obtain verification input data. And then taking the verification input data as the input of the candidate text classification model corresponding to the candidate parameter vector to obtain the target verification category output by the candidate text classification model. And further determining the classification accuracy corresponding to each candidate network model according to the matching degree of the target verification category and the sample verification category. The higher the matching degree of the target verification category and the sample verification category is, the higher the classification accuracy of the candidate network model corresponding to the target verification category is, and correspondingly, the lower the matching degree of the target verification category and the sample verification category is, the lower the classification accuracy of the candidate network model corresponding to the target verification category is. After determining the classification accuracy of each candidate network model, the candidate network model with the highest classification accuracy may be taken as the target network model. And simultaneously, taking the candidate parameter vector in the target network model as the target parameter vector, and taking the candidate text classification model in the target network model as the target text classification model.
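The selection step itself is a simple argmax over verification accuracy; the following self-contained Python sketch uses illustrative names (CandidateNetworkModel, select_target) rather than anything defined by the patent.

    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    @dataclass
    class CandidateNetworkModel:
        parameter_vector: object             # candidate parameter vector u_i
        classify: Callable[[str], str]       # candidate text classification model: text -> category

    def select_target(candidates: List[CandidateNetworkModel],
                      verification_set: List[Tuple[str, str]]) -> CandidateNetworkModel:
        """Return the candidate network model with the highest verification accuracy."""
        def accuracy(c: CandidateNetworkModel) -> float:
            hits = sum(c.classify(text) == category for text, category in verification_set)
            return hits / len(verification_set)
        return max(candidates, key=accuracy)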
In summary, the present disclosure first obtains a target text, obtains target input data according to the target text and a target classification template comprising a target parameter vector and a target natural language template, and then inputs the target input data into a preset target text classification model to obtain the target text category output by the target text classification model. The target parameter vector is obtained by training a first preset network model according to first training sample data, where the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model. The target text classification model is obtained by training a second preset network model according to second training sample data, where the second training sample data is sample data of unlabeled categories, and the second preset network model comprises the target parameter vector and the preset classification model. In this way, the target text category corresponding to the target text is determined through the pre-trained target parameter vector and target text classification model; the approach fuses the natural language template method and the parameter vector template method, makes full use of a large amount of unlabeled training data, and can obtain more accurate text classification results.
Fig. 6 is a block diagram of a text classification apparatus according to an exemplary embodiment, and as shown in fig. 6, the apparatus 300 may include the following modules.
The obtaining module 301 is configured to obtain a target text.
The input module 302 is used for obtaining target input data according to the target text and the target classification template. The target classification template comprises a target parameter vector and a target natural language template, the target parameter vector is obtained by training a first preset network model according to first training sample data, the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model.
The classification module 303 is configured to input the target input data into a preset target text classification model, so as to obtain a target text category output by the target text classification model. The target text classification model is obtained by training a second preset network model according to second training sample data, the second training sample data is sample data of unlabeled categories, and the second preset network model comprises a target parameter vector and a preset classification model.
In one embodiment, the first training sample data comprises at least one preset classification template. The target parameter vector and the target text classification model are determined in the following manner.
The first preset network model is trained according to the first training sample data for each preset classification template to obtain the candidate parameter vector corresponding to the preset classification template.
For each candidate parameter vector, a standby network model corresponding to the candidate parameter vector is trained according to the second training sample data to obtain a candidate text classification model corresponding to the candidate parameter vector, wherein the standby network model comprises the candidate parameter vector and the preset classification model.
A target parameter vector and a target text classification model are determined from the candidate parameter vectors and the candidate text classification models according to a preset verification data set, wherein the preset verification data set comprises a sample verification text and a sample verification category corresponding to the sample verification text.
In another embodiment, the first training sample data includes first sample input data and a first sample class corresponding to the first sample input data. Training a first preset network model according to the first training sample data, and obtaining candidate parameter vectors corresponding to a preset classification template comprises the following steps: training a first preset network model according to the first sample input data and the first sample category to obtain candidate parameter vectors.
In another embodiment, the first sample input data includes a preset classification template and a first sample text, the preset classification template including a preset parameter vector and a preset natural language template. The first preset network model is trained according to the first sample input data and the first sample category to obtain candidate parameter vectors as follows.
First sample input data is obtained according to the first sample text and a preset classification template.
The first sample input data is taken as the input of the first preset network model and the first sample category as the output of the first preset network model, and the first preset network model is trained to obtain candidate parameter vectors.
In another embodiment, the second training sample data includes second sample input data and second sample output data corresponding to the second sample input data. Training the standby network model corresponding to the candidate parameter vector according to the second training sample data to obtain the candidate text classification model corresponding to the candidate parameter vector includes: training, for each candidate parameter vector, the standby network model according to the second sample input data and the second sample output data to obtain a candidate text classification model.
In another embodiment, the second sample input data includes a candidate classification template and a second sample text, the candidate classification template including a candidate parameter vector and a preset natural language template. The second sample output data is a text extracted from a preset sample text, and the second sample text is the text obtained after the second sample output data is extracted from the preset sample text. The standby network model is trained according to the second sample input data and the second sample output data to obtain a candidate text classification model as follows.
Second sample input data is obtained according to the second sample text and the candidate classification template.
The second sample input data is taken as the input of the standby network model and the second sample output data as the output of the standby network model, and the standby network model is trained to obtain a candidate text classification model.
In another embodiment, determining the target parameter vector and the target text classification model from the candidate parameter vector and the candidate text classification model according to the preset verification data set includes the following steps.
For each candidate parameter vector, verification input data is obtained according to the sample verification text and the candidate classification template corresponding to the candidate parameter vector.
The verification input data is taken as the input of the candidate text classification model corresponding to the candidate parameter vector to obtain the target verification category output by the candidate text classification model.
The classification accuracy of each candidate network model is determined according to the target verification category and the sample verification category, wherein the candidate network models comprise candidate parameter vectors and the candidate text classification models corresponding to the candidate parameter vectors.
The candidate parameter vector in the candidate network model with the highest classification accuracy is taken as the target parameter vector, and the candidate text classification model in that candidate network model is taken as the target text classification model.
The specific manner in which the various modules perform operations in the apparatus of the above embodiments has been described in detail in the embodiments of the method and will not be repeated here.
In summary, the present disclosure first obtains a target text, obtains target input data according to the target text and a target classification template comprising a target parameter vector and a target natural language template, and then inputs the target input data into a preset target text classification model to obtain the target text category output by the target text classification model. The target parameter vector is obtained by training a first preset network model according to first training sample data, where the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model. The target text classification model is obtained by training a second preset network model according to second training sample data, where the second training sample data is sample data of unlabeled categories, and the second preset network model comprises the target parameter vector and the preset classification model. In this way, the target text category corresponding to the target text is determined through the pre-trained target parameter vector and target text classification model; the approach fuses the natural language template method and the parameter vector template method, makes full use of a large amount of unlabeled training data, and can obtain more accurate text classification results.
Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment. For example, the electronic device 400 may be provided as a server. Referring to Fig. 7, the electronic device 400 includes a processor 422 (of which there may be one or more) and a memory 432 for storing a computer program executable by the processor 422. The computer program stored in the memory 432 may include one or more modules, each corresponding to a set of instructions. The processor 422 may be configured to execute the computer program to perform the text classification method described above.
The electronic device 400 may further include a power supply component 426 and a communication component 450. The power supply component 426 may be configured to perform power management of the electronic device 400, and the communication component 450 may be configured to enable wired or wireless communication of the electronic device 400. The electronic device 400 may also include an input/output interface 458, and may operate based on an operating system stored in the memory 432.
In another exemplary embodiment, a computer-readable storage medium comprising program instructions is also provided; when executed by a processor, the program instructions implement the steps of the text classification method described above. For example, the non-transitory computer-readable storage medium may be the memory 432 described above, which includes program instructions executable by the processor 422 of the electronic device 400 to perform the text classification method described above.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described text classification method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solutions of the present disclosure within the scope of its technical concept, and all such simple modifications fall within the protection scope of the present disclosure.
In addition, the specific features described in the foregoing embodiments may be combined in any suitable manner; to avoid unnecessary repetition, the various possible combinations are not described further.
Moreover, any combination of the various embodiments of the present disclosure is possible as long as it does not depart from the spirit of the present disclosure, and such combinations should likewise be regarded as content disclosed herein.

Claims (8)

1. A method of classifying text, the method comprising:
acquiring a target text;
obtaining target input data according to the target text and a target classification template, wherein the target classification template comprises a target parameter vector and a target natural language template, the target parameter vector is obtained by training a first preset network model according to first training sample data, the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model;
inputting the target input data into a preset target text classification model to obtain a target text category output by the target text classification model, wherein the target text classification model is obtained by training a second preset network model according to second training sample data, the second training sample data is sample data of unlabeled categories, and the second preset network model comprises the target parameter vector and the preset classification model;
the first training sample data comprises at least one preset classification template, wherein the preset classification template comprises a preset parameter vector and a preset natural language template; the target parameter vector and the target text classification model are determined by:
for each preset classification template, training the first preset network model according to the first training sample data to obtain a candidate parameter vector corresponding to the preset classification template;
for each candidate parameter vector, training a standby network model corresponding to the candidate parameter vector according to the second training sample data to obtain a candidate text classification model corresponding to the candidate parameter vector, wherein the standby network model comprises the candidate parameter vector and the preset classification model;
determining the target parameter vector and the target text classification model from the candidate parameter vector and the candidate text classification model according to a preset verification data set, wherein the preset verification data set comprises a sample verification text and a sample verification category corresponding to the sample verification text;
the second training sample data comprises second sample input data and second sample output data corresponding to the second sample input data; training the standby network model corresponding to the candidate parameter vector according to the second training sample data to obtain the candidate text classification model corresponding to the candidate parameter vector comprises:
for each candidate parameter vector, training the standby network model according to the second sample input data and the second sample output data to obtain the candidate text classification model.
2. The method of claim 1, wherein the first training sample data comprises first sample input data and a first sample category to which the first sample input data corresponds; training the first preset network model according to the first training sample data to obtain the candidate parameter vector corresponding to the preset classification template comprises the following steps:
training the first preset network model according to the first sample input data and the first sample category to obtain the candidate parameter vector.
3. The method of claim 2, wherein the first sample input data comprises the preset classification template and a first sample text; training the first preset network model according to the first sample input data and the first sample category to obtain the candidate parameter vector comprises:
obtaining the first sample input data according to the first sample text and the preset classification template;
and taking the first sample input data as the input of the first preset network model, taking the first sample category as the output of the first preset network model, and training the first preset network model to obtain the candidate parameter vector.
4. The method of claim 1, wherein the second sample input data comprises a candidate classification template and a second sample text, the candidate classification template comprising the candidate parameter vector and the preset natural language template; the second sample output data is a text extracted from a preset sample text, and the second sample text is a text obtained after the second sample output data is extracted from the preset sample text; training the standby network model according to the second sample input data and the second sample output data to obtain the candidate text classification model comprises the following steps:
obtaining the second sample input data according to the second sample text and the candidate classification templates;
and taking the second sample input data as the input of the standby network model, taking the second sample output data as the output of the standby network model, and training the standby network model to obtain the candidate text classification model.
5. The method of claim 4, wherein determining the target parameter vector and the target text classification model from the candidate parameter vector and the candidate text classification model according to the preset verification data set comprises:
for each candidate parameter vector, obtaining verification input data according to the sample verification text and the candidate classification template corresponding to the candidate parameter vector;
taking the verification input data as the input of the candidate text classification model corresponding to the candidate parameter vector, and obtaining a target verification category output by the candidate text classification model;
determining the classification accuracy of each candidate network model according to the target verification category and the sample verification category, wherein the candidate network model comprises the candidate parameter vector and the candidate text classification model corresponding to the candidate parameter vector;
and taking the candidate parameter vector in the candidate network model with the highest classification accuracy as the target parameter vector, and taking the candidate text classification model in the candidate network model with the highest classification accuracy as the target text classification model.
6. A text classification apparatus, the apparatus comprising:
the acquisition module is used for acquiring the target text;
the input module is used for obtaining target input data according to the target text and a target classification template, wherein the target classification template comprises a target parameter vector and a target natural language template, the target parameter vector is obtained by training a first preset network model according to first training sample data, the first training sample data is sample data marked with categories, and the first preset network model comprises a preset parameter vector and a preset classification model;
the classification module is used for inputting the target input data into a preset target text classification model to obtain a target text category output by the target text classification model, wherein the target text classification model is obtained by training a second preset network model according to second training sample data, the second training sample data is sample data of unlabeled categories, and the second preset network model comprises the target parameter vector and the preset classification model;
the first training sample data comprises at least one preset classification template, wherein the preset classification template comprises a preset parameter vector and a preset natural language template; the target parameter vector and the target text classification model are determined by:
for each preset classification template, training the first preset network model according to the first training sample data to obtain a candidate parameter vector corresponding to the preset classification template;
for each candidate parameter vector, training a standby network model corresponding to the candidate parameter vector according to the second training sample data to obtain a candidate text classification model corresponding to the candidate parameter vector, wherein the standby network model comprises the candidate parameter vector and the preset classification model;
determining the target parameter vector and the target text classification model from the candidate parameter vector and the candidate text classification model according to a preset verification data set, wherein the preset verification data set comprises a sample verification text and a sample verification category corresponding to the sample verification text;
the second training sample data comprises second sample input data and second sample output data corresponding to the second sample input data; training the standby network model corresponding to the candidate parameter vector according to the second training sample data to obtain the candidate text classification model corresponding to the candidate parameter vector comprises:
for each candidate parameter vector, training the standby network model according to the second sample input data and the second sample output data to obtain the candidate text classification model.
7. A non-transitory computer readable storage medium, having stored thereon a computer program, characterized in that the program when executed by a processor implements the steps of the method according to any of claims 1-5.
8. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-5.
CN202310273838.7A 2023-03-20 2023-03-20 Text classification method and device, storage medium and electronic equipment Active CN115994225B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310273838.7A CN115994225B (en) 2023-03-20 2023-03-20 Text classification method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN115994225A CN115994225A (en) 2023-04-21
CN115994225B true CN115994225B (en) 2023-06-27

Family

ID=85992278

Country Status (1)

Country Link
CN (1) CN115994225B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738298B (en) * 2023-08-16 2023-11-24 杭州同花顺数据开发有限公司 Text classification method, system and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020199591A1 (en) * 2019-03-29 2020-10-08 平安科技(深圳)有限公司 Text categorization model training method, apparatus, computer device, and storage medium
CN112966712A (en) * 2021-02-01 2021-06-15 北京三快在线科技有限公司 Language model training method and device, electronic equipment and computer readable medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11847414B2 (en) * 2020-04-24 2023-12-19 Deepmind Technologies Limited Robustness to adversarial behavior for text classification models
CN113901807A (en) * 2021-08-30 2022-01-07 重庆德莱哲企业管理咨询有限责任公司 Clinical medicine entity recognition method and clinical test knowledge mining method
CN113688244A (en) * 2021-08-31 2021-11-23 中国平安人寿保险股份有限公司 Text classification method, system, device and storage medium based on neural network
CN114896395A (en) * 2022-04-26 2022-08-12 阿里巴巴(中国)有限公司 Language model fine-tuning method, text classification method, device and equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant