CN117725211A - Text classification method and system based on self-constructed prompt template - Google Patents
- Publication number: CN117725211A
- Application number: CN202311619300.3A
- Authority
- CN
- China
- Prior art keywords
- template
- dictionary
- model
- word
- text classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
A text classification method and system based on a self-constructed prompt template are disclosed. The method comprises: constructing a label mapping dictionary by a self-attention method, using a small number of labeled samples and unsupervised learning; generating a template by a soft prompt method and converting it as follows: replacing word vectors with the average value of the word vectors, selecting prompt word labels by cosine similarity, randomly masking the input prompt words with a Transformer model, and feeding the position vector codes of the Transformer model in shuffled order; and performing text classification using the template. With this method, the self-generated prompt template fits the pre-trained language model better, and the generalization and accuracy of text classification are improved compared with traditional text classification methods.
Description
Technical Field
The invention relates to the technical field of text classification, in particular to a text classification method and system based on a self-constructed prompt template.
Background
Text classification has always been one of the most studied tasks in natural language processing. Traditional text classification approaches fall roughly into two types. The first is based on a self-trained model: a large number of labeled samples are used for training through machine learning or deep learning modeling, and target sentences are classified by the trained model; typical models include SVM, TextCNN, RNN, Transformer, GCN and the like. The second type fine-tunes an existing pre-trained model for text classification. In recent years, with the open-sourcing of large language models such as GPT-3 and ERNIE, sentiment classification models obtained by fine-tuning pre-trained models have greatly outperformed self-trained models.
In existing research on text classification based on pre-trained models, most methods encode the words or characters of the input text and then apply a fully connected layer and softmax to a specific high-dimensional encoded vector to obtain the corresponding class label. However, this fine-tuning approach has three clear drawbacks. First, data annotation is costly: fully exploiting the performance of a pre-trained model requires a large amount of labeled data, which makes annotation and training expensive. Second, compressing the encoded vectors may fail to exploit the mutual information between texts and cause information loss; for example, when BERT is used for text classification the sentence is often compressed into the [CLS] vector, so the model may lose details or contextual information. Third, the efficiency with which a model traverses the data space under supervised training is inversely proportional to the model size, i.e., the larger the pre-trained model, the harder it is to migrate it to a new data space.
Disclosure of Invention
In order to solve the technical problems in the prior art that labeling data for text classification is costly and that the mutual information between texts cannot be effectively exploited, causing information loss, the invention provides a text classification method and system based on a self-constructed prompt template.
According to a first aspect of the present invention, there is provided a text classification method based on a self-constructed prompt template, including:
S1: constructing a label mapping dictionary by a self-attention method, using a small number of labeled samples and unsupervised learning;
S2: generating a template by a soft prompt method, and converting the template as follows: replacing word vectors with the average value of the word vectors, selecting prompt word labels by cosine similarity, randomly masking the input prompt words with a Transformer model, and feeding the position vector codes of the Transformer model in shuffled order;
S3: performing text classification using the template.
In a specific embodiment, the construction of the label mapping dictionary in S1 specifically includes:
S11: extracting a small number of samples from the samples to be tested for manual labeling to form a labeled dataset $D_1$;
S12: weighting scoring the words by using an attribute mechanism of the base class model, and adding words with the word dictionary weights larger than alpha into a class dictionary to obtain a class dictionary W= { W 11 ,w 12 ,...w k1 …w km W, where km The m-th word representing the k-th category, m being the minimum dictionary size for each category of the tag mapping dictionary, α representing the weight threshold:
S13: labeling the data using the trained model and the class dictionary;
S14: repeating the above steps, evaluating the newly trained model on the validation set every 2-3 epochs, and stopping the iteration when the F1 value on the validation set exceeds a preset value or the dictionary size of some category exceeds the preset word count m.
In a specific embodiment, a base-class text classifier is trained using the labeled dataset and a TextCNN-attention model, and a portion of the data is extracted to form a validation set with balanced class counts.
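As an illustration of step S12 and the base classifier above, the following Python sketch shows how per-word attention weights could be thresholded into a category dictionary. It is a minimal sketch under assumptions: the attention scores are passed in as plain (word, weight) pairs because the exact TextCNN-attention interface is not specified here, and the values of α and m are placeholders.

```python
from collections import defaultdict

def build_category_dictionary(scored_samples, alpha=0.3, max_words_per_class=50):
    """Build a label mapping dictionary W = {w_k1, ..., w_km} from attention weights.

    scored_samples: iterable of (predicted_class, [(word, attention_weight), ...])
    pairs produced by a trained base classifier. Words whose weight exceeds `alpha`
    are added to the dictionary of that class, capped at `max_words_per_class`.
    """
    dictionary = defaultdict(dict)  # class_id -> {word: best weight seen}
    for class_id, weighted_words in scored_samples:
        for word, weight in weighted_words:
            if weight > alpha:
                dictionary[class_id][word] = max(dictionary[class_id].get(word, 0.0), weight)

    # keep only the top-m words per class, ordered by attention weight
    return {
        class_id: [w for w, _ in sorted(words.items(), key=lambda kv: -kv[1])[:max_words_per_class]]
        for class_id, words in dictionary.items()
    }

# toy usage with two samples scored by a hypothetical base classifier
scored = [
    (0, [("excellent", 0.80), ("room", 0.10), ("clean", 0.55)]),
    (1, [("dirty", 0.70), ("noisy", 0.40), ("hotel", 0.05)]),
]
print(build_category_dictionary(scored, alpha=0.3, max_words_per_class=3))
# {0: ['excellent', 'clean'], 1: ['dirty', 'noisy']}
```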
In a specific embodiment, generating the template by the soft prompt method specifically includes constructing the input template as $S_1 = [u_1, u_2, \ldots, [\mathrm{mask}], \ldots, u_m, v_1, v_2, \ldots, v_n]$, where $u_i$ is a token of the prompt template to be generated, $v_i$ is a token of the input training text, and [mask] is a word from the category-corresponding dictionary; the input training text is randomly extracted from the training data.
In a specific embodiment, the conversion formula in S2 is $z_i = \mathrm{Transformer}(u_i, e_j)$, $h_i = \mathrm{ReLU}(\mathrm{MLP}(z_i))$, where Transformer is a Transformer model and MLP and ReLU are the multi-layer perceptron and ReLU activation applied after Bi-LSTM encoding. This model is called the prompt encoder and is used to encode the prompt template. Except for [mask], the other tokens are initialized with random word vectors, and the vector corresponding to [mask] is embedded using the words of the corresponding class label, where $e_j$ is the position-encoding vector.
In a specific embodiment, the objective function of the model includes a loss function $L_1$ for the predicted word and a loss function $L_2$ for the predicted label, and the overall loss is their sum, $L = L_1 + L_2$. Here $L_1 = -\sum_{u}\sum_{m} p_u(y)\log p_u(y_u)$, where u denotes the length of the word, m denotes the number of words in the dictionary corresponding to the label, $p_u(y)$ is the one-hot encoding of the model prediction, and $p_u(y_u)$ is the predicted probability matrix. $L_2$ is the loss between the computed class label and the predicted class label, $L_2 = -\sum_{k} y_k \log p_k$ with $p_k = \mathrm{argmax}_k(\cos(c_i, z_{km}))$, where $\cos(\cdot)$ denotes the cosine similarity function and $p_k$ denotes the probability of selecting the category corresponding to the word with the highest similarity as the k-th category.
According to a second aspect of the present invention, a computer-readable storage medium is presented, on which one or more computer programs are stored which, when executed by a computer processor, implement the above-described method.
According to a third aspect of the present invention, there is provided a text classification system based on a self-constructed prompt template, comprising:
label mapping dictionary construction unit: configured to construct a label mapping dictionary by a self-attention method, using a small number of labeled samples and unsupervised learning;
template generation conversion unit: configured to generate a template by a soft prompt method and convert the template as follows: replacing word vectors with the average value of the word vectors, selecting prompt word labels by cosine similarity, randomly masking the input prompt words with a Transformer model, and feeding the position vector codes of the Transformer model in shuffled order;
text classification unit: configured for text classification using templates.
In some specific embodiments, the label mapping dictionary construction unit is specifically configured to: extract a small number of samples from the samples to be tested for manual labeling to form a labeled dataset $D_1$; score the words with the attention mechanism of the base-class model and add words whose weight is greater than α to the corresponding class dictionary, obtaining the class dictionary $W = \{w_{11}, w_{12}, \ldots, w_{k1}, \ldots, w_{km}\}$, where $w_{km}$ denotes the m-th word of the k-th category, m is the minimum dictionary size for each category of the label mapping dictionary, and α is the weight threshold; label the data using the trained model and the class dictionary; and repeat the above steps, evaluating the newly trained model on the validation set every 2-3 epochs, stopping the iteration when the F1 value on the validation set exceeds a preset value or the dictionary size of some category exceeds the preset word count m. A base-class text classifier is trained using the labeled dataset and a TextCNN-attention model, and a portion of the data is extracted to form a validation set with balanced class counts.
In some specific embodiments, generating the template by the soft prompt method in the template generation conversion unit specifically includes constructing the input template as $S_1 = [u_1, u_2, \ldots, [\mathrm{mask}], \ldots, u_m, v_1, v_2, \ldots, v_n]$, where $u_i$ is a token of the prompt template to be generated, $v_i$ is a token of the input training text, and [mask] is a word from the category-corresponding dictionary; the input training text is randomly extracted from the training data.
In some specific embodiments, the conversion formula is $z_i = \mathrm{Transformer}(u_i, e_j)$, $h_i = \mathrm{ReLU}(\mathrm{MLP}(z_i))$, where Transformer is a Transformer model and MLP and ReLU are the multi-layer perceptron and ReLU activation applied after Bi-LSTM encoding. This model is called the prompt encoder and is used to encode the prompt template. Except for [mask], the other tokens are initialized with random word vectors, and the vector corresponding to [mask] is embedded using the words of the corresponding class label, where $e_j$ is the position-encoding vector.
In some specific embodiments, the objective function of the model includes a loss function $L_1$ for the predicted word and a loss function $L_2$ for the predicted label, and the overall loss is their sum, $L = L_1 + L_2$. Here $L_1 = -\sum_{u}\sum_{m} p_u(y)\log p_u(y_u)$, where u denotes the length of the word, m denotes the number of words in the dictionary corresponding to the label, $p_u(y)$ is the one-hot encoding of the model prediction, and $p_u(y_u)$ is the predicted probability matrix. $L_2$ is the loss between the computed class label and the predicted class label, $L_2 = -\sum_{k} y_k \log p_k$ with $p_k = \mathrm{argmax}_k(\cos(c_i, z_{km}))$, where $\cos(\cdot)$ denotes the cosine similarity function and $p_k$ denotes the probability of selecting the category corresponding to the word with the highest similarity as the k-th category.
The invention provides a text classification method and system based on a self-constructed prompt template, including a self-construction method for the class label dictionary and a method of building the prompt template model with a Transformer model. This prompt-learning-based approach overcomes the drawbacks of conventional text classification models, namely low training efficiency and the need for large amounts of training data, and makes effective use of existing pre-trained models; a prompt-learning-based model can better understand the semantics and context of the text. By learning the mapping from prompt information to target categories, the model learns richer semantic representations, which improves the accuracy and generalization of text classification. Finally, by introducing different prompt information during training, a prompt-learning-based model can better resist adversarial attacks, which is an important advantage in application scenarios with high sensitivity or security requirements. Text classification based on prompt learning can be widely applied to scenarios such as task migration of large models, sentiment analysis and case-related analysis.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and, together with the description, serve to explain the principles of the invention. Many of the intended advantages of the embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings, in which:
FIG. 1 is a flow chart of a text classification method based on a self-constructed prompt template according to an embodiment of the present application;
FIG. 2 is a flow chart of label mapping dictionary training in a specific embodiment of the present application;
FIG. 3 is a flow chart of a text classification method based on a self-constructed prompt template in a specific embodiment of the present application;
FIG. 4 is a model framework diagram of a specific embodiment of the present application;
FIG. 5 is a framework diagram of a text classification system based on a self-constructed prompt template according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a computer system suitable for implementing an electronic device of an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates a flow chart of a text classification method based on a self-constructed prompt template according to an embodiment of the present application. As shown in FIG. 1, the method comprises the following steps:
S1: constructing a label mapping dictionary by a self-attention method, using a small number of labeled samples and unsupervised learning.
In a specific embodiment, fig. 2 shows a flowchart of tag correspondence dictionary training according to a specific embodiment of the present application, as shown in fig. 2, including the following steps:
201: labeling the sample;
202: TextCNN-attention model; the labeled samples are input into the TextCNN-attention model for training.
203: training a dictionary; and carrying out label classification on the data and giving the confidence of the class.
204: confidence level:
205: judging whether the sample is a pseudo-labeled sample; if yes, go to 207; if no, go to 206 (treated as unlabeled text).
207: judging whether the iteration stop condition is met; if so, proceed to the label-corresponding dictionary at 208, otherwise return to the labeled samples at 201.
In a specific embodiment, the number of classes for the text classification labels is set to K, and a small number of samples are extracted from the samples to be tested and manually labeled to form a labeled dataset D1. A base-class text classifier is trained using the labeled dataset and the TextCNN-attention model, and a portion of the data is extracted to form a validation set with balanced class counts.
The minimum dictionary size of each category of the label mapping dictionary is set to m with a weight threshold α; the words are scored with the attention mechanism of the base model, and words whose weight is greater than α are added to the class dictionary, obtaining the class dictionary $W = \{w_{11}, w_{12}, \ldots, w_{k1}, \ldots, w_{km}\}$, where $w_{km}$ denotes the m-th word of the k-th category.
The data are then labeled using the trained model and the class dictionary. First, the text classifier assigns each sample a class and a confidence, and the class dictionary gives the hit keywords of the text to be tested; samples with high confidence (above β), or with low confidence (below β) but more than t hit keywords, are pseudo-labeled. Finally, the pseudo-labeled samples are added to the labeled dataset, and the remaining unlabeled samples are kept in the dataset to be tested.
The above steps are repeated; the newly trained model is evaluated on the validation set every 2-3 epochs, and the iteration stops when the F1 value on the validation set exceeds a specified value or the dictionary size of some category exceeds the preset word count m.
This method can perform model training with only a small amount of labeled data in an essentially unsupervised manner, and can efficiently construct the required label mapping dictionary.
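The self-training loop described above can be condensed into the following Python sketch. It is illustrative only: the classifier factory, the F1 evaluator, and the thresholds β, t and the F1 target stand in for whatever the TextCNN-attention base model and validation set actually provide.

```python
def self_training_loop(labeled, unlabeled, class_dicts, train_fn, eval_f1_fn,
                       beta=0.9, min_hits=2, f1_target=0.9, max_rounds=10):
    """Iteratively pseudo-label data using classifier confidence and dictionary keyword hits.

    labeled    : list of (text, label) pairs
    unlabeled  : list of texts to be tested
    class_dicts: {label: set(keywords)} built as in step S12
    train_fn   : callable(labeled) -> predict_fn, where predict_fn(text) -> (label, confidence)
    eval_f1_fn : callable(predict_fn) -> F1 score on the validation set
    """
    for _ in range(max_rounds):
        predict = train_fn(labeled)
        remaining = []
        for text in unlabeled:
            label, confidence = predict(text)
            hits = sum(1 for kw in class_dicts.get(label, ()) if kw in text)
            # accept high-confidence samples, or lower-confidence ones with enough keyword hits
            if confidence >= beta or hits >= min_hits:
                labeled.append((text, label))
            else:
                remaining.append(text)
        unlabeled = remaining
        if eval_f1_fn(predict) >= f1_target or not unlabeled:
            break
    return labeled

# toy usage with a trivial keyword-vote "classifier" standing in for TextCNN-attention
dicts = {0: {"good", "great"}, 1: {"bad", "awful"}}
def toy_train(_data):
    def predict(text):
        pos = sum(w in text for w in dicts[0])
        neg = sum(w in text for w in dicts[1])
        label = 0 if pos >= neg else 1
        return label, (0.95 if abs(pos - neg) > 1 else 0.6)
    return predict

print(self_training_loop([("good great stay", 0)], ["bad awful noisy room", "it was fine"],
                         dicts, toy_train, eval_f1_fn=lambda p: 0.5))
```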
S2: generating a template by using a soft prompting method, and converting the template as follows: the average value of the word vectors is used for replacing the word vectors, the cosine similarity mode is used for selecting the prompt word labels, the transducer model is used for carrying out random masking on the input prompt words, and the position vector codes in the transducer model are input in disorder.
In a specific embodiment, prompt templates in prompt learning are mainly divided into hard prompts and soft prompts. A hard prompt fine-tunes the model with a constructed prompt corpus, but its drawback is that the fine-tuning depends heavily on how the prompt sentences are constructed, and small word changes strongly affect the model. To avoid the influence of template construction, the invention generates the template with a soft prompt method, setting the constructed input template as $S_1 = [u_1, u_2, \ldots, [\mathrm{mask}], \ldots, u_m, v_1, v_2, \ldots, v_n]$, where $u_i$ is a token of the prompt template to be generated, $v_i$ is a token of the input training text, and [mask] is a word from the category-corresponding dictionary; the input training text is randomly extracted from the training data. In the soft prompt construction, all tokens other than [mask] are token masks.
In a specific embodiment, the usual way to train a prompt model is to construct the training set as text plus a prompt template, e.g. "[CLS] I am in a very bad mood today [SEP] This is a [mask][mask] statement", where the [mask] tokens are the prompt words the model must predict: if the generated word is positive, the sample is a positive-sentiment sample, and if it is negative, a negative-sentiment sample. Effectively building a generated template that fits the current pre-trained model is therefore very important, and the position of the prompt words plays a vital role in the whole generated template. For example, "[CLS] I am in a very bad mood today [SEP] This is a [mask][mask] statement" and "[CLS] I am in a very bad mood today [SEP] But this is not a [mask][mask] statement" show that changes to part of the statement or to the sentence pattern of the prompt template have a great influence on the model's result. To avoid this drawback, the prompt template is improved in three respects. First, word vectors are replaced with the average value of the word vectors, and the prompt word label is selected by cosine similarity. Second, a Transformer model is used to randomly mask the input prompt words, because word templates that may not read naturally to humans can be converted, through iterations of the model, into word vectors that suit the model. Third, the position vector codes in the Transformer model are fed in shuffled order, reducing the influence of the [MASK] position on the final model result. Let the tokens of the constructed input template be $u_i$; the specific conversion formula is $z_i = \mathrm{Transformer}(u_i, e_j)$, $h_i = \mathrm{ReLU}(\mathrm{MLP}(z_i))$, where Transformer is a Transformer model and MLP and ReLU are the multi-layer perceptron and ReLU activation applied after Bi-LSTM encoding. This constructed model is called the prompt encoder, and its main function is to encode the prompt template. Except for [mask], the other tokens are initialized with random word vectors, while the vector corresponding to [mask] is embedded using the words of the corresponding class label, where $e_j$ is the position-encoding vector. Normally the vectors are input to the Transformer in word order; the present application inputs them in shuffled order to prevent the position of [mask] in the template model from influencing the final result.
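A compact PyTorch sketch of such a prompt encoder is given below, as one possible reading of the formulas above. It is a minimal sketch under assumptions: the layer sizes, the use of nn.TransformerEncoder followed by a Bi-LSTM and MLP head, the shuffled position embedding, and the implementation of random masking as zeroing token embeddings are illustrative choices, not the exact architecture of the patent.

```python
import torch
import torch.nn as nn

class PromptEncoder(nn.Module):
    """Sketch of the prompt encoder: z_i = Transformer(u_i, e_j), h_i = ReLU(MLP(BiLSTM(z_i)))."""

    def __init__(self, prompt_len=8, dim=128):
        super().__init__()
        # prompt tokens are randomly initialized; only the [mask] slot is tied to label words
        self.prompt_embed = nn.Embedding(prompt_len, dim)
        self.pos_embed = nn.Embedding(prompt_len, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.bilstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, batch_size, mask_embedding, mask_pos, drop_prob=0.15):
        device = mask_embedding.device
        ids = torch.arange(self.prompt_embed.num_embeddings, device=device)
        u = self.prompt_embed(ids).unsqueeze(0).expand(batch_size, -1, -1).clone()
        # the [mask] slot is embedded with the label word vector instead of a random vector
        u[:, mask_pos, :] = mask_embedding
        # random masking of prompt tokens, implemented here by zeroing their embeddings
        keep = (torch.rand(batch_size, u.size(1), 1, device=device) > drop_prob).float()
        u = u * keep
        # out-of-order position encoding: shuffle the position indices before lookup
        perm = torch.randperm(self.pos_embed.num_embeddings, device=device)
        z = self.transformer(u + self.pos_embed(perm).unsqueeze(0))
        h, _ = self.bilstm(z)
        return self.mlp(h)  # h_i, later concatenated with the sentence embeddings

# usage with toy dimensions
enc = PromptEncoder()
out = enc(batch_size=2, mask_embedding=torch.randn(2, 128), mask_pos=3)
print(out.shape)  # torch.Size([2, 8, 128])
```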
S3: text classification is performed using templates.
With continued reference to FIG. 3, FIG. 3 shows a flowchart of a text classification method based on a self-constructed prompt template according to a specific embodiment of the present application; as shown in FIG. 3, it mainly comprises the following steps:
301: labeling the sample;
302: TextCNN-attention model;
303: training a category dictionary;
304: preprocessing data;
305: prompting a template model;
306: pre-training a language model;
307: multiple category labels.
In a specific embodiment, both the overall category judgment of the model and the judgment of the generated template output final category words; one is an MLM (masked language modeling) process and the other is a label prediction process. The pre-trained language model is an LLM (Large Language Model) and the input sentence is x, so the overall input process is $v_i = \mathrm{LLM\_embedding}(x_i)$ and $y_j = \mathrm{LLM}(\mathrm{concat}(h_{0:i}, v_{i+1:m}))$, $i \in [1, T]$, where $v_i$ denotes the vector after model embedding, $h_i$ denotes the output of the template generation model (the prompt encoder), and concat denotes the concatenation of the two matrices; that is, the constructed prompt template vectors and the input sentence are spliced together and fed into the pre-trained model. $y_j$ is the predicted word vector (the word vector corresponding to [mask]). In order to represent all words effectively, a single word vector is chosen as the final prediction unit; the formula for converting character vectors into a word vector is $c_i = \frac{1}{n}\sum_{j=1}^{n} v_j$, where n is the word length. The chosen pre-trained language model must support the input of Chinese word vectors. Let the vectors of the category-corresponding dictionary after LLM embedding and word-vector conversion be $z = \{z_{11}, z_{12}, \ldots, z_{km}\}$, where k indexes the category and m the m-th word.

Finally, the objective function of the model is divided into two parts: the loss function $L_1$ for predicting the word and the loss function $L_2$ for predicting the label. For $L_1$ the task is still a classification process, equivalent to deciding whether the word continuously generated by the pre-trained language model is the same as the category target word; the specific formula is $L_1 = -\sum_{u}\sum_{m} p_u(y)\log p_u(y_u)$, where u denotes the length of the word, m denotes the number of words in the dictionary corresponding to the label, $p_u(y)$ is the one-hot encoding of the model prediction, and $p_u(y_u)$ is the predicted probability matrix. The vectors generated by the LSTM allow further iterations of template generation: even if the constructed generated templates do not necessarily fit the input sentence, fine-tuning the LSTM modifies the feature vectors fed into the LLM.

$L_2$ is the loss function between the computed class label and the predicted class label. Let the word vector corresponding to the [mask] vector generated by the model be $c_i$, $i = 1, 2, \ldots, m$; then $L_2$ is computed as $p_k = \mathrm{argmax}_k(\cos(c_i, z_{km}))$ and $L_2 = -\sum_{k} y_k \log p_k$, where $\cos(\cdot)$ denotes the cosine similarity function used to compute the similarity between the predicted word vector and the target word vectors, and $p_k$ denotes the probability of selecting the category corresponding to the word with the highest similarity as the k-th category; $L_2$ is thus the cross-entropy loss between the k-th class probability and the true class label. The final overall loss is the sum of the two, $L = L_1 + L_2$. The overall model framework is shown in FIG. 4: taking hotel reviews as an example, the generated word vector is mapped to the corresponding category label through the LLM + prompt template, similarity calculation and label mapping.
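To make the two-part objective concrete, the following PyTorch sketch computes $L = L_1 + L_2$ under the definitions above. The tensors are random stand-ins for the LLM outputs and the embedded dictionary words, and because the exact reductions are not spelled out here, cross-entropy is assumed for both terms, with the per-class cosine scores treated as logits.

```python
import torch
import torch.nn.functional as F

def prompt_losses(mask_logits, target_word_ids, predicted_word_vec, dict_vectors, true_class):
    """Sketch of L = L1 + L2.

    mask_logits        : (B, V)     LLM logits at the [mask] position
    target_word_ids    : (B,)       ids of the category target words
    predicted_word_vec : (B, D)     averaged word vector c_i predicted for [mask]
    dict_vectors       : (K, M, D)  embedded label-dictionary words z_km
    true_class         : (B,)       gold class indices
    """
    # L1: cross entropy between the predicted [mask] word and the category target word
    l1 = F.cross_entropy(mask_logits, target_word_ids)

    # L2: score each class by the best cosine similarity between the predicted word
    # vector and that class's dictionary words, then cross entropy over classes
    sims = F.cosine_similarity(predicted_word_vec[:, None, None, :],
                               dict_vectors[None, :, :, :], dim=-1)   # (B, K, M)
    class_scores = sims.max(dim=-1).values                            # (B, K)
    l2 = F.cross_entropy(class_scores, true_class)
    return l1 + l2

# toy usage: batch of 2, vocabulary of 50 words, 4 classes x 6 dictionary words of dim 16
loss = prompt_losses(torch.randn(2, 50), torch.tensor([3, 7]),
                     torch.randn(2, 16), torch.randn(4, 6, 16), torch.tensor([0, 2]))
print(loss.item())
```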
In order to verify the effectiveness of prompt learning, the invention uses different pre-trained models for fine-tuning and for prompt template training respectively. First, the fine-tuning-based and prompt-learning-based text classification methods are compared. Three pre-trained language models, bert-base, bert-large and ernie, are selected for comparison, and the models are built on the Paddle framework. The public weibo sentiment classification dataset is used as the test set. The specific results are shown in Table 1 below. The results show that the accuracy and F1 of prompt learning are clearly improved over fine-tuning-based text classification.
Table 1 Fine tuning vs. prompt learning methods (weibo emotion classification data set)
In order to further verify the effectiveness of the method, the self-constructed soft prompt template and a fixed prompt template are used to compare the results of the models. For a clearer comparison, the three models bert, bert-large and ernie are again used for the comparative analysis, and the public weibo four-class emotion dataset is used for verification. The specific results are shown in Table 2 below:
table 2 Fine tuning vs. prompt learning methods (weibo emotion four-class dataset)
Here the hard prompt template is a manually constructed template; compared with this common prompt template, adopting the self-constructed dictionary and the soft prompt clearly improves the accuracy and recall of text classification.
FIG. 5 shows a schematic diagram of a text classification system based on a self-constructed prompt template according to an embodiment of the present application. As shown in FIG. 5, the system includes a label mapping dictionary construction unit 301, a template generation conversion unit 302, and a text classification unit 303. The label mapping dictionary construction unit 301 is configured to construct a label mapping dictionary by a self-attention method, using a small number of labeled samples and unsupervised learning; the template generation conversion unit 302 is configured to generate a template by a soft prompt method and convert the template as follows: replacing word vectors with the average value of the word vectors, selecting prompt word labels by cosine similarity, randomly masking the input prompt words with a Transformer model, and feeding the position vector codes of the Transformer model in shuffled order; and the text classification unit 303 is configured to classify text using the template.
Referring now to FIG. 6, a schematic diagram of a computer system suitable for use in implementing embodiments of the present application is shown. The electronic device shown in fig. 6 is only an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Liquid Crystal Display (LCD) or the like, a speaker or the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable storage medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable storage medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments described in the present application may be implemented by software, or may be implemented by hardware.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments or may exist alone without being assembled into the electronic device. The computer-readable storage medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: construct a label mapping dictionary by a self-attention method, using a small number of labeled samples and unsupervised learning; generate a template by a soft prompt method and convert the template as follows: replacing word vectors with the average value of the word vectors, selecting prompt word labels by cosine similarity, randomly masking the input prompt words with a Transformer model, and feeding the position vector codes of the Transformer model in shuffled order; and perform text classification using the template.
The foregoing description is only of the preferred embodiments of the present application and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the invention referred to in this application is not limited to the specific combinations of the features described above, and is also intended to cover other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example solutions in which the above features are replaced by technical features with similar functions disclosed in (but not limited to) this application.
Claims (12)
1. A text classification method based on a self-constructed prompt template is characterized by comprising the following steps:
S1: constructing a label mapping dictionary by a self-attention method, using a small number of labeled samples and unsupervised learning;
S2: generating a template by a soft prompt method, and converting the template as follows: replacing word vectors with the average value of the word vectors, selecting prompt word labels by cosine similarity, randomly masking the input prompt words with a Transformer model, and feeding the position vector codes of the Transformer model in shuffled order;
s3: and classifying the text by using the template.
2. The text classification method based on the self-constructed prompt template according to claim 1, wherein the construction of the label mapping dictionary in S1 specifically includes:
S11: extracting a small number of samples from the samples to be tested for manual labeling to form a labeled dataset $D_1$;
S12: weighting scoring the words by using an attribute mechanism of the base class model, and adding words with the word dictionary weights larger than alpha into a class dictionary to obtain a class dictionary W= { W 11 ,w 12 ,...w k1. ...w km W, where km The m-th word representing the kth category, m being the minimum dictionary size of each category of the tag mapping dictionary, α representing a weight threshold;
S13: labeling the data using the trained model and the class dictionary;
S14: repeating the above steps, evaluating the newly trained model on the validation set every 2-3 epochs, and stopping the iteration when the F1 value on the validation set exceeds a preset value or the dictionary size of some category exceeds the preset word count m.
3. The text classification method based on the self-constructed prompt template according to claim 2, wherein a base-class text classifier is trained using the labeled dataset and a TextCNN-attention model, and a portion of the data is extracted to form a validation set with balanced class counts.
4. The text classification method based on the self-constructed prompt template according to claim 1, wherein generating the template by the soft prompt method specifically includes constructing the input template as $S_1 = [u_1, u_2, \ldots, [\mathrm{mask}], \ldots, u_m, v_1, v_2, \ldots, v_n]$, where $u_i$ is a token of the prompt template to be generated, $v_i$ is a token of the input training text, and [mask] is a word from the category-corresponding dictionary; the input training text is randomly extracted from the training data.
5. The text classification method based on the self-constructed prompt template according to claim 4, wherein the conversion formula in S2 is $z_i = \mathrm{Transformer}(u_i, e_j)$, $h_i = \mathrm{ReLU}(\mathrm{MLP}(z_i))$, where Transformer is a Transformer model and MLP and ReLU are the multi-layer perceptron and ReLU activation applied after Bi-LSTM encoding; this model is called the prompt encoder and is used to encode the prompt template; except for [mask], the other tokens are initialized with random word vectors, and the vector corresponding to [mask] is embedded using the words of the corresponding class label, where $e_j$ is the position-encoding vector.
6. The method according to claim 1, wherein the objective function of the model includes a loss function $L_1$ for the predicted word and a loss function $L_2$ for the predicted label, and the overall loss is their sum, $L = L_1 + L_2$, with $L_1 = -\sum_{u}\sum_{m} p_u(y)\log p_u(y_u)$, where u denotes the length of the word, m denotes the number of words in the dictionary corresponding to the label, $p_u(y)$ is the one-hot encoding of the model prediction, and $p_u(y_u)$ is the predicted probability matrix; $L_2$ is the loss function between the computed class label and the predicted class label, $L_2 = -\sum_{k} y_k \log p_k$ with $p_k = \mathrm{argmax}_k(\cos(c_i, z_{km}))$, where $\cos(\cdot)$ denotes the cosine similarity function and $p_k$ denotes the probability of selecting the category corresponding to the word with the highest similarity as the k-th category.
7. A computer readable storage medium having stored thereon one or more computer programs, which when executed by a computer processor implement the method of any of claims 1-6.
8. A text classification system based on a self-constructed prompt template, comprising:
label mapping dictionary construction unit: configured to construct a label mapping dictionary by a self-attention method, using a small number of labeled samples and unsupervised learning;
template generation conversion unit: configured to generate a template by a soft prompt method and convert the template as follows: replacing word vectors with the average value of the word vectors, selecting prompt word labels by cosine similarity, randomly masking the input prompt words with a Transformer model, and feeding the position vector codes of the Transformer model in shuffled order;
text classification unit: is configured to use the template for text classification.
9. The text classification system based on the self-constructed prompt template according to claim 8, wherein the label mapping dictionary construction unit is specifically configured to: extract a small number of samples from the samples to be tested for manual labeling to form a labeled dataset $D_1$; score the words with the attention mechanism of the base-class model and add words whose weight is greater than α to the corresponding class dictionary, obtaining the class dictionary $W = \{w_{11}, w_{12}, \ldots, w_{k1}, \ldots, w_{km}\}$, where $w_{km}$ denotes the m-th word of the k-th category, m is the minimum dictionary size for each category of the label mapping dictionary, and α is the weight threshold; label the data using the trained model and the class dictionary; and repeat the above steps, evaluating the newly trained model on the validation set every 2-3 epochs, stopping the iteration when the F1 value on the validation set exceeds a preset value or the dictionary size of some category exceeds the preset word count m; a base-class text classifier is trained using the labeled dataset and a TextCNN-attention model, and a portion of the data is extracted to form a validation set with balanced class counts.
10. The text classification system based on the self-constructed prompt template according to claim 8, wherein generating the template by the soft prompt method in the template generation conversion unit specifically includes constructing the input template as $S_1 = [u_1, u_2, \ldots, [\mathrm{mask}], \ldots, u_m, v_1, v_2, \ldots, v_n]$, where $u_i$ is a token of the prompt template to be generated, $v_i$ is a token of the input training text, and [mask] is a word from the category-corresponding dictionary; the input training text is randomly extracted from the training data.
11. The text classification system based on the self-constructed prompt template according to claim 8, wherein the conversion formula is $z_i = \mathrm{Transformer}(u_i, e_j)$, $h_i = \mathrm{ReLU}(\mathrm{MLP}(z_i))$, where Transformer is a Transformer model and MLP and ReLU are the multi-layer perceptron and ReLU activation applied after Bi-LSTM encoding; this model is called the prompt encoder and is used to encode the prompt template; except for [mask], the other tokens are initialized with random word vectors, and the vector corresponding to [mask] is embedded using the words of the corresponding class label, where $e_j$ is the position-encoding vector.
12. The text classification system based on the self-constructed prompt template according to claim 8, wherein the objective function of the model includes a loss function $L_1$ for the predicted word and a loss function $L_2$ for the predicted label, and the overall loss is their sum, $L = L_1 + L_2$, with $L_1 = -\sum_{u}\sum_{m} p_u(y)\log p_u(y_u)$, where u denotes the length of the word, m denotes the number of words in the dictionary corresponding to the label, $p_u(y)$ is the one-hot encoding of the model prediction, and $p_u(y_u)$ is the predicted probability matrix; $L_2$ is the loss function between the computed class label and the predicted class label, $L_2 = -\sum_{k} y_k \log p_k$ with $p_k = \mathrm{argmax}_k(\cos(c_i, z_{km}))$, where $\cos(\cdot)$ denotes the cosine similarity function and $p_k$ denotes the probability of selecting the category corresponding to the word with the highest similarity as the k-th category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311619300.3A | 2023-11-30 | 2023-11-30 | Text classification method and system based on self-constructed prompt template
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311619300.3A | 2023-11-30 | 2023-11-30 | Text classification method and system based on self-constructed prompt template
Publications (1)
Publication Number | Publication Date
---|---
CN117725211A | 2024-03-19
Family
ID=90206236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202311619300.3A | Text classification method and system based on self-constructed prompt template (Pending) | 2023-11-30 | 2023-11-30
Country Status (1)
Country | Link
---|---
CN (1) | CN117725211A
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117972097A | 2024-03-29 | 2024-05-03 | 长城汽车股份有限公司 | Text classification method, classification device, electronic equipment and storage medium
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 