CN111382271A - Training method and device of text classification model and text classification method and device - Google Patents


Info

Publication number
CN111382271A
Authority
CN
China
Prior art keywords: text, texts, text classification, model, comprehensive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010156375.2A
Other languages
Chinese (zh)
Other versions
CN111382271B (en)
Inventor
刘俊宏
马良庄
张望舒
温祖杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010156375.2A
Publication of CN111382271A
Application granted
Publication of CN111382271B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification provide a training method for a text classification model, comprising: acquiring N original texts and N corresponding text category labels, where N is a positive integer greater than 1; splicing the N original texts to obtain a spliced text; one-hot encoding the N text category labels to obtain N category label vectors; averaging the N category label vectors to obtain a comprehensive label vector; inputting the spliced text into a text classification model to obtain a comprehensive classification result; and training the text classification model based on the comprehensive classification result and the comprehensive label vector. The embodiments further provide a text classification method, comprising: acquiring a target text to be classified, copying it to obtain N target texts, splicing these, and inputting the spliced text into the text classification model obtained with the above training method to obtain a text classification result for the target text.

Description

Training method and device of text classification model and text classification method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of natural language processing technologies, and in particular, to a method and an apparatus for training a text classification model, and a method and an apparatus for text classification.
Background
In many scenarios, text classification is involved. For example, in an internet forum, a post published by a user needs to be classified so that it can be displayed in the forum section of the corresponding category (e.g., family and relationships). With the development of machine learning, training machine learning models to classify text has become a research hotspot.
However, the accuracy of the text classification results currently obtained with machine learning models is limited. A sound scheme is therefore needed to effectively improve the accuracy of text classification results.
Disclosure of Invention
One or more embodiments of the present specification describe a text classification method and apparatus that carry the idea of mixup (a data augmentation scheme from the field of image processing) over to text classification in order to augment text data and thereby improve the accuracy of text classification results.
According to a first aspect, a training method for a text classification model is provided, comprising: acquiring N original texts and N corresponding text category labels, where N is a positive integer greater than 1; splicing the N original texts to obtain a spliced text; one-hot encoding the N text category labels respectively to obtain N category label vectors; averaging the N category label vectors to obtain a comprehensive label vector; inputting the spliced text into a text classification model to obtain a comprehensive classification result for the N original texts; and training the text classification model based on the comprehensive classification result and the comprehensive label vector.
In one embodiment, training the text classification model based on the comprehensive classification result and the comprehensive label vector comprises: determining a cross-entropy loss based on the comprehensive classification result and the comprehensive label vector; and adjusting model parameters of the text classification model using the cross-entropy loss.
In one embodiment, the N original texts are historical user session texts collected in a customer service scenario, the N text category labels are standard question categories or standard question category identifiers, and the text classification model is a question prediction model.
In one embodiment, the text classification model is based on a deep neural network (DNN), a recurrent neural network (RNN), a long short-term memory network (LSTM), a Transformer model, or a BERT model.
According to a second aspect, a text classification method is provided, comprising: acquiring a target text to be classified; copying the target text to obtain N target texts; and splicing the N target texts and inputting the result into a text classification model trained by the method of the first aspect, to obtain a text classification result for the target text.
According to a third aspect, a training apparatus for a text classification model is provided, comprising: an acquiring unit configured to acquire N original texts and N corresponding text category labels, where N is a positive integer greater than 1; a splicing unit configured to splice the N original texts to obtain a spliced text; an encoding unit configured to one-hot encode the N text category labels respectively to obtain N category label vectors; an averaging unit configured to average the N category label vectors to obtain a comprehensive label vector; a prediction unit configured to input the spliced text into a text classification model to obtain a comprehensive classification result for the N original texts; and a training unit configured to train the text classification model based on the comprehensive classification result and the comprehensive label vector.
According to a fourth aspect, a text classification apparatus is provided, comprising: an acquiring unit configured to acquire a target text to be classified; a copying unit configured to copy the target text to obtain N target texts; and a prediction unit configured to splice the N target texts and input the result into a text classification model trained by the apparatus of the third aspect, obtaining a text classification result for the target text.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.
According to a sixth aspect, there is provided a computing device comprising a memory having stored therein executable code, and a processor which, when executing the executable code, implements the method of the first or second aspect.
In summary, with the training method of the text classification model and the text classification method provided in the embodiments of this specification, massive training data can be constructed without altering the literal text content, so the information of the original texts is retained; the problems faced by synonym-replacement-based text augmentation therefore do not arise, the model performance of the text classification model can be effectively improved, and the accuracy, reliability, and usability of the prediction results are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description are merely some embodiments of the present invention; those skilled in the art may derive other drawings from them without creative effort.
FIG. 1 illustrates a block diagram of a training process for a text classification model according to one embodiment;
FIG. 2 illustrates a block diagram of a process of using a text classification model according to one embodiment;
FIG. 3 illustrates a flow diagram of a method of training a text classification model according to one embodiment;
FIG. 4 illustrates a flow diagram of a text classification method according to one embodiment;
FIG. 5 illustrates an exemplary diagram of a text classification method according to an example;
FIG. 6 illustrates a block diagram of a training apparatus for a text classification model according to one embodiment;
FIG. 7 illustrates a block diagram of a text classification device according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
In the field of machine learning, a data augmentation technique may be employed to improve the model performance of a machine learning model (or prediction model). Specifically, based on the collected training data, virtual samples can be constructed through data augmentation to enrich the training data, which improves the prediction performance of the prediction model and, in turn, the accuracy and usability of its prediction results.
At present, text data augmentation is mainly synonym-replacement-based text augmentation. This approach requires a manually specified synonym table, and replacement errors arise from errors in the table or from words having multiple senses, so the text data obtained this way improves the performance of a text classification model only to a very limited extent.
In addition, the mixup image data augmentation technique works well in image classification tasks, but mixup has not been applied to text. The core obstacle is that text cannot be linearly interpolated the way images can: the average of two pixel values is still a pixel value, but there is no "average word" between two words to represent their middle value, an inevitable consequence of the discreteness of text.
Based on the above observation and analysis, the inventors propose a text classification method that borrows the idea of the mixup technique. In this method, massive training data can be constructed without altering the literal text content, so the information of the original texts is retained; the problems of synonym-replacement-based text augmentation do not arise, the model performance of the text classification model can be effectively improved, and the accuracy, reliability, and usability of the prediction results are improved.
In particular, the above text classification method involves a training phase and/or a using phase of a text classification model. For the training phase, in one embodiment, FIG. 1 shows a block diagram of the training process of a text classification model. As shown in FIG. 1, first, according to a predetermined number N (a positive integer greater than or equal to 2), N original labeled samples are selected from an original labeled data set, each comprising an original text and a corresponding text category label. Then, on the one hand, the N original texts of the N samples are spliced into a spliced text, which is input into the text classification model to obtain a prediction result; on the other hand, the N category labels are one-hot encoded into N label vectors, which are averaged into an average vector. Finally, the model parameters of the text classification model are adjusted based on the prediction result and the average vector. By repeating the process shown in FIG. 1, multiple training iterations can be performed until the model converges, yielding the finally used text classification model.
Text classification can then be realized with the text classification model obtained by this training. Specifically, for the using phase, in one embodiment, FIG. 2 shows a block diagram of the process of using the text classification model. As shown in FIG. 2, first the target text to be classified is acquired; then, according to the predetermined number N, N copies of the target text are spliced into a target spliced text, which is input into the trained text classification model to obtain the text classification result of the target text. A more accurate classification result for the target text can thus be obtained.
The specific implementation steps of the above training method and using method are described below with reference to specific embodiments. FIG. 3 shows a flowchart of a method for training a text classification model according to one embodiment. The method may be executed by any apparatus, device, platform, or device cluster having computing and processing capabilities. As shown in FIG. 3, the method comprises the following steps:
Step S310: acquire N original texts and N corresponding text category labels, where N is a positive integer greater than 1. Step S320: splice the N original texts to obtain a spliced text. Step S330: one-hot encode the N text category labels respectively to obtain N category label vectors. Step S340: average the N category label vectors to obtain a comprehensive label vector. Step S350: input the spliced text into a text classification model to obtain a comprehensive classification result for the N original texts. Step S360: train the text classification model based on the comprehensive classification result and the comprehensive label vector.
The steps are as follows:
First, in step S310, N original texts and the N corresponding text category labels are acquired. Here N is a predetermined positive integer greater than 1 and may specifically be set to 2, 3, or the like.
It should be noted that the original texts and text category labels may come from any text classification scenario. In one embodiment, the original texts may be historical user sessions collected in a customer service scenario, and the text category labels may be standard question categories or standard question category identifiers, obtained through online collection (e.g., user feedback) or manual labeling. It should be understood that a standard question is a canonical question summarized from users' high-frequency questions. In a specific embodiment, the standard question category may be the text of the standard question itself; for example, the original text "how do I activate Huabei" corresponds to the standard question category "how to activate Huabei". In a specific embodiment, the standard question category identifier uniquely identifies the standard question category and may consist of digits and/or letters. In one example, it may be a numerical number; for example, if there are 3 standard question categories in total, they may be numbered 1, 2, and 3.
In another embodiment, the original text may be a content information text, and the corresponding text category label may be an information category or an information category identifier. In one embodiment, the original texts and information categories may be collected from a news website or a content recommendation platform. In one example, the original text may be a news article about a warm-hearted citizen returning money he found, and the corresponding information category may be "social news".
The sources and content of the original texts and text category labels are described above. In one embodiment, N labeled samples comprising the N original texts and N text category labels may be selected, according to the predetermined number N, from an original labeled data set containing a large number of original texts and text category labels.
In the above, N original texts and corresponding N text category labels may be obtained. Next, through steps S320 to S340, the acquired N original texts and N text category labels are processed into a training sample for the text classification model.
Specifically, on the one hand, in step S320, the N original texts are spliced to obtain a spliced text. It should be noted that N original texts may be spliced in any order, that is, there is no requirement on the splicing order.
In one embodiment, this step may include: determining the text vectors corresponding to the N original texts to obtain N text vectors, and splicing the N text vectors to obtain a spliced vector corresponding to the spliced text. In a specific embodiment, determining the text vector of an original text may be implemented through word segmentation, word embedding, and the like; the related prior art may be consulted for details, which are not repeated here.
In another embodiment, this step may include: first preprocessing each of the N original texts (take any first original text as an example) into a first preprocessed text with a preset number of characters (e.g., 20 or 30), and then splicing the N preprocessed texts corresponding to the N original texts to obtain the spliced text, as sketched below. In a specific embodiment, the preprocessing may include: when the number of characters of the first original text is smaller than the preset number, padding with preset characters (e.g., '0') to obtain the first preprocessed text; and when the number of characters is greater than the preset number, truncating the first original text and keeping only the preset number of characters as the first preprocessed text.
In this way, the N original texts are spliced to obtain the spliced text.
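Below is a minimal sketch of this preprocessing and splicing, assuming character-level texts. The fixed length of 20 and the pad character '0' are taken from the examples above; the names `preprocess` and `splice` are illustrative, not from the patent.

```python
# Sketch of the pad/truncate preprocessing and splicing described above.
# PRESET_LEN and PAD_CHAR follow the examples in the description;
# all identifiers are illustrative.

PRESET_LEN = 20
PAD_CHAR = "0"

def preprocess(text: str) -> str:
    """Pad with PAD_CHAR or truncate so the result has exactly PRESET_LEN characters."""
    if len(text) < PRESET_LEN:
        return text + PAD_CHAR * (PRESET_LEN - len(text))
    return text[:PRESET_LEN]

def splice(texts: list[str]) -> str:
    """Concatenate the preprocessed texts; the splicing order does not matter."""
    return "".join(preprocess(t) for t in texts)

spliced_text = splice(["text one", "text two"])  # a 2 * PRESET_LEN character spliced text
```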
On the other hand, in step S330, the N text category labels are each one-hot encoded to obtain N category label vectors. In step S340, the N category label vectors are averaged to obtain a comprehensive label vector.
It should be understood that one-hot encoding, also known as one-bit-effective encoding, uses an M-bit status register to encode M states: each state has its own register bit, and only one bit is active at any one time. In one embodiment, when there are L text category classes in total, each text category label may be encoded as an L-dimensional vector in which one dimension takes a value different from all remaining dimensions. In one example, suppose the text category label is a category number taking values 1, 2, or 3; the 3 kinds of labels may then be encoded as (1, 0, 0), (0, 1, 0), and (0, 0, 1), respectively. One-hot encoding the N text category labels in this way yields the N category label vectors.
Further, the N category label vectors may be averaged to obtain the comprehensive label vector. Averaging is chosen here because the original texts have equal status within the spliced text, so the text category labels of the different original texts should be assigned equal weight in the comprehensive label vector. It should be understood that the comprehensive label vector indicates the classification result of the spliced text or, equivalently, the comprehensive classification result of the N original texts.
According to a specific example, assume there are 4 text category classes in total, N is 2, and the 2 text category labels are 2 and 4. The 2 labels are then encoded as (0, 1, 0, 0) and (0, 0, 0, 1), respectively; averaging the two category label vectors yields the comprehensive label vector (0, 0.5, 0, 0.5).
According to another specific example, assume there are 3 classes in total, N is 2, and the 2 text category labels are both 1. The 2 labels are then both encoded as (1, 0, 0); averaging the two category label vectors yields the comprehensive label vector (1, 0, 0).
In this manner, the comprehensive label vector corresponding to the N text category labels is obtained, as the following sketch illustrates.
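As a minimal sketch of steps S330 and S340, assuming the labels are integer category numbers starting at 1 as in the numbering example above:

```python
import numpy as np

def comprehensive_label(labels: list[int], num_classes: int) -> np.ndarray:
    """One-hot encode N category labels (S330) and average them (S340)."""
    one_hot = np.eye(num_classes)[[label - 1 for label in labels]]  # shape (N, L)
    return one_hot.mean(axis=0)                                     # L-dimensional comprehensive label vector

print(comprehensive_label([2, 4], num_classes=4))  # [0.  0.5 0.  0.5]
print(comprehensive_label([1, 1], num_classes=3))  # [1. 0. 0.]
```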
The spliced text and the comprehensive label vector together form a training sample. Based on this, in step S350, the spliced text is input into the text classification model to obtain a comprehensive classification result for the N original texts.
In one embodiment, the original texts are historical user sessions collected in a customer service scenario and the text category labels are question categories; accordingly, the text classification model may be a question prediction model. In another embodiment, the original texts are content information texts and the text category labels are information categories; accordingly, the text classification model may be an information category prediction model.
On the other hand, in one embodiment, the text classification model may be based on an artificial neural network, a decision tree algorithm, a Bayesian algorithm, or the like. In a specific embodiment, the text classification model may be based on a DNN (Deep Neural Network), a Transformer model, a BERT (Bidirectional Encoder Representations from Transformers) model, an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory) network, and the like.
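For illustration only, since the patent fixes no particular architecture, a tiny LSTM-based classifier mapping a tokenized spliced text to per-class probabilities might look as follows; any of the listed backbones could stand in its place:

```python
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    """Illustrative stand-in for the text classification model."""
    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) token indices of the spliced text
        _, (h, _) = self.lstm(self.embed(token_ids))
        return torch.softmax(self.fc(h[-1]), dim=-1)  # (batch, num_classes) probabilities
```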
Therefore, inputting the spliced text into the text classification model yields a comprehensive classification result for the N original texts. In one embodiment, the comprehensive classification result may be a classification result vector whose elements are the probabilities that the spliced text belongs to each text category. In one example, assuming there are 5 text classes, the comprehensive classification result may be (0.1, 0.6, 0.1, 0.1, 0.1), which indicates that the spliced text belongs to classes 1-5 with probabilities 0.1, 0.6, 0.1, 0.1, and 0.1, respectively. In another embodiment, the comprehensive classification result may be the category to which the spliced text belongs, specifically the category with the maximum probability among all categories. In one example, assuming the probabilities of the spliced text belonging to classes 1-5 are 0.8, 0.05, 0.04, 0.05, and 0.06, the spliced text is determined to belong to class 1, and (1, 0, 0, 0, 0) is taken as the comprehensive classification result.
In the above, the comprehensive classification result and the comprehensive label vector are obtained. Then, in step S360, the text classification model is trained based on the comprehensive classification result and the comprehensive label vector.
In one embodiment, the text classification model may be adjusted using the comprehensive classification result, the comprehensive label vector, and a preselected loss function. In a specific embodiment, the loss function may be a cross-entropy loss function; accordingly, this step may include: determining the cross-entropy loss based on the comprehensive classification result and the comprehensive label vector, and adjusting the model parameters of the text classification model using the cross-entropy loss, as sketched below. For the specific parameter adjustment methods, the prior art may be consulted.
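A minimal sketch of this training step, assuming the illustrative `TextClassifier` above and a soft target such as (0, 0.5, 0, 0.5); the cross entropy is computed directly against the comprehensive label vector:

```python
import torch

def train_step(model, optimizer, token_ids, soft_target):
    """One parameter update (step S360): cross entropy between the predicted
    probabilities and the comprehensive label vector (a soft target)."""
    probs = model(token_ids)                                          # (batch, L)
    loss = -(soft_target * torch.log(probs + 1e-9)).sum(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```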
In the above, one round of training of the text classification model is realized by executing steps S310 to S360; repeating steps S310 to S360 until the model converges yields the finally trained text classification model.
In summary, with the training method of the text classification model provided in the embodiments of this specification, massive training data can be constructed without altering the literal text content, so the information of the original texts is retained; the problems faced by synonym-replacement-based text augmentation therefore do not arise, and the model performance of the text classification model can be effectively improved.
It should be noted that after the trained text classification model is obtained, it can be used to classify a target text. Specifically, FIG. 4 shows a flowchart of a text classification method according to one embodiment. The method may be executed by any apparatus, device, platform, or device cluster having computing and processing capabilities. As shown in FIG. 4, the method comprises the following steps:
Step S410: acquire a target text to be classified. Step S420: copy the target text to obtain N target texts. Step S430: splice the N target texts and input the result into a text classification model trained by the method described above in connection with FIG. 3, obtaining a text classification result for the target text.
Regarding the above steps, in one embodiment, for a target text to be classified, the N text slots at the input end of the text classification model may simply all be set to the target text, thereby realizing classification prediction for the target text; a sketch follows.
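A minimal sketch of this inference path (steps S410 to S430), reusing the illustrative `splice` and `TextClassifier` from above; `tokenize`, which maps a text to a tensor of token indices, is an assumed helper:

```python
import torch

def classify(model, tokenize, target_text: str, n: int = 2) -> int:
    """Fill all n text slots with the target text, splice, and return the predicted class index."""
    spliced_text = splice([target_text] * n)
    token_ids = tokenize(spliced_text).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        probs = model(token_ids)
    return int(probs.argmax(dim=-1).item())
```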
As described above, the text classification model used here has excellent model performance, so the text classification result obtained by the above text classification method has high accuracy and reliability.
In the following, the text classification method, covering both training and use of the model, is described with reference to a specific example. FIG. 5 illustrates a text classification method according to one example. As shown in FIG. 5, there are 5 text categories in total, and 2 samples are combined into one training sample (N = 2 in the notation above). In the training stage of the classifier (the text classification model), text slot one and text slot two can be set to the texts of sample 1 and sample 2, respectively, to obtain a text classification result for the spliced text of the sample combination (1, 2), namely the probabilities of belonging to each category; the classifier parameters are then adjusted using this result and the comprehensive label vector (0, 0.5, 0.5, 0, 0) corresponding to the combination (1, 2). Alternatively, after the text classification results of the sample combinations (1, 2) and (1, 1) are both obtained, the two results and the two corresponding comprehensive label vectors are used together to adjust the classifier parameters. Training of the classifier is thus realized. Further, in the using stage of the classifier, for the text to be predicted (the target text to be classified), text slot one and text slot two at the input end of the classifier are both set to the text to be predicted, yielding the classification result of that text.
In summary, with the training method of the text classification model and the text classification method provided in the embodiments of this specification, massive training data can be constructed without altering the literal text content, so the information of the original texts is retained; the problems faced by synonym-replacement-based text augmentation do not arise, the model performance of the text classification model can be effectively improved, and the accuracy, reliability, and usability of the prediction results are improved.
Further, in essence, the text classification method adds a regularizing effect to the model and improves its generalization ability, which is why the text classification effect improves markedly. Taking N = 2 as an example: originally, the decision boundary between two samples A and B in the model could take a very complex, irregular shape; after the text classification method borrowing the mixup idea is adopted, the midpoint of the two samples A and B is required to map to the label 1/2·A + 1/2·B, which adds a constraint to the training of the model, restated below.
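Restated in standard mixup notation (a restatement under the usual mixup assumptions, not a formula from the patent), the constraint for N = 2 is:

```latex
f\bigl(\tfrac{1}{2} x_A + \tfrac{1}{2} x_B\bigr) \approx \tfrac{1}{2}\, y_A + \tfrac{1}{2}\, y_B
```

Because the input-side interpolation is impossible for discrete text, the method replaces it with concatenation of the two texts while keeping the label-side average, preserving the regularizing effect on the label space.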
The effectiveness of the text classification method was actually tested, using 10,000 labeled customer-service-robot samples as training data and 25,000 as test data. Scheme 1: the text classification method borrowing the mixup idea is adopted on the training set. Scheme 2: the conventional classification method. The model trained with scheme 1 achieves 5.2% higher accuracy on the test set than the model trained with scheme 2, which is a very large improvement.
Corresponding to the training method and the classification method above, the embodiments of this specification further disclose a training apparatus and a classification apparatus, as follows.
FIG. 6 illustrates the structure of a training apparatus for a text classification model according to one embodiment. As shown in FIG. 6, the training apparatus 600 includes:
an acquiring unit 610 configured to acquire N original texts and N corresponding text category labels, where N is a positive integer greater than 1; a splicing unit 620 configured to splice the N original texts to obtain a spliced text; an encoding unit 630 configured to one-hot encode the N text category labels respectively to obtain N category label vectors; an averaging unit 640 configured to average the N category label vectors to obtain a comprehensive label vector; a prediction unit 650 configured to input the spliced text into a text classification model to obtain a comprehensive classification result for the N original texts; and a training unit 660 configured to train the text classification model based on the comprehensive classification result and the comprehensive label vector.
In one embodiment, the training unit 660 is specifically configured to: determine the cross-entropy loss based on the comprehensive classification result and the comprehensive label vector; and adjust the model parameters of the text classification model using the cross-entropy loss.
In one embodiment, the N original texts are historical user session texts collected in a customer service scenario, the N text category labels are standard question categories or standard question category identifiers, and the text classification model is a question prediction model.
In one embodiment, the text classification model is based on a deep neural network (DNN), a recurrent neural network (RNN), a long short-term memory network (LSTM), a Transformer model, or a BERT model.
FIG. 7 illustrates the structure of a text classification apparatus according to one embodiment. As shown in FIG. 7, the classification apparatus 700 includes:
an acquiring unit 710 configured to acquire a target text to be classified; a copying unit 720 configured to copy the target text to obtain N target texts; and a prediction unit 730 configured to splice the N target texts and input the result into a text classification model trained by the apparatus shown in FIG. 6, obtaining a text classification result for the target text.
According to an embodiment of a further aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with FIG. 3 or FIG. 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor implementing the method described in connection with FIG. 3 or FIG. 4 when executing the executable code.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (12)

1. A training method of a text classification model comprises the following steps:
acquiring N original texts and N corresponding text category labels, wherein N is a positive integer greater than 1;
splicing the N original texts to obtain spliced texts;
respectively carrying out one-hot coding on the N text category labels to obtain N category label vectors;
carrying out average processing on the N category label vectors to obtain a comprehensive label vector;
inputting the spliced text into a text classification model to obtain a comprehensive classification result aiming at the N original texts;
and training the text classification model based on the comprehensive classification result and the comprehensive label vector.
2. The method of claim 1, wherein training the text classification model based on the composite classification result and the composite label vector comprises:
determining cross entropy loss based on the comprehensive classification result and the comprehensive label vector;
and adjusting model parameters in the text classification model by using the cross entropy loss.
3. The method of claim 1, wherein the N original texts are historical user session texts collected in a customer service scenario, the N text category labels are standard question categories or standard question category identifications, and the text classification model is a question prediction model.
4. The method of claim 1, wherein the text classification model is based on a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), a long short term memory network (LSTM), a Transformer model, or a Bert model.
5. A method of text classification, comprising:
acquiring a target text to be classified;
copying the target texts to obtain N target texts;
after splicing the N target texts, inputting a text classification model obtained by training through the method of claim 1 to obtain a text classification result for the target texts.
6. An apparatus for training a text classification model, comprising:
the acquiring unit is configured to acquire N original texts and corresponding N text category labels, wherein N is a positive integer greater than 1;
the splicing unit is configured to splice the N original texts to obtain spliced texts;
the encoding unit is configured to perform one-hot encoding on the N text category labels respectively to obtain N category label vectors;
the averaging unit is configured to average the N category label vectors to obtain a comprehensive label vector;
the prediction unit is configured to input the spliced text into a text classification model to obtain a comprehensive classification result aiming at the N original texts;
and the training unit is configured to train the text classification model based on the comprehensive classification result and the comprehensive label vector.
7. The apparatus of claim 6, wherein the training unit is specifically configured to:
determining cross entropy loss based on the comprehensive classification result and the comprehensive label vector;
and adjusting model parameters in the text classification model by using the cross entropy loss.
8. The apparatus of claim 6, wherein the N original texts are historical user session texts collected in a customer service scenario, the N text category labels are standard question categories or standard question category identifications, and the text classification model is a question prediction model.
9. The apparatus of claim 6, wherein the text classification model is based on a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), a Long Short-Term Memory (LSTM) network, a Transformer model, or a BERT model.
10. A text classification apparatus comprising:
the acquisition unit is configured to acquire a target text to be classified;
the copying unit is configured to copy the target texts to obtain N target texts;
the prediction unit is configured to, after N target texts are spliced, input a text classification model obtained by the training of the apparatus of claim 6, and obtain a text classification result for the target texts.
11. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed in a computer, causes the computer to perform the method of any of claims 1-5.
12. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that when executed by the processor implements the method of any of claims 1-5.
CN202010156375.2A 2020-03-09 2020-03-09 Training method and device of text classification model, text classification method and device Active CN111382271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010156375.2A CN111382271B (en) 2020-03-09 2020-03-09 Training method and device of text classification model, text classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010156375.2A CN111382271B (en) 2020-03-09 2020-03-09 Training method and device of text classification model, text classification method and device

Publications (2)

Publication Number Publication Date
CN111382271A (en) 2020-07-07
CN111382271B (en) 2023-05-23

Family

ID=71219960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010156375.2A Active CN111382271B (en) 2020-03-09 2020-03-09 Training method and device of text classification model, text classification method and device

Country Status (1)

Country Link
CN (1) CN111382271B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418278A (en) * 2020-11-05 2021-02-26 中保车服科技服务股份有限公司 Multi-class object detection method, terminal device and storage medium
CN112990443A (en) * 2021-05-06 2021-06-18 北京芯盾时代科技有限公司 Neural network evaluation method and device, electronic device, and storage medium
CN113434685A (en) * 2021-07-06 2021-09-24 中国银行股份有限公司 Information classification processing method and system
CN113806542A (en) * 2021-09-18 2021-12-17 上海幻电信息科技有限公司 Text analysis method and system
CN113836303A (en) * 2021-09-26 2021-12-24 平安科技(深圳)有限公司 Text type identification method and device, computer equipment and medium
CN114218941A (en) * 2021-11-30 2022-03-22 深圳市查策网络信息技术有限公司 News label labeling method
CN114491040A (en) * 2022-01-28 2022-05-13 北京百度网讯科技有限公司 Information mining method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5675710A (en) * 1995-06-07 1997-10-07 Lucent Technologies, Inc. Method and apparatus for training a text classifier
CN109344403A (en) * 2018-09-20 2019-02-15 中南大学 A kind of document representation method of enhancing semantic feature insertion
KR20190072823A (en) * 2017-12-18 2019-06-26 한국과학기술원 Domain specific dialogue acts classification for customer counseling of banking services using rnn sentence embedding and elm algorithm
CN110008342A (en) * 2019-04-12 2019-07-12 智慧芽信息科技(苏州)有限公司 Document classification method, apparatus, equipment and storage medium
CN110210513A (en) * 2019-04-23 2019-09-06 深圳信息职业技术学院 Data classification method, device and terminal device
CN110717039A (en) * 2019-09-17 2020-01-21 平安科技(深圳)有限公司 Text classification method and device, electronic equipment and computer-readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5675710A (en) * 1995-06-07 1997-10-07 Lucent Technologies, Inc. Method and apparatus for training a text classifier
KR20190072823A (en) * 2017-12-18 2019-06-26 한국과학기술원 Domain specific dialogue acts classification for customer counseling of banking services using rnn sentence embedding and elm algorithm
CN109344403A (en) * 2018-09-20 2019-02-15 中南大学 A kind of document representation method of enhancing semantic feature insertion
CN110008342A (en) * 2019-04-12 2019-07-12 智慧芽信息科技(苏州)有限公司 Document classification method, apparatus, equipment and storage medium
CN110210513A (en) * 2019-04-23 2019-09-06 深圳信息职业技术学院 Data classification method, device and terminal device
CN110717039A (en) * 2019-09-17 2020-01-21 平安科技(深圳)有限公司 Text classification method and device, electronic equipment and computer-readable storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418278A (en) * 2020-11-05 2021-02-26 中保车服科技服务股份有限公司 Multi-class object detection method, terminal device and storage medium
CN112990443A (en) * 2021-05-06 2021-06-18 北京芯盾时代科技有限公司 Neural network evaluation method and device, electronic device, and storage medium
CN113434685A (en) * 2021-07-06 2021-09-24 中国银行股份有限公司 Information classification processing method and system
CN113434685B (en) * 2021-07-06 2024-05-28 中国银行股份有限公司 Information classification processing method and system
CN113806542A (en) * 2021-09-18 2021-12-17 上海幻电信息科技有限公司 Text analysis method and system
CN113806542B (en) * 2021-09-18 2024-05-17 上海幻电信息科技有限公司 Text analysis method and system
CN113836303A (en) * 2021-09-26 2021-12-24 平安科技(深圳)有限公司 Text type identification method and device, computer equipment and medium
CN114218941A (en) * 2021-11-30 2022-03-22 深圳市查策网络信息技术有限公司 News label labeling method
CN114491040A (en) * 2022-01-28 2022-05-13 北京百度网讯科技有限公司 Information mining method and device
CN114491040B (en) * 2022-01-28 2022-12-02 北京百度网讯科技有限公司 Information mining method and device

Also Published As

Publication number Publication date
CN111382271B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN111382271B (en) Training method and device of text classification model, text classification method and device
CN110717325B (en) Text emotion analysis method and device, electronic equipment and storage medium
CN110580308B (en) Information auditing method and device, electronic equipment and storage medium
WO2023241410A1 (en) Data processing method and apparatus, and device and computer medium
CN113408287B (en) Entity identification method and device, electronic equipment and storage medium
CN111738169A (en) Handwriting formula recognition method based on end-to-end network model
CN113408507B (en) Named entity identification method and device based on resume file and electronic equipment
CN117540221B (en) Image processing method and device, storage medium and electronic equipment
CN111368066B (en) Method, apparatus and computer readable storage medium for obtaining dialogue abstract
CN111738807B (en) Method, computing device, and computer storage medium for recommending target objects
CN114416979A (en) Text query method, text query equipment and storage medium
CN111651674B (en) Bidirectional searching method and device and electronic equipment
CN117893859A (en) Multi-mode text image classification method and device, electronic equipment and storage medium
CN111563380A (en) Named entity identification method and device
CN112183655A (en) Document multi-label classification method and device
CN110851597A (en) Method and device for sentence annotation based on similar entity replacement
CN117592490A (en) Non-autoregressive machine translation method and system for accelerating glance training
CN116311322A (en) Document layout element detection method, device, storage medium and equipment
CN116416637A (en) Medical document information extraction method and device, electronic equipment and readable medium
CN114626392B (en) End-to-end text image translation model training method
CN113887724A (en) Text training enhancement method and system based on deep learning
CN112528674A (en) Text processing method, model training method, device, equipment and storage medium
CN114138995B (en) Small sample cross-modal retrieval method based on countermeasure learning
CN117113268B (en) Multi-scale data fusion method, device, medium and electronic equipment
CN117540306B (en) Label classification method, device, equipment and medium for multimedia data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant