CN115409026A - Text classification method and related product - Google Patents

Publication number
CN115409026A
Authority
CN
China
Prior art keywords
keyword
text classification
current iteration
model
model under
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211063246.4A
Other languages
Chinese (zh)
Inventor
武悦娇
任君翔
刘浪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pacific Insurance Technology Co Ltd
Original Assignee
Pacific Insurance Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pacific Insurance Technology Co Ltd filed Critical Pacific Insurance Technology Co Ltd
Priority to CN202211063246.4A priority Critical patent/CN115409026A/en
Publication of CN115409026A publication Critical patent/CN115409026A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a text classification method and a related product. The method comprises the following steps: extracting keywords from the text data based on a keyword identification model under the current iteration turn to obtain a first keyword vector of the text data; updating the text classification model under the current iteration turn by using the keyword identification model under the current iteration turn to obtain a new text classification model; inputting the first keyword vector into a new text classification model to obtain a second keyword vector output by the new text classification model; and continuously updating the keyword identification model under the current iteration turn by using the new text classification model to obtain the keyword identification model under the next iteration turn, and circularly iterating until the text classification model is converged. Through the cooperative training between the keyword recognition model and the text classification model, the keyword recognition model and the text classification model can be mutually supported, so that the text classification effect is improved.

Description

Text classification method and related product
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a text classification method and a related product.
Background
Text classification is the process of using a computer to assign texts to one or more preset categories according to a preset classification system. Many text classification methods exist. Generally, the category of a text can be matched according to abstract descriptions and/or keywords in the text so that the text is assigned to the corresponding category, and the abstract descriptions and/or keywords can be attached to the text as tags.
At present, conventional text classification methods classify text only by matching keywords. In this case, if a keyword is broadly defined, keyword matching alone does not sufficiently capture the semantic context, and the information it carries is incomplete. Classifying text with such broad keywords may therefore result in low accuracy and a poor classification effect.
Disclosure of Invention
The embodiment of the application provides a text classification method and a related product, which can improve the accuracy of text classification and ensure the text classification effect.
In a first aspect, an embodiment of the present application provides a text classification method, including:
extracting keywords from the text data based on a keyword identification model under the current iteration turn to obtain a first keyword vector of the text data;
updating the text classification model under the current iteration turn by using the keyword identification model under the current iteration turn to obtain a new text classification model under the current iteration turn;
inputting the first keyword vector to the new text classification model in the current iteration turn to obtain a second keyword vector output by the new text classification model in the current iteration turn;
and continuously updating the keyword identification model under the current iteration turn by using the new text classification model under the current iteration turn to obtain the keyword identification model under the next iteration turn, and circularly iterating until the text classification model is converged.
Optionally, the keyword recognition model under the current iteration round includes a first embedding layer; the first embedding layer is an embedding layer after the first keyword vector is output; the text classification model under the current iteration turn comprises a second embedding layer; the second embedding layer is an initial embedding layer of the text classification model under the current iteration turn;
the updating the text classification model under the current iteration round by using the keyword identification model under the current iteration round to obtain a new text classification model under the current iteration round comprises the following steps:
replacing the second embedding layer with the first embedding layer.
Optionally, the new text classification model under the current iteration turn includes a third embedding layer; the third embedding layer is the first embedding layer after the second keyword vector is output;
the step of continuously updating the keyword recognition model in the current iteration turn by using the new text classification model in the current iteration turn to obtain the keyword recognition model in the next iteration turn comprises the following steps:
replacing the first embedding layer with the third embedding layer.
Optionally, the keyword recognition model under the current iteration round is obtained through the following steps:
determining a data set used for training a keyword recognition model under the current iteration turn; the data set comprises a plurality of sample data marked with first keywords respectively;
acquiring an initial model for extracting keywords, and training the initial model by using the data set to obtain a trained initial model;
analyzing the data set based on the trained initial model to obtain a second keyword related to the first keyword;
and labeling the data set by the second keyword to obtain a new data set, continuously training the initial model by the new data set, and circularly iterating until the initial model is converged to obtain the keyword recognition model under the current iteration.
Optionally, the determining a data set for training the keyword recognition model in the current iteration turn includes:
acquiring a keyword candidate set; the keyword candidate set comprises first keywords corresponding to the plurality of sample data respectively;
screening the first keywords to obtain a keyword library;
and labeling the sample data by using the keyword library to obtain the data set.
Optionally, the method further comprises:
coding the first keyword vector to obtain a coding result;
and classifying and identifying the coding result, and decoding to obtain the category of the first keyword vector.
Optionally, the method further comprises:
performing convolution processing on the second keyword vector to obtain a plurality of feature matrixes corresponding to the second keyword vector;
performing pooling operation on the plurality of feature matrixes to obtain a plurality of pooled feature matrixes;
and performing text classification by using the plurality of pooled feature matrixes to obtain a text classification result.
In a second aspect, an embodiment of the present application provides a text classification apparatus, including:
the first keyword vector acquisition module is used for extracting keywords from the text data based on a keyword identification model under the current iteration turn to obtain a first keyword vector of the text data;
the text classification model updating module is used for updating the text classification model under the current iteration turn by using the keyword identification model under the current iteration turn to obtain a new text classification model under the current iteration turn;
the second keyword vector acquisition module is used for inputting the first keyword vector to the new text classification model under the current iteration turn to obtain a second keyword vector output by the new text classification model under the current iteration turn;
and the keyword identification model updating module is used for continuously updating the keyword identification model under the current iteration round by using the new text classification model under the current iteration round to obtain the keyword identification model under the next iteration round, and repeating the iteration until the text classification model is converged.
In a third aspect, an embodiment of the present application provides a text classification device, where the device includes: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is for storing one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform any of the text classification methods described above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a terminal device, the instructions cause the terminal device to perform any one of the above text classification methods.
According to the technical scheme, the embodiment of the application has the following advantages:
in the embodiment of the application, after the keyword recognition model in the current iteration round is used for extracting the keywords from the text data to obtain the first keyword vector of the text data, the text classification model in the current iteration round can be updated by using the keyword recognition model in the current iteration round to obtain a new text classification model in the current iteration round, the first keyword vector is input into the new text classification model in the current iteration round to obtain the second keyword vector output by the new text classification model in the current iteration round, so that the keyword recognition model in the current iteration round can be continuously updated by using the new text classification model in the current iteration round to obtain the keyword recognition model in the next iteration round, and the iteration is performed circularly until the text classification model converges. Therefore, through the cooperative training between the keyword recognition model and the text classification model, the data sharing between the two models can be realized, so that the keyword recognition model and the text classification model support each other, and the text classification is not performed only by using the keywords, thereby improving the accuracy of the text classification and improving the text classification effect.
Drawings
Fig. 1 is a flowchart of a text classification method according to an embodiment of the present application;
Fig. 2 is a flowchart of a manner of obtaining the keyword recognition model under the current iteration round according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a text classification apparatus according to an embodiment of the present application.
Detailed Description
As described above, the inventors found in studying text classification that conventional text classification methods classify text only by matching keywords. In this case, if a keyword is broadly defined, keyword matching alone does not sufficiently capture the semantic context, and the information it carries is incomplete. Classifying text with such broad keywords may therefore result in low accuracy and a poor classification effect.
In order to solve the above problem, an embodiment of the present application provides a text classification method. The method can comprise the following steps: after extracting keywords from the text data based on the keyword identification model in the current iteration round to obtain a first keyword vector of the text data, updating the text classification model in the current iteration round by using the keyword identification model in the current iteration round to obtain a new text classification model in the current iteration round, inputting the first keyword vector into the new text classification model in the current iteration round to obtain a second keyword vector output by the new text classification model in the current iteration round, and thus continuously updating the keyword identification model in the current iteration round by using the new text classification model in the current iteration round to obtain a keyword identification model in the next iteration round and repeating until the text classification model converges.
Therefore, through the cooperative training between the keyword recognition model and the text classification model, the data sharing between the two models can be realized, so that the keyword recognition model and the text classification model support each other, and the text classification is not performed only by using the keywords, thereby improving the accuracy of the text classification and improving the text classification effect.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
Fig. 1 is a flowchart of a text classification method according to an embodiment of the present application. As shown in fig. 1, the text classification method provided in the embodiment of the present application may include:
S101: Extract keywords from the text data based on the keyword recognition model under the current iteration round to obtain a first keyword vector of the text data.
In the embodiment of the present application, the keyword recognition model may be implemented with a BERT (Bidirectional Encoder Representations from Transformers) model followed by a GP (Global Pointer) model. Because the maximum text length supported by the BERT model is 512, text data longer than 512 may first be segmented, which extends the application scenario so that the scheme also applies to long text data. Specifically, the text data may be segmented by paragraph; if a paragraph is still longer than 512, it is segmented again at punctuation marks, yielding a plurality of segmented text data. Further, during preprocessing, the character "CLS" may be spliced at the start position of each segmented text data; the vector generated after "CLS" passes through the BERT model can represent the semantic information of the whole text data, which facilitates the subsequent text classification task. In addition, the character "SEP" may be spliced at the end position of each segmented text data so that "SEP" separates the segments. The segmented text data can then be input into the BERT model to obtain the first keyword vector output by the BERT model.
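The segmentation rule described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes length is counted in characters (a rough stand-in for BERT tokens in Chinese text), reserves two positions for the special tokens, uses the bracketed spellings `[CLS]` and `[SEP]` that BERT tokenizers conventionally use, and adds a hard cut for sentences that are themselves too long. The function names `split_long_text` and `add_special_tokens` are hypothetical.

```python
import re

MAX_LEN = 512  # maximum input length supported by BERT, per the description


def split_long_text(text, max_len=MAX_LEN):
    """Split by paragraph first, then at punctuation if a paragraph is too long."""
    budget = max_len - 2  # assumption: keep room for the two special tokens
    segments = []
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) <= budget:
            segments.append(paragraph)
            continue
        # Second pass: cut right after sentence-ending punctuation.
        sentences = [s for s in re.split(r"(?<=[。！？；.!?;])", paragraph) if s]
        current = ""
        for sentence in sentences:
            # Assumed fallback: hard-cut a single sentence longer than the budget.
            pieces = [sentence[i:i + budget] for i in range(0, len(sentence), budget)]
            for piece in pieces:
                if current and len(current) + len(piece) > budget:
                    segments.append(current)
                    current = ""
                current += piece
        if current:
            segments.append(current)
    return segments


def add_special_tokens(segment):
    """Character-level tokens with CLS at the start and SEP at the end."""
    return ["[CLS]"] + list(segment) + ["[SEP]"]


text = "short paragraph.\n" + "a" * 300 + "." + "b" * 300 + "."
segments = split_long_text(text)
model_inputs = [add_special_tokens(s) for s in segments]
```

Each entry of `model_inputs` fits within 512 positions and could then be passed to an encoder; a real pipeline would use the encoder's own tokenizer rather than character splitting.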
Further, in the embodiment of the present application, after the first keyword vector is obtained, the first keyword vector may be further processed through the keyword recognition model in the current iteration turn, so as to implement keyword recognition. Specifically, the first keyword vector may be input to the GP model for encoding to obtain an encoding result; and classifying and identifying the coding result, and decoding to obtain the category of the first keyword vector.
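The decoding step can be illustrated with a small sketch. The description does not specify how the GP model's output is decoded; the code below assumes the commonly described Global Pointer convention in which each category has a score matrix over (start, end) token pairs and a span is kept when its score exceeds a threshold of 0. The function name `decode_spans`, the category name, and the scores are hypothetical.

```python
def decode_spans(scores, tokens, threshold=0.0):
    """Return (category, start, end, text) for every span scoring above threshold.

    scores[category][i][j] is the score that tokens[i..j] form a keyword
    of that category.
    """
    results = []
    for category, matrix in scores.items():
        for i, row in enumerate(matrix):
            for j in range(i, len(row)):  # only spans whose start <= end
                if row[j] > threshold:
                    results.append((category, i, j, "".join(tokens[i:j + 1])))
    return results


tokens = list("车险理赔")
scores = {
    "product": [
        [-1.0, 2.0, -1.0, -1.0],
        [-1.0, -1.0, -1.0, -1.0],
        [-1.0, -1.0, -1.0, 1.5],
        [5.0, -1.0, -1.0, -1.0],  # below the diagonal, so it is ignored
    ]
}
spans = decode_spans(scores, tokens)
```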
S102: Update the text classification model under the current iteration round by using the keyword recognition model under the current iteration round to obtain a new text classification model under the current iteration round.
In the embodiment of the present application, the text classification model may be implemented with a TextCNN (Text Convolutional Neural Network) model. The process of obtaining the new text classification model under the current iteration round, that is, S102, is not specifically limited; for ease of understanding, one possible implementation is described below.
In one possible implementation, the keyword recognition model under the current iteration round may include a first embedding layer, which is the embedding layer as it stands after the first keyword vector has been output. The text classification model under the current iteration round may include a second embedding layer, which is the initial embedding layer of the text classification model under the current iteration round. Correspondingly, S102 may specifically include: replacing the second embedding layer with the first embedding layer. After the replacement, the embedding layer in the new text classification model under the current iteration round is the first embedding layer, that is, the embedding layer that output the first keyword vector during keyword recognition. Because the first embedding layer has acquired sufficient text features through keyword recognition, replacing the second embedding layer with it gives the text classification model under the current iteration round more text feature information. The keyword recognition model thus supports the text classification model, text classification no longer relies on keywords alone, and the accuracy and effect of text classification are improved.
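The replacement in S102 can be pictured with a toy sketch. All class and function names here are hypothetical, the embedding layer is reduced to a lookup table, and sharing the layer by reference (rather than copying its weights) is an assumption; the description only states that the second embedding layer is replaced with the first.

```python
class EmbeddingLayer:
    """Toy embedding layer: a lookup table from token to vector."""

    def __init__(self, table):
        self.table = dict(table)

    def __call__(self, tokens):
        return [self.table[token] for token in tokens]


class KeywordRecognitionModel:
    def __init__(self, embedding):
        self.embedding = embedding  # the "first embedding layer"


class TextClassificationModel:
    def __init__(self, embedding):
        self.embedding = embedding  # the "second embedding layer" (initial)


def update_text_classifier(keyword_model, classifier):
    """S102 sketch: replace the classifier's embedding with the keyword model's."""
    classifier.embedding = keyword_model.embedding
    return classifier


keyword_model = KeywordRecognitionModel(
    EmbeddingLayer({"claim": [0.9, 0.1], "car": [0.2, 0.7]})
)
classifier = TextClassificationModel(
    EmbeddingLayer({"claim": [0.0, 0.0], "car": [0.0, 0.0]})
)
new_classifier = update_text_classifier(keyword_model, classifier)
vectors = new_classifier.embedding(["claim", "car"])
```

Because the layer is shared by reference in this sketch, later training of either model would change the same table; copying the weights instead would keep the two models independent between swaps.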
S103: Input the first keyword vector into the new text classification model under the current iteration round to obtain a second keyword vector output by the new text classification model under the current iteration round.
In the embodiment of the present application, after the second keyword vector is obtained, it may be further processed by the new text classification model under the current iteration round to implement text classification. Specifically, the new text classification model under the current iteration round may include a third embedding layer, a convolutional layer, a pooling layer, and a fully connected layer, where the third embedding layer is the first embedding layer as it stands after the second keyword vector has been output. Correspondingly, the second keyword vector output by the third embedding layer may be input into the convolutional layer for convolution processing to obtain a plurality of feature matrices corresponding to the second keyword vector; the plurality of feature matrices are input into the pooling layer for a pooling operation to obtain a plurality of pooled feature matrices; and the plurality of pooled feature matrices are input into the fully connected layer for text classification to obtain the text classification result. Here, the fully connected layer may consist of a first fully connected layer and a second fully connected layer connected in sequence: the first fully connected layer projects the pooled feature matrices and outputs a projection result, and the second fully connected layer classifies the projection result and outputs the text classification result.
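The convolution, pooling, and two fully connected layers can be sketched as a forward pass in plain Python. The ReLU activation, max-over-time pooling, and softmax output are assumptions taken from the usual TextCNN design and are not stated in the description; the weights and all function names are hypothetical.

```python
import math


def conv1d(sequence, kernel, bias=0.0):
    """Slide a (width x dim) kernel over the token vectors; ReLU is assumed."""
    width = len(kernel)
    feature_map = []
    for start in range(len(sequence) - width + 1):
        total = bias
        for k in range(width):
            total += sum(w * x for w, x in zip(kernel[k], sequence[start + k]))
        feature_map.append(max(0.0, total))
    return feature_map


def dense(vector, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def text_cnn_forward(sequence, kernels, proj_w, proj_b, cls_w, cls_b):
    pooled = [max(conv1d(sequence, k)) for k in kernels]  # assumed max pooling
    projected = dense(pooled, proj_w, proj_b)             # first FC: projection
    return softmax(dense(projected, cls_w, cls_b))        # second FC: classes


second_keyword_vector = [[1, 0], [0, 1], [1, 1]]  # three tokens, dimension 2
kernels = [[[1, 0], [0, 1]], [[1, 1]]]            # widths 2 and 1
probabilities = text_cnn_forward(
    second_keyword_vector, kernels,
    proj_w=[[1, 0], [0, 1]], proj_b=[0, 0],
    cls_w=[[1, 0], [0, -1]], cls_b=[0, 0],
)
```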
S104: Continue to update the keyword recognition model under the current iteration round by using the new text classification model under the current iteration round to obtain the keyword recognition model under the next iteration round, and iterate in a loop until the text classification model converges.
In the embodiment of the present application, the process of obtaining the keyword recognition model in the next iteration, that is, S104, may not be specifically limited, and for convenience of understanding, the following provides a possible implementation manner for explanation.
In a possible implementation, S104 may specifically include: replacing the first embedding layer with the third embedding layer. After the replacement, the embedding layer in the keyword recognition model under the next iteration round is the third embedding layer, that is, the embedding layer that output the second keyword vector during text classification. Because the third embedding layer has acquired sufficient text classification information through text classification, replacing the first embedding layer with it allows the keyword recognition model under the next iteration round to extract keywords better based on the text information, which improves the accuracy of keyword recognition.
In the embodiment of the present application, after the keyword recognition model under the next iteration round is obtained, the loop iteration may continue. For ease of understanding, denote the current iteration round as i, where i is a positive integer. The new text classification model of round i serves as the text classification model of round i+1, which is then updated by the keyword recognition model of round i+1 to obtain the new text classification model of round i+1. The keyword recognition model of round i+1 is in turn updated by the new text classification model of round i+1 to obtain the keyword recognition model of round i+2, and the loop continues until the text classification model converges. Through this cooperative training between the keyword recognition model and the text classification model, the two models share data and the overall model iterates more effectively, so that the two models support each other and text classification no longer relies on keywords alone, thereby improving the accuracy and effect of text classification. In addition, whether the text classification model has converged may be determined by a cross-entropy loss function.
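The round-by-round alternation and the convergence check can be sketched as a loop. The callables `train_classifier`, `train_keyword_model`, and `evaluate` are hypothetical placeholders for the real training and prediction steps, and treating "converged" as "the change in cross-entropy loss falls below a tolerance" is an assumption; the description only says that a cross-entropy loss function decides convergence.

```python
import math


def cross_entropy(probabilities, labels):
    """Mean negative log-likelihood of the true labels."""
    eps = 1e-12
    return -sum(math.log(max(p[y], eps))
                for p, y in zip(probabilities, labels)) / len(labels)


def co_train(kw_embedding, train_classifier, train_keyword_model,
             evaluate, labels, tol=1e-3, max_rounds=20):
    previous = float("inf")
    for round_index in range(1, max_rounds + 1):
        # S102 + S103: the classifier starts from the keyword model's embedding.
        cls_embedding = train_classifier(kw_embedding)
        loss = cross_entropy(evaluate(cls_embedding), labels)
        if abs(previous - loss) < tol:  # assumed convergence criterion
            return cls_embedding, round_index, loss
        previous = loss
        # S104: the keyword model of round i+1 starts from the classifier's embedding.
        kw_embedding = train_keyword_model(cls_embedding)
    return cls_embedding, max_rounds, loss


# Toy stand-ins: the "embedding" is one number that drifts toward 1.0.
final, rounds, loss = co_train(
    kw_embedding=0.5,
    train_classifier=lambda x: x + (1.0 - x) * 0.5,
    train_keyword_model=lambda x: x,
    evaluate=lambda x: [[x, 1.0 - x]],
    labels=[0],
)
```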
Based on the relevant contents of S101 to S104, in this embodiment, after extracting keywords from text data based on the keyword recognition model in the current iteration round to obtain a first keyword vector of the text data, the text classification model in the current iteration round may be updated by using the keyword recognition model in the current iteration round to obtain a new text classification model in the current iteration round, and the first keyword vector is input to the new text classification model in the current iteration round to obtain a second keyword vector output by the new text classification model in the current iteration round. Therefore, through the cooperative training between the keyword recognition model and the text classification model, the data sharing between the two models can be realized, so that the keyword recognition model and the text classification model support each other, and the text classification is not performed only by using the keywords, thereby improving the accuracy of the text classification and improving the text classification effect.
In addition, so that the keyword recognition model better meets the text classification requirement and can mine keywords, thereby laying a foundation for subsequent text classification, the embodiment of the present application provides a manner of obtaining the keyword recognition model under the current iteration round, which specifically includes S201 to S204. S201 to S204 are described below with reference to the embodiments and the drawings.
Fig. 2 is a flowchart of a manner of obtaining the keyword recognition model under the current iteration round according to an embodiment of the present application. As shown in fig. 2, S201 to S204 may specifically include:
S201: Determine a data set for training the keyword recognition model under the current iteration round.
The data set may include a plurality of sample data each labeled with a first keyword.
In the embodiment of the present application, the determination process for the data set, that is, S201, may not be specifically limited. For ease of understanding, the following description is made in connection with one possible embodiment.
In a possible implementation, S201 may specifically include: acquiring a keyword candidate set, where the keyword candidate set includes the first keywords corresponding to the plurality of sample data respectively; screening the first keywords to obtain a keyword library; and labeling the plurality of sample data by using the keyword library to obtain the data set. The keyword candidate set can be obtained through a TF-IDF (Term Frequency-Inverse Document Frequency) model. Screening the first keywords removes wrong keywords and retains correct ones, from which the keyword library is constructed. The plurality of sample data can then be back-labeled with the keyword library to obtain the data set used to train the keyword recognition model.
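The three sub-steps of S201 can be sketched as follows. The sketch uses the plain, unsmoothed TF-IDF formula, and it models screening as removal of entries on a reject list; the description does not say how screening is performed, so that part is an assumption. The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_candidates(docs, top_k=2):
    """Return the top_k tokens by TF-IDF for each tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in set(doc))
    candidates = []
    for doc in docs:
        term_freq = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n_docs / doc_freq[t])
                  for t, c in term_freq.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        candidates.append(ranked[:top_k])
    return candidates


def screen(candidates, rejected):
    """Build the keyword library by dropping rejected candidates (assumed rule)."""
    return {t for doc_candidates in candidates for t in doc_candidates} - set(rejected)


def back_label(docs, library):
    """Mark every position whose token is in the keyword library."""
    return [[(i, t) for i, t in enumerate(doc) if t in library] for doc in docs]


docs = [
    ["claim", "car", "the", "claim"],
    ["policy", "the", "renewal"],
    ["car", "the", "accident"],
]
candidates = tfidf_candidates(docs)
library = screen(candidates, rejected={"car"})
data_set = back_label(docs, library)
```

A token such as "the" that appears in every document gets an inverse document frequency of zero, so it never ranks above tokens that are specific to a document.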
S202: Acquire an initial model for extracting keywords, and train the initial model with the data set to obtain a trained initial model.
S203: Analyze the data set based on the trained initial model to obtain a second keyword related to the first keyword.
S204: Label the data set with the second keyword to obtain a new data set, continue training the initial model with the new data set, and iterate in a loop until the initial model converges, to obtain the keyword recognition model under the current iteration round.
In the embodiment of the present application, running prediction on the data set again makes it possible to mine second keywords related to the first keywords. The data set is then back-labeled with the mined second keywords to obtain a new data set, and retraining and re-prediction continue on the new data set, so that keyword mining is achieved through repeated loop iteration. Here, whether the initial model has converged may also be determined by a cross-entropy loss function, so as to obtain the keyword recognition model under the current iteration round.
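The train, predict, and back-label cycle of S202 to S204 can be sketched with a deliberately simple stand-in model. Everything here is hypothetical: the toy "model" only remembers which tokens appear immediately before a known keyword, and the loop stops when no new keyword is mined, whereas the description uses a cross-entropy loss to decide convergence.

```python
def toy_train(labeled):
    """Toy model: the set of tokens seen immediately before a labeled keyword."""
    context = set()
    for sample, keywords in labeled:
        for i, token in enumerate(sample):
            if token in keywords and i > 0:
                context.add(sample[i - 1])
    return context


def toy_predict(model, samples):
    """Propose every token that follows a learned context token."""
    return {sample[i] for sample in samples
            for i in range(1, len(sample)) if sample[i - 1] in model}


def mine_keywords(samples, first_keywords, train, predict, max_rounds=10):
    keywords = set(first_keywords)
    model = None
    for _ in range(max_rounds):
        # Back-label the samples with the current keyword set.
        labeled = [(s, [t for t in s if t in keywords]) for s in samples]
        model = train(labeled)                           # S202 / continued training
        mined = set(predict(model, samples)) - keywords  # S203: second keywords
        if not mined:                                    # assumed stopping rule
            break
        keywords |= mined                                # S204: new data set next round
    return model, keywords


samples = [
    ["file", "claim", "now"],
    ["file", "appeal", "now"],
    ["read", "appeal", "letter"],
    ["read", "notice", "today"],
]
model, keywords = mine_keywords(samples, {"claim"}, toy_train, toy_predict)
```

In this toy run the first round mines "appeal" from the shared context "file", and the second round uses the newly labeled "appeal" to learn the context "read" and mine "notice", which mirrors how each round's labels feed the next.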
Based on the text classification method provided by the embodiment, the embodiment of the application further provides a text classification device. The text classification device is described below with reference to the embodiments and the drawings, respectively.
Fig. 3 is a schematic structural diagram of a text classification apparatus according to an embodiment of the present application. Referring to fig. 3, a text classification apparatus 300 according to an embodiment of the present application is provided. Specifically, the text classification device 300 includes:
a first keyword vector obtaining module 301, configured to perform keyword extraction on the text data based on the keyword recognition model in the current iteration turn, to obtain a first keyword vector of the text data;
the text classification model updating module 302 is configured to update the text classification model in the current iteration turn with the keyword identification model in the current iteration turn to obtain a new text classification model in the current iteration turn;
a second keyword vector obtaining module 303, configured to input the first keyword vector to the new text classification model in the current iteration turn, and obtain a second keyword vector output by the new text classification model in the current iteration turn;
and the keyword identification model updating module 304 is configured to continuously update the keyword identification model in the current iteration round with the new text classification model in the current iteration round to obtain the keyword identification model in the next iteration round, and perform iteration circularly until the text classification model converges.
In this embodiment of the application, through cooperation of the first keyword vector obtaining module 301, the text classification model updating module 302, the second keyword vector obtaining module 303, and the keyword recognition model updating module 304, after extracting keywords from the text data based on the keyword recognition model in the current iteration turn to obtain the first keyword vector of the text data, the text classification model in the current iteration turn may be updated by the keyword recognition model in the current iteration turn to obtain a new text classification model in the current iteration turn, and the first keyword vector is input to the new text classification model in the current iteration turn to obtain the second keyword vector output by the new text classification model in the current iteration turn. Therefore, through the cooperative training between the keyword recognition model and the text classification model, the data sharing between the two models can be realized, so that the keyword recognition model and the text classification model support each other, and the text classification is not performed only by using the keywords, thereby improving the accuracy of the text classification and improving the text classification effect.
As an implementation, in order to improve the accuracy of text classification and ensure the text classification effect, the keyword recognition model in the current iteration round comprises a first embedding layer, which is the embedding layer after the first keyword vector is output; the text classification model in the current iteration round comprises a second embedding layer, which is the initial embedding layer of the text classification model in the current iteration round. Accordingly, the text classification model updating module 302 may specifically include:
a second embedding layer replacing module, configured to replace the second embedding layer with the first embedding layer.
As an implementation, in order to improve the accuracy of text classification and ensure the text classification effect, the new text classification model in the current iteration round comprises a third embedding layer, which is the first embedding layer after the second keyword vector is output. Accordingly, the keyword recognition model updating module 304 may specifically include:
a first embedding layer replacing module, configured to replace the first embedding layer with the third embedding layer.
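The two replacement steps can be sketched as below. This is only an illustration under stated assumptions: the classes `KeywordModel` and `TextClassifier`, the representation of an embedding layer as a nested list, and the choice to copy the layer rather than share one object are hypothetical; the application does not state how the replacement is carried out.

```python
import copy
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class KeywordModel:
    embedding: Matrix  # plays the role of the "first embedding layer"


@dataclass
class TextClassifier:
    embedding: Matrix  # initially the "second embedding layer"


def update_classifier(classifier: TextClassifier, keyword_model: KeywordModel) -> None:
    # Replace the second embedding layer with (a copy of) the first embedding layer.
    classifier.embedding = copy.deepcopy(keyword_model.embedding)


def update_keyword_model(keyword_model: KeywordModel, classifier: TextClassifier) -> None:
    # After the classifier has produced the second keyword vector, its embedding
    # plays the role of the "third embedding layer"; copy it back.
    keyword_model.embedding = copy.deepcopy(classifier.embedding)
```

Copying keeps each model's later updates from silently altering the other model between replacement steps; sharing a single layer object would be an equally plausible reading.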
As an implementation, in order to improve the accuracy of text classification and ensure the text classification effect, the keyword recognition model in the current iteration round may be obtained through the following modules:
a data set determining module, configured to determine a data set used for training the keyword recognition model in the current iteration round, where the data set comprises a plurality of sample data, each labeled with a first keyword;
an initial model training module, configured to acquire an initial model for extracting keywords and train the initial model with the data set to obtain a trained initial model;
a second keyword acquisition module, configured to analyze the data set based on the trained initial model to obtain second keywords related to the first keywords;
and a keyword recognition model acquisition module, configured to label the data set with the second keywords to obtain a new data set, continue training the initial model with the new data set, and iterate in this way until the initial model converges, thereby obtaining the keyword recognition model in the current iteration round.
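The train, analyze, relabel, retrain cycle handled by these modules resembles a bootstrapping loop, sketched below. The sketch makes several assumptions that the application does not state: the "model" is reduced to a set of known keywords, a token counts as a related second keyword if it directly follows a known keyword, and convergence means that no new keyword is found. The names `train`, `find_related` and `bootstrap` are hypothetical.

```python
from typing import Dict, List, Set


def train(dataset: List[Dict]) -> Set[str]:
    """Toy 'training': the model is just the set of labeled keywords."""
    vocab: Set[str] = set()
    for sample in dataset:
        vocab |= sample["keywords"]
    return vocab


def find_related(model: Set[str], dataset: List[Dict], stopwords: Set[str]) -> Set[str]:
    """Toy 'analysis': a token is related if it directly follows a known keyword."""
    related: Set[str] = set()
    for sample in dataset:
        tokens = sample["text"].split()
        for prev, cur in zip(tokens, tokens[1:]):
            if prev in model and cur not in model and cur not in stopwords:
                related.add(cur)
    return related


def bootstrap(dataset: List[Dict], stopwords: Set[str], max_rounds: int = 10) -> Set[str]:
    model = train(dataset)
    for _ in range(max_rounds):
        second_keywords = find_related(model, dataset, stopwords)
        if not second_keywords:
            break  # no new keywords: treat the model as converged
        # Label the data set with the second keywords (mutates the samples in place).
        for sample in dataset:
            sample["keywords"] |= second_keywords & set(sample["text"].split())
        model = train(dataset)  # continue training on the new data set
    return model
```

A real keyword extraction model would replace both `train` and `find_related`; only the loop structure is meant to mirror the description.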
As an implementation, in order to improve the accuracy of text classification and ensure the text classification effect, the data set determining module may specifically include:
a keyword candidate set acquisition module, configured to acquire a keyword candidate set, where the keyword candidate set comprises the first keywords corresponding to each of a plurality of sample data;
a keyword library acquisition module, configured to screen the first keywords to obtain a keyword library;
and a data set acquisition module, configured to label the plurality of sample data with the keyword library to obtain the data set.
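A minimal sketch of these three steps follows. The screening rule is an assumption: this passage does not say how the first keywords are screened, so the sketch keeps candidates that occur in at least `min_count` samples. The function names `build_keyword_library` and `label_samples` and the whitespace tokenization are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def build_keyword_library(candidates_per_sample: List[List[str]],
                          min_count: int = 2) -> Set[str]:
    # Count in how many samples each candidate keyword appears.
    counts = Counter(kw for kws in candidates_per_sample for kw in set(kws))
    # Assumed screening rule: keep keywords seen in at least min_count samples.
    return {kw for kw, c in counts.items() if c >= min_count}


def label_samples(samples: List[str], library: Set[str]) -> List[Dict]:
    # Label each sample with the library keywords it contains.
    return [
        {"text": text,
         "keywords": sorted(w for w in set(text.split()) if w in library)}
        for text in samples
    ]
```

For example, with candidates `[["car", "insurance", "claim"], ["insurance", "premium"], ["car", "loan"]]` and the default threshold, only `car` and `insurance` survive screening.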
As an implementation, in order to improve the accuracy of text classification and ensure the text classification effect, the text classification apparatus 300 may further include:
a coding module, configured to encode the first keyword vector to obtain a coding result;
and a keyword classification module, configured to classify the coding result and decode it to obtain the category of the first keyword vector.
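One possible reading of the encode, classify, decode sequence is sketched below. The concrete operations are assumptions, since the passage names the steps without defining them: here encoding is a linear projection, classification is a softmax over the projected scores, and decoding maps the most probable index to a label name. The functions `encode`, `softmax` and `decode` and the example labels are hypothetical.

```python
import math
from typing import List


def encode(vec: List[float], weights: List[List[float]]) -> List[float]:
    # Assumed encoder: one linear projection, one output score per category.
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax turning scores into class probabilities.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(probs: List[float], labels: List[str]) -> str:
    # Decode the most probable index back into a category name.
    return labels[probs.index(max(probs))]
```

A sequence labeling decoder would be another plausible interpretation; the sketch only shows the order of the three steps.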
As an implementation, in order to improve the accuracy of text classification and ensure the text classification effect, the text classification apparatus 300 may further include:
a convolution operation module, configured to perform convolution processing on the second keyword vector to obtain a plurality of feature matrices corresponding to the second keyword vector;
a pooling operation module, configured to pool the plurality of feature matrices to obtain a plurality of pooled feature matrices;
and a text classification module, configured to perform text classification using the plurality of pooled feature matrices to obtain a text classification result.
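The convolution, pooling and classification steps can be sketched in plain Python as follows. The details are assumptions: the second keyword vector is treated as a sequence of token vectors, each kernel spans a fixed number of positions, pooling is max-over-time, and the classifier is a single linear layer followed by an argmax. For brevity each feature map here is a one-dimensional list rather than a matrix. The names `conv1d`, `max_pool` and `classify` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv1d(seq: Matrix, kernel: Matrix) -> List[float]:
    """Valid 1-D convolution (cross-correlation) of a kernel over the sequence."""
    width = len(kernel)
    feature_map = []
    for start in range(len(seq) - width + 1):
        total = 0.0
        for offset in range(width):
            for x, w in zip(seq[start + offset], kernel[offset]):
                total += x * w
        feature_map.append(total)
    return feature_map


def max_pool(feature_map: List[float]) -> float:
    # Max-over-time pooling: keep the strongest response of each kernel.
    return max(feature_map)


def classify(seq: Matrix, kernels: List[Matrix],
             weights: Matrix, bias: List[float]) -> Tuple[int, List[float]]:
    pooled = [max_pool(conv1d(seq, k)) for k in kernels]
    # Linear layer over the pooled features, then pick the highest score.
    scores = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    return scores.index(max(scores)), pooled
```

Using kernels of several widths, as in the usage below, lets the classifier respond to keyword patterns of different lengths.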
Further, an embodiment of the present application further provides a text classification device, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions which, when executed by the processor, cause the processor to perform any implementation of the text classification method described above.
Further, an embodiment of the present application provides a computer-readable storage medium storing instructions which, when run on a terminal device, cause the terminal device to execute any implementation of the text classification method described above.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps of the methods in the above embodiments can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present application, or the portions thereof that contribute over the prior art, may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to execute the methods described in the embodiments, or in some portions of the embodiments, of the present application.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
It is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of text classification, comprising:
extracting keywords from the text data based on a keyword recognition model in the current iteration round to obtain a first keyword vector of the text data;
updating the text classification model in the current iteration round with the keyword recognition model in the current iteration round to obtain a new text classification model in the current iteration round;
inputting the first keyword vector into the new text classification model in the current iteration round to obtain a second keyword vector output by the new text classification model in the current iteration round;
and updating the keyword recognition model in the current iteration round with the new text classification model in the current iteration round to obtain the keyword recognition model for the next iteration round, and iterating in this way until the text classification model converges.
2. The method of claim 1, wherein the keyword recognition model in the current iteration round comprises a first embedding layer; the first embedding layer is the embedding layer after the first keyword vector is output; the text classification model in the current iteration round comprises a second embedding layer; the second embedding layer is the initial embedding layer of the text classification model in the current iteration round;
the updating the text classification model in the current iteration round with the keyword recognition model in the current iteration round to obtain a new text classification model in the current iteration round comprises:
replacing the second embedding layer with the first embedding layer.
3. The method of claim 2, wherein the new text classification model in the current iteration round comprises a third embedding layer; the third embedding layer is the first embedding layer after the second keyword vector is output;
the updating the keyword recognition model in the current iteration round with the new text classification model in the current iteration round to obtain the keyword recognition model for the next iteration round comprises:
replacing the first embedding layer with the third embedding layer.
4. The method of claim 1, wherein the keyword recognition model in the current iteration round is obtained by:
determining a data set used for training the keyword recognition model in the current iteration round; the data set comprises a plurality of sample data, each labeled with a first keyword;
acquiring an initial model for extracting keywords, and training the initial model with the data set to obtain a trained initial model;
analyzing the data set based on the trained initial model to obtain second keywords related to the first keywords;
and labeling the data set with the second keywords to obtain a new data set, continuing to train the initial model with the new data set, and iterating in this way until the initial model converges, to obtain the keyword recognition model in the current iteration round.
5. The method of claim 4, wherein determining the dataset used to train the keyword recognition model for the current iteration comprises:
acquiring a keyword candidate set; the keyword candidate set comprises first keywords corresponding to the plurality of sample data respectively;
screening the first keywords to obtain a keyword library;
and labeling the sample data by using the keyword library to obtain the data set.
6. The method of any of claims 1 to 5, further comprising:
coding the first keyword vector to obtain a coding result;
and classifying and identifying the coding result, and decoding to obtain the category of the first keyword vector.
7. The method of any of claims 1 to 5, further comprising:
performing convolution processing on the second keyword vector to obtain a plurality of feature matrices corresponding to the second keyword vector;
pooling the plurality of feature matrices to obtain a plurality of pooled feature matrices;
and performing text classification using the plurality of pooled feature matrices to obtain a text classification result.
8. A text classification apparatus, comprising:
a first keyword vector acquisition module, configured to extract keywords from the text data based on the keyword recognition model in the current iteration round to obtain a first keyword vector of the text data;
a text classification model updating module, configured to update the text classification model in the current iteration round with the keyword recognition model in the current iteration round to obtain a new text classification model in the current iteration round;
a second keyword vector acquisition module, configured to input the first keyword vector into the new text classification model in the current iteration round to obtain a second keyword vector output by the new text classification model in the current iteration round;
and a keyword recognition model updating module, configured to update the keyword recognition model in the current iteration round with the new text classification model in the current iteration round to obtain the keyword recognition model for the next iteration round, and to iterate in this way until the text classification model converges.
9. A text classification device, characterized in that the device comprises: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is to store one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform the text classification method of any of claims 1 to 7.
10. A computer-readable storage medium having stored therein instructions that, when run on a terminal device, cause the terminal device to perform the text classification method of any one of claims 1 to 7.
CN202211063246.4A 2022-08-31 2022-08-31 Text classification method and related product Pending CN115409026A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211063246.4A CN115409026A (en) 2022-08-31 2022-08-31 Text classification method and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211063246.4A CN115409026A (en) 2022-08-31 2022-08-31 Text classification method and related product

Publications (1)

Publication Number Publication Date
CN115409026A true CN115409026A (en) 2022-11-29

Family

ID=84163197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211063246.4A Pending CN115409026A (en) 2022-08-31 2022-08-31 Text classification method and related product

Country Status (1)

Country Link
CN (1) CN115409026A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination