CN116738298A - Text classification method, system and storage medium - Google Patents

Text classification method, system and storage medium

Info

Publication number
CN116738298A
Authority
CN
China
Prior art keywords
text
sample
type
prompt
processed
Prior art date
Legal status
Granted
Application number
CN202311028049.3A
Other languages
Chinese (zh)
Other versions
CN116738298B (en)
Inventor
吴东明
温露露
陈超
吴志强
郭昕
Current Assignee
Hangzhou Tonghuashun Data Development Co ltd
Original Assignee
Hangzhou Tonghuashun Data Development Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Tonghuashun Data Development Co ltd
Priority to CN202311028049.3A
Publication of CN116738298A
Application granted
Publication of CN116738298B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The application discloses a text classification method, system, and storage medium, wherein the method comprises the following steps: acquiring the domain type of a text to be processed; acquiring a prompt text containing the domain type; and processing the text to be processed and the prompt text to obtain the conclusion type of the text to be processed.

Description

Text classification method, system and storage medium
Technical Field
The present application relates to the field of text processing, and in particular, to a text classification method, system, and storage medium.
Background
As a basic task in NLP (Natural Language Processing), text classification is divided, according to the application scenario, into sentiment analysis, topic classification, natural language inference, and the like. Text classification is widely used in the financial field. For example, financial market participants may wish to judge the sentiment tendency of monetary policy through a text classification algorithm in order to infer the trend of bond interest rates, or to analyze financial news with a text classification algorithm to gauge market sentiment and thereby predict stock market fluctuations.
A common text classification method takes the original text directly as input, so the domain to which the text belongs is invisible to the text classification model, which causes the problem of semantic ambiguity across domains. For example, the sentiment tendency of "productivity enhancement" is positive in the general domain but negative in the financial domain. This is because an increase in productivity prompts banks to raise interest rates, resulting in a decrease in money liquidity, which is unfavorable for the stock market.
Because labeled data in the financial domain is limited, data augmentation methods are commonly used to expand the training data. If the accuracy of the labeled data itself is questionable, an augmented dataset built from it will propagate, or even amplify, the original errors.
Based on this, a more accurate text classification method with a wider application range is needed.
Disclosure of Invention
One aspect of the present specification provides a text classification method, the method comprising: acquiring the domain type of a text to be processed; acquiring a prompt text containing the domain type; and processing the text to be processed and the prompt text to obtain the conclusion type of the text to be processed.
Another aspect of the present specification provides a text classification system, the system comprising: a first acquisition module configured to acquire the domain type of a text to be processed; a second acquisition module configured to acquire a prompt text containing the domain type; and a determining module configured to process the text to be processed and the prompt text to obtain the conclusion type of the text to be processed.
Another aspect of the present description provides a computer-readable storage medium storing computer instructions that when executed by a processor implement a text classification method.
Another aspect of the present specification provides a text classification model training method, the method comprising: acquiring a first type sample text, wherein the first type sample text comprises a sample to-be-processed text, a sample prompt text, and a conclusion type label, and the sample prompt text comprises the domain type of the sample to-be-processed text; processing the sample to-be-processed text and the sample prompt text in the first type sample text through the text classification model to obtain a conclusion type predicted value corresponding to the first type sample text; and adjusting parameters of the text classification model to reduce the difference between the conclusion type predicted value and the conclusion type label corresponding to the first type sample text.
Another aspect of the present specification provides a text classification model training system, the system comprising: a sample acquisition module configured to acquire a first type sample text, wherein the first type sample text comprises a sample to-be-processed text, a sample prompt text, and a conclusion type label, and the sample prompt text comprises the domain type of the sample to-be-processed text; a processing module configured to process the sample to-be-processed text and the sample prompt text in the first type sample text through the text classification model to obtain a conclusion type predicted value corresponding to the first type sample text; and a parameter tuning module configured to adjust parameters of the text classification model to reduce the difference between the conclusion type predicted value corresponding to the first type sample text and the conclusion type label.
Another aspect of the present description provides a computer-readable storage medium storing computer instructions that when executed by a processor implement a text classification model training method.
Drawings
The present specification will be further elucidated by way of example embodiments, which will be described in detail by means of the accompanying drawings. The embodiments are not limiting, in which like numerals represent like structures, wherein:
FIG. 1 is an application scenario diagram of text classification according to some embodiments of the present description;
FIG. 2 is an exemplary block diagram of a text classification system according to some embodiments of the present description;
FIG. 3 is an exemplary block diagram of a text classification model training system according to some embodiments of the present description;
FIG. 4 is an exemplary flow chart of a text classification method according to some embodiments of the present description;
FIG. 5 is a schematic diagram of a text classification model shown in accordance with some embodiments of the present description;
FIG. 6 is an exemplary flow chart of training of a text classification model according to some embodiments of the present description;
FIG. 7 is a schematic diagram of a prompt classification model shown in accordance with some embodiments of the present description;
FIG. 8 is an exemplary flow chart of domain classification model training according to some embodiments of the present description;
FIG. 9 is an exemplary flow chart of prompt classification model training, shown in accordance with some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
It should be appreciated that "system," "apparatus," "unit," and/or "module" as used in this specification are terms for distinguishing between different components, elements, parts, portions, or assemblies at different levels. However, these words may be replaced by other expressions if the same purpose can be achieved.
As used in this specification and the claims, the terms "a," "an," and/or "the" are not limited to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprise" and "include" merely indicate that explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in order precisely. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
FIG. 1 is an application scenario diagram of text classification according to some embodiments of the present description.
As shown in fig. 1, the application scenario 100 may include a processing device 110. The processing device 110 may process data and/or information obtained from other devices or system components and may execute program instructions, based on such data, information, and/or processing results, to perform one or more of the functions described herein. For example, the processing device 110 may obtain the text to be processed from the user terminal 130. For another example, the processing device 110 may process the text to be processed to obtain its domain type. For another example, the processing device 110 may also process the text to be processed to obtain its prompt text, conclusion type, etc. In some embodiments, the processing device 110 may include one or more sub-processing devices (e.g., single-core or multi-core processing devices).
Storage device 120 may be used to store data and/or instructions. For example, the storage device 120 may store the text to be processed. For another example, the storage device 120 may store the prompt text. Storage device 120 may include one or more storage components, each of which may be a separate device or part of another device. In some embodiments, the storage device 120 may include Random Access Memory (RAM), Read-Only Memory (ROM), mass storage, removable memory, volatile read-write memory, or the like, or any combination thereof. In some embodiments, the storage device 120 may be implemented on a cloud platform.
User terminal 130 refers to one or more terminal devices or software used by a user. In some embodiments, the user terminal 130 may be used for interaction and display with a user. For example, the user terminal 130 may display the text to be processed, the prompt text, and the conclusion type to the user. For another example, the user terminal 130 may obtain the text to be processed entered by the user. In some embodiments, the user terminal 130 may be used by one or more users, including users who directly use the service as well as other related users. In some embodiments, the user terminal 130 may be one or any combination of a mobile device 130-1, a tablet computer 130-2, a laptop computer 130-3, a desktop computer 130-4, and other input- and/or output-enabled devices.
The network 140 may connect the components of the system and/or connect the system with external resources. The network 140 enables communication between the components, and between the components and other parts outside the system, to facilitate the exchange of data and/or information. In some embodiments, the network 140 may be any one or more of a wired network or a wireless network. The network connections between the components may use any one or several of the above ways. In some embodiments, the network may adopt a point-to-point, shared, centralized, or other topology, or a combination of topologies. In some embodiments, the network 140 may include one or more network access points. For example, the network 140 may include wired or wireless network access points, such as base stations and/or network switching points 140-1, 140-2, ..., through which one or more components of the system may connect to the network 140 to exchange data and/or information.
In some embodiments, the processing device 110, the user terminal 130, and possibly other system components may include a storage device 120. In some embodiments, the user terminal 130, as well as other possible system components, may include the processing device 110.
It should be noted that the above description is provided for illustrative purposes only and is not intended to limit the scope of the present description. Many variations and modifications will be apparent to those of ordinary skill in the art, given the benefit of this disclosure. The features, structures, methods, and other features of the exemplary embodiments described herein may be combined in various ways to obtain additional and/or alternative exemplary embodiments. However, such changes and modifications do not depart from the scope of the present specification.
Fig. 2 is a block diagram of a text classification system according to some embodiments of the present description.
As shown in fig. 2, the text classification system 200 may include a first acquisition module 210, a second acquisition module 220, and a determination module 230.
The first obtaining module 210 may be configured to obtain a domain type of the text to be processed. Reference is made to step 410 and its associated description for more content regarding the retrieval of the domain type of text to be processed.
The second obtaining module 220 may be configured to obtain a prompt text containing the domain type. Reference is made to step 420 and its associated description for more content regarding the acquisition of the prompt text containing the domain type.
The determining module 230 may be configured to process the text to be processed and the prompt text to obtain a conclusion type of the text to be processed. Reference is made to step 430 and its associated description for more content regarding the processing of the text to be processed and the prompt text, determination of the conclusion type of the text to be processed.
FIG. 3 is a block diagram of a text classification model training system according to some embodiments of the present description.
As shown in fig. 3, the text classification model training system 300 may include a sample acquisition module 310, a processing module 320, and a parameter tuning module 330.
The sample acquisition module 310 may be configured to acquire a first type of sample text, where the first type of sample text includes sample pending text, sample prompt text, and a conclusion type tag, and the sample prompt text includes a domain type of the sample pending text. For more details regarding the acquisition of the first type of sample text refer to step 640 and its associated description.
The processing module 320 may be configured to process the sample pending text and the sample prompting text in the first type sample text through a text classification model to obtain a conclusion type prediction value corresponding to the first type sample text. For more details on the determination of conclusion type predictors refer to step 660 and its associated description.
The parameter tuning module 330 may be used to adjust parameters of the text classification model to reduce the difference between the conclusion type predicted values and the conclusion type labels corresponding to the first type sample text. For more details on parameter adjustment of the text classification model, refer to step 660 and its associated description.
Fig. 4 is an exemplary flow chart of a text classification method according to some embodiments of the present description.
In some embodiments, the process 400 may be implemented by the processing device 110 and/or the text classification system 200. As shown in fig. 4, a text classification method flow 400 may include:
step 410, obtaining the domain type of the text to be processed. Specifically, step 410 may be performed by the first acquisition module 210.
The text to be processed refers to text data that needs to be processed and analyzed. The text to be processed may come from articles such as statements on monetary policy issued by the Federal Reserve or a central bank, economic data articles from a national bureau of statistics, or information and regulatory requirements released by securities and banking regulatory authorities.
In some embodiments, the first obtaining module 210 may segment such an article into sentences, each of which is treated as a piece of text to be processed. It should be appreciated that the sentences may be independent of each other rather than sequential or organized in a particular order, and that they may come from different articles or data sources.
The domain type refers to a specific category to which the text content to be processed pertains, and for example, the domain type may be policy, economy, inflation, currency, and the like.
In some embodiments, the first obtaining module 210 may process the text to be processed through a domain classification model to obtain a domain type of the text to be processed.
The domain classification model refers to a model for classifying the domain type of the text to be processed. In some embodiments, the input of the domain classification model is the text to be processed and the output of the domain classification model is the domain type of the text to be processed.
In some embodiments, the domain classification model may be, but is not limited to, a support vector machine model, a Logistic regression model, a naive bayes classification model, a gaussian distributed bayes classification model, a decision tree model, a random forest model, a KNN classification model, a neural network model, and the like.
For the training of the domain classification model, see fig. 8 and its description.
In some embodiments, the first obtaining module 210 may also obtain the domain type of the text to be processed by other methods. For example, the first obtaining module 210 may obtain keywords of the text to be processed and/or of its source article and determine the domain type based on the keywords. For another example, the first obtaining module 210 may obtain the domain type based on information such as the distribution channel or author of the source article. For another example, the domain type of the text to be processed may be determined manually.
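As an illustration only (not the implementation disclosed in this application), the following is a minimal sketch of a domain classifier of the kind listed above, using a TF-IDF plus logistic regression pipeline; the training sentences and domain labels are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples: (text to be processed, domain type)
train_texts = [
    "The central bank raised the benchmark interest rate by 25 basis points.",
    "CPI rose 0.3% month-on-month, below market expectations.",
]
train_domains = ["currency", "inflation"]

# TF-IDF features + logistic regression; SVM, naive Bayes, neural networks,
# etc. (as listed above) could be substituted for the classifier.
domain_clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
domain_clf.fit(train_texts, train_domains)

# Predict the domain type of a new text to be processed.
print(domain_clf.predict(["We also see price increase"]))
```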
Step 420, obtaining a prompt text containing a domain type. Specifically, step 420 may be performed by the second acquisition module 220.
The prompt text refers to text containing prompting content. The prompting content may include domain type information, etc.
In some embodiments, the second obtaining module 220 may obtain a prompt text template that includes a domain slot and construct the prompt text containing the domain type based on the prompt text template.
The prompt text template refers to a predefined text structure or format containing placeholders or variables for generating a specific type of prompt text. In other words, the prompt text template is a generalized text framework that can be populated with specific content (e.g., a domain type) as needed to generate the final prompt text.
In some embodiments, the second obtaining module 220 can obtain the prompt text template from the storage device 120, a storage unit of the processing device 110, or the like, for example by reading from a storage device or database or by invoking a data interface.
In some embodiments, the prompt text template includes a domain slot. The domain slot refers to the specific position occupied by the characters and/or words representing the domain type in the prompt text, which is typically set in a fixed form in the corresponding prompt text template. For example, in the following two prompt text templates, "…, which is <mask> for <class> and interest rate" and "For the <class> and interest rates alike, …, which holds <mask> prospects", "…" indicates the position of the text to be processed and the position of "<class>" is the domain slot.
In some embodiments, the second obtaining module 220 may add a domain type to the domain slot to obtain the prompt text. For example, filling "economy" into the <class> position yields the prompt text "…, which is <mask> for economy and interest rate".
Acquiring the prompt text containing the domain type based on the prompt template standardizes and optimizes the acquisition process of the prompt text, improving processing efficiency and accuracy.
In some embodiments, the second obtaining module 220 may also obtain the prompt text containing the domain type by other means, for example, by adding a prefix or suffix to the text to be processed. The prefix or suffix may be a word or phrase in a specific format that represents the domain type, for example, prepending "[economy]" or "economy" to the text to be processed.
The prompt text containing the domain type provides the domain information of the text to be processed, improving the accuracy of text processing across different domains.
And step 430, processing the text to be processed and the prompt text to obtain the conclusion type of the text to be processed. In particular, step 430 may be performed by determination module 230.
A conclusion type refers to a classification reflecting different effects on, or tendencies of, a target metric. Target metrics may include interest rates, stock indices, etc. Common conclusion types may include bullish, bearish, neutral, etc. Conclusion types may also include hawkish, dovish, etc.
In some embodiments, the determination module 230 may process the pending text and the prompt text through a text classification model to arrive at a conclusion type for the pending text. Specifically, the determining module 230 inputs the text to be processed and the prompt text including the domain type obtained in step 420 into a text classification model, and after the machine learning model processes the text, outputs the conclusion type of the text to be processed.
In some embodiments, the training process of the text classification model includes: the training text classification model predicts the content of the partial mask in the sample text, and the sample text comprises sample prompt text.
Masking refers to replacing certain words or tokens in the sample text with a special mask token. The content of the partial mask refers to the text containing the special mask token. In some embodiments, the content of the partial mask may be the prompt text. For example, part of the prompt text "which is <mask> for economy and interest rate" (the word at the <mask> position) is masked.
In some embodiments, the text classification model is trained to predict the content of the masked part of the sample text. For example, the sample text "We also see price increase, which is <mask> for economy and interest rate" is input into the text classification model, which can predict "which is good for economy and interest rate".
Predicting the masked content can utilize both the preceding context (the part of the sentence to the left of the masked word) and the following context (the part to the right of the masked word) to understand the text more fully. This allows the text classification model to learn a richer, more comprehensive language representation during the training phase.
In some embodiments, training of the text classification model is a downstream task of the pre-training model. In other words, the processing device 110 may further train the pre-training model, and fine tune parameters of the pre-training model to obtain the text classification model, so as to complete specific text task requirements.
Pre-training refers to a producer (such as Google) training a model on large-scale text data so that the model learns general features and semantic representations of language; in the pre-training stage, the model captures contextual and semantic information of text. Through large-scale data learning, the pre-trained model acquires rich language understanding and representation capabilities. Pre-trained models common in the NLP (natural language processing) field include RoBERTa (Robustly Optimized BERT Pretraining Approach), BERT (Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer), XLNet, and the like. The pre-training process of such a model (e.g., RoBERTa) includes training the model to predict the content of masked parts of sample text.
Clearly, the task form of training the text classification model is the same as that of the pre-training model: both predict the content of the masked part of the sample text. By using the same task form, the general features learned by the pre-training model on large-scale data can be effectively transferred to the text classification task, improving its performance.
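For concreteness, the following is a small sketch of this masked-prediction task form using the Hugging Face fill-mask pipeline with a generic RoBERTa checkpoint; it only illustrates the task form and is not the fine-tuned text classification model described in this application.

```python
from transformers import pipeline

# RoBERTa-style models use "<mask>" as the mask token.
fill_mask = pipeline("fill-mask", model="roberta-base")

prompt = "We also see price increase, which is <mask> for economy and interest rate"
for candidate in fill_mask(prompt, top_k=3):
    # Each candidate contains the predicted token string and its probability.
    print(candidate["token_str"], round(candidate["score"], 4))
```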
In some embodiments, the prompt text template includes not only the domain slot but also a mask slot. The mask slot refers to the specific position occupied by the masked characters and/or words in the prompt text, which is typically set in a fixed form in the corresponding prompt text template. In some embodiments, the mask slot in the prompt text corresponds to the conclusion type of the text to be processed; in other words, the characters and/or words representing the conclusion type in the prompt text are hidden.
In some embodiments, the second obtaining module 220 may add the domain type to the domain slot and keep the content of the mask slot hidden, resulting in the prompt text. For example, in the prompt template "…, which is <mask> for <class> and interest rate", the position of "<class>" is the domain slot and the position of "<mask>" is the mask slot. Filling "economy" into the <class> position while keeping the <mask> position hidden yields the prompt text "…, which is <mask> for economy and interest rate".
In some embodiments, obtaining the prompt text may be accomplished by a function corresponding to the prompt template, e.g., $t_i^j = f_j(x_i, c_i)$, where $t_i^j$ denotes the prompt text corresponding to the $i$-th text to be processed under the $j$-th prompt template, $x_i$ denotes the $i$-th text to be processed, $f_j$ denotes the function corresponding to the $j$-th prompt template, $j = 1, 2, \ldots, J$, $J$ denotes the total number of defined prompt templates, and $c_i$ is the domain type of $x_i$.
Illustratively, if the text to be processed $x_i$ is "We also see price increase" and $c_i$ is "economy", then with the prompt text template $f_1$ of "…, which is <mask> for <class> and interest rate", the function $f_1(x_i, c_i)$ yields the prompt text $t_i^1$ = "We also see price increase, which is <mask> for economy and interest rate"; with the prompt text template $f_2$ of "For the <class> and interest rates alike, …, which holds <mask> prospects", the function $f_2(x_i, c_i)$ yields the prompt text $t_i^2$ = "For the economy and interest rates alike, we also see price increase, which holds <mask> prospects".
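A minimal sketch, under the assumptions above, of the prompt-construction function $f_j(x_i, c_i)$; the template strings mirror the examples in this section, and the slot markers and function names are assumptions for illustration.

```python
# Prompt templates with a domain slot (<class> -> {domain}) and a mask slot (<mask>).
# "{text}" marks where the text to be processed is inserted.
PROMPT_TEMPLATES = [
    "{text}, which is <mask> for {domain} and interest rate",
    "For the {domain} and interest rates alike, {text}, which holds <mask> prospects",
]

def build_prompt(text: str, domain: str, template_index: int) -> str:
    """f_j(x_i, c_i): fill the domain slot, keep the mask slot hidden."""
    return PROMPT_TEMPLATES[template_index].format(text=text, domain=domain)

print(build_prompt("We also see price increase", "economy", 0))
# -> "We also see price increase, which is <mask> for economy and interest rate"
```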
Using prompt texts that contain the domain prompt information and hide the conclusion type as the input of the text classification model keeps the downstream text classification task consistent with the input form of the pre-training model, so the general features learned by the pre-training model on large-scale data can be effectively transferred to the text classification task; it also strengthens the text classification model's understanding of domain knowledge, making text classification more accurate. In addition, obtaining the text classification model by fine-tuning the pre-training model allows it to meet specific text task requirements while reducing the training cost of the text classification model, saving time and computing resources.
In some embodiments, as shown in fig. 5, the text to be processed and the prompt text are processed by a text classification model, so as to obtain a prediction vector corresponding to the mask slot.
The prediction vector corresponding to the mask slot refers to the prediction result for the mask slot expressed in vector form. In some embodiments, the prediction vector has dimension 1×<vocabulary size>, where <vocabulary size> is the size of the vocabulary; the prediction vector represents a probability distribution over every word in the vocabulary for the output corresponding to the mask slot.
In some embodiments, processing device 110 may determine a conclusion type for the text to be processed based on the predictive vector. In some embodiments, processing device 110 may derive a conclusion type for the prediction vector based on the tag map.
Label mapping refers to matching and indexing the original label content (in text form) with content that the model can recognize as input and output, according to a certain correspondence. The input of the label mapping is the original classification label, and the output is a set of manually defined tokens expressing "bullish", "bearish", and so on. For example, the label "bullish" may be mapped to the tokens good, great, or excellent; as long as the model outputs one of good, great, or excellent, the prediction is considered "bullish". Similarly, the label "neutral" may be mapped to normal, impartial, or neutral, and the label "bearish" may be mapped to bad, bear, or negative.
It will be appreciated that the label mapping only maps labels to a small number of tokens (denoted as n), while in practice the number of words in the vocabulary may be much greater than n. In some embodiments, the processing device 110 may map the 1×<vocabulary size> prediction vector to a 1×n vector, the 1×n vector representing the probability of the prediction vector for each of the n tokens. In some embodiments, the processing device 110 may take the token with the highest probability in the 1×n vector and use the label corresponding to that token as the conclusion type of the text to be processed.
For example, suppose the token "good" has the highest probability in the prediction vector. After the prediction vector is mapped to the 1×n vector, "good" still has the largest probability; since the label corresponding to "good" is "bullish", the conclusion type corresponding to the prediction vector is "bullish".
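A sketch of the label mapping described above: the 1×<vocabulary size> prediction vector for the mask slot is reduced to a probability per verbalizer token, and the label of the most probable token is returned. The token lists and the tokenizer interface are assumptions for illustration.

```python
import torch

# Label mapping (verbalizer): each conclusion type maps to a few tokens.
# With BPE tokenizers (e.g. RoBERTa) the exact token form (leading-space
# marker) matters; the plain strings here are an illustrative simplification.
LABEL_TO_TOKENS = {
    "bullish": ["good", "great", "excellent"],
    "neutral": ["normal", "impartial", "neutral"],
    "bearish": ["bad", "bear", "negative"],
}

def conclusion_from_prediction(pred_vector: torch.Tensor, tokenizer) -> str:
    """pred_vector: 1 x <vocabulary size> probabilities for the <mask> position."""
    best_label, best_prob = None, -1.0
    for label, words in LABEL_TO_TOKENS.items():
        token_ids = tokenizer.convert_tokens_to_ids(words)
        label_prob = pred_vector[0, token_ids].max().item()  # the 1 x n reduction
        if label_prob > best_prob:
            best_label, best_prob = label, label_prob
    return best_label
```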
In some embodiments, the determination module 230 may input the pending text as well as the prompt text into a text classification model to arrive at a conclusion type for the pending text. See fig. 5, 6 for more on text classification models.
In some embodiments, the determination module 230 may process the pending text and the prompt text by other methods to arrive at a conclusion type for the pending text. For example, the determining module 230 may obtain keywords in the text to be processed and the prompt text, and determine the conclusion type of the text to be processed by using a keyword matching method.
FIG. 6 is an exemplary schematic diagram of training of a text classification model according to some embodiments of the present description.
In some embodiments, the training process of the text classification model includes: training the text classification model to predict the content of the masked part of the sample text, where the sample text includes the sample prompt text. For example, the sample text "We also see price increase, which is <mask> for economy and interest rate" is input into the text classification model, which can predict "which is good for economy and interest rate".
As shown in fig. 6, training of the text classification model includes:
in step 610, unlabeled data is predicted.
Depending on whether labels are available, the original dataset, for example an original monetary policy dataset, may be divided into two parts: a first sample set D and a second sample set U.
The first sample set D contains sample data that has been labeled manually or by machine, or reviewed by an expert, and each sample (first type sample text) in D has a corresponding domain type label and conclusion type label. The domain type label refers to a label indicating the domain type of a training sample. The conclusion type label refers to a label classifying the category of information expressed in the training sample; it represents the tendency or attitude of the training sample toward a particular thing or topic. For example, the conclusion type label may be bullish, bearish, or neutral with respect to interest rates, or a hawkish or dovish tendency, etc.
The second sample set U refers to sample data that has not been labeled or reviewed; each sample (second type sample text) in U is not labeled with a domain type or a conclusion type. In other words, the second type sample text only includes the sample to-be-processed text.
In real-world scenarios, labeled data is very limited, so the training data needs to be expanded. In some embodiments, the processing device 110 may obtain second type sample texts and predict their domain types and conclusion types to augment the training samples, so that the training data of the text classification model is sufficient.
The second type sample text may be unlabeled sentences, text newly extracted from a data source and not yet added to the training set, or synonymous and/or antonymous text generated from the sample to-be-processed text in the first sample set by word replacement, phrase recombination, language model generation, and the like. In some embodiments, the second sample set comes from the same data source as the first sample set.
In some embodiments, the processing device 110 is capable of retrieving the second type of sample text from the storage device 120, a memory unit of the processing device 110, or the like. In some embodiments, the processing device 110 may obtain the second type of sample text by reading from a storage device, database, invoking a data interface, or the like.
In some embodiments, the processing device 110 processes the second type sample text through the trained domain classification model to obtain a domain type predicted value for the second type sample text. Specifically, for the second sample set $U$, the set of sample to-be-processed texts is $\{u_1, u_2, \ldots\}$; the domain classification model produces a domain type predicted value $\tilde{c}_i$ for each $u_i$, which is used as the domain type label of the second type sample text.
In some embodiments, processing device 110 obtains the conclusion type predictor for the second type of sample text via the trained prompt classification model. Input, output, structure, etc. of the prompt classification model are shown in fig. 7 and description thereof, and training of the prompt classification model is shown in fig. 9 and description thereof.
Step 620, result fusion.
In some embodiments, the processing device 110 may process the sample pending text and the sample prompt text of the corresponding second type of sample text, respectively, using more than two prompt classification models to obtain the conclusion type soft tag.
In some embodiments, the term soft label is used only to distinguish these labels from the genuine labels; their content is consistent with that of the labels of annotated samples. In still other embodiments, a soft label refers to a label in the form of a probability distribution: unlike a hard label (which contains only a specific class), it assigns each conclusion type a probability value indicating how likely the second type sample text is to belong to that class. For example, if the conclusion type label or hard label of a sample to-be-processed text is "bullish", a soft label for that text may be [0.8, 0.1, 0.1], indicating an 80% probability of "bullish", a 10% probability of "neutral", and a 10% probability of "bearish".
Compared with hard labels, using soft labels can improve model generalization and robustness and alleviate the error propagation problem caused by data augmentation.
In some embodiments, the processing device 110 may obtain more than two prompt text templates respectively corresponding to the more than two prompt classification models; each prompt text template includes a domain slot and a mask slot, and the mask slot corresponds to the conclusion type. In some embodiments, the processing device 110 may obtain these prompt text templates with reference to the method in step 420.
In some embodiments, the processing device 110 may add the domain type of the corresponding sample to-be-processed text to the domain slots of the more than two prompt text templates, respectively, to obtain the sample prompt texts of the second type sample text corresponding to the different prompt classification models. In some embodiments, the processing device 110 may obtain the sample prompt text of the second type sample text based on the prompt text template with reference to the method in step 420.
In some embodiments, the more than two prompt text templates respectively corresponding to the more than two prompt classification models have different keywords and/or orderings. In some embodiments, the prompt text template of the text classification model may be the same as or different from that of a prompt classification model.
In some embodiments, the processing device 110 may fuse, in a weighted manner, the two or more conclusion type predicted values obtained by the two or more prompt classification models, and store the fusion result as the conclusion type soft label, thereby obtaining new labeled data. For example, for an unlabeled sentence $u_i$, the corresponding conclusion type soft label $\tilde{y}_i$ can be obtained by $\tilde{y}_i = \sum_{k} w_k \cdot P_k(u_i)$, where $P_k(u_i)$ is the conclusion type prediction of the $k$-th prompt classification model for $u_i$ and $w_k$ is a weighting coefficient.
In some embodiments, the processing device 110 may determine the weighting coefficients by averaging, or based on experimental results. For example, if experiments show that some of the prompt classification models perform better, their outputs are given larger weighting coefficients.
Because the prompt classification models are each trained on an independent prompt, there is no interaction between the prompts, and any single prompt may be noisy; therefore the prediction results of several prompt classification models need to be combined to ensure high accuracy.
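A minimal sketch of the weighted fusion in this step, assuming each prompt classification model returns a probability distribution over the conclusion types; the weights shown are illustrative values, not ones given in this application.

```python
import numpy as np

def fuse_soft_label(predictions, weights=None):
    """predictions: list of per-model probability vectors over conclusion types
    (e.g. [bullish, neutral, bearish]); returns the fused conclusion type soft label."""
    preds = np.asarray(predictions, dtype=float)
    if weights is None:                       # averaging as a default choice
        weights = np.full(len(preds), 1.0 / len(preds))
    soft_label = np.average(preds, axis=0, weights=weights)
    return soft_label / soft_label.sum()      # keep it a probability distribution

# Two prompt classification models, weighted 0.6 / 0.4 (illustrative values).
print(fuse_soft_label([[0.7, 0.2, 0.1], [0.9, 0.05, 0.05]], weights=[0.6, 0.4]))
```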
Step 630, the data sets are merged.
The second sample set labeled in step 620 can be denoted $U'$, consisting of sentences $u_i$ with conclusion type soft labels $\tilde{y}_i$.
In some embodiments, the processing device 110 may merge the labeled second sample set $U'$ with the first sample set $D$ to obtain a merged dataset $D'$. The sentences in the merged dataset $D'$ are denoted $x'_i$, the conclusion type labels are denoted $y'_i$, and the domain type labels are denoted $c'_i$.
At step 640, sample prompt text is obtained.
In some embodiments, for a first type sample text $x_i$, the domain information is directly given by the domain type label $c_i$. The sample acquisition module 310 may apply a prompt mapping function $f_j$ to obtain the sample prompt text.
In some embodiments, the sample acquisition module 310 may acquire a hint text template corresponding to the text classification model, the hint text template including a domain slot and a mask slot, the mask slot corresponding to a conclusion type; and adding the field type of the text to be processed of the corresponding sample in the field slot to obtain a sample prompt text of the first type of sample text. For example, the sample acquisition module 310 may obtain the sample prompt text of the first type of sample text based on the prompt text template corresponding to the text classification model using the same or similar method in step 420.
In some embodiments, the domain type of the sample to-be-processed text in a first type training sample or a second type training sample is obtained by processing that sample to-be-processed text with the domain classification model. For the first type sample text, the domain type label may be obtained by manual annotation and/or by prediction with the domain classification model.
For the second type sample text $u_i$, the domain information is obtained by prediction with the domain classification model; applying the prompt mapping function $f_j$ then yields the sample prompt text.
Step 650, defining a text classification loss function.
In some embodiments, the text classification model processes the sample to-be-processed text and the sample prompt text in the first type sample text to obtain a prediction vector corresponding to the mask slot and a text prediction vector corresponding to the sample to-be-processed text and the sample prompt text, such as the CLS vector output by a BERT-style model. In some embodiments, the processing device 110 may determine the conclusion type predicted value corresponding to the first type sample text based on the prediction vector, using the same or a similar method as in step 430, and may determine the domain type predicted value based on the text prediction vector.
In some embodiments, a first term of the text classification loss function reflects the difference between the conclusion type predicted value and the conclusion type label corresponding to the first type sample text and/or the second type sample text, and a second term reflects the difference between the domain type predicted value and the corresponding domain type label. For example, the processing device 110 may construct the text classification loss function from the domain classification loss function and the conclusion classification loss function: $L = L_{conclusion} + \alpha \cdot L_{domain}$, where $\alpha$ is a control coefficient. The domain classification loss function can be expressed as $L_{domain} = \mathrm{CE}(\hat{c}_i, c'_i)$, and the conclusion classification loss function can be expressed as $L_{conclusion} = \mathrm{CE}(\hat{y}_i, y'_i)$,
where $\mathrm{CE}(\cdot, \cdot)$ denotes the cross entropy loss function, $\hat{y}_i$ and $\hat{c}_i$ denote the conclusion type and domain type outputs of the text classification model, $c'_i$ denotes the domain type label, and $y'_i$ denotes the conclusion type label.
By taking the domain classification loss function as a part of the text classification loss function, the text classification model can refer to both the text information and the text domain information during training, which ensures the accuracy of the model's judgment of the text domain and thus the accuracy of the conclusion type it outputs.
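A sketch of the combined loss as reconstructed above, written with PyTorch cross entropy; the default value of the control coefficient alpha is an assumption, not a value given in this application.

```python
import torch
import torch.nn.functional as F

def text_classification_loss(conclusion_logits, domain_logits,
                             conclusion_labels, domain_labels, alpha=0.5):
    """L = L_conclusion + alpha * L_domain (alpha=0.5 is illustrative)."""
    loss_conclusion = F.cross_entropy(conclusion_logits, conclusion_labels)
    loss_domain = F.cross_entropy(domain_logits, domain_labels)
    return loss_conclusion + alpha * loss_domain
```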
In some embodiments, the text classification loss function also includes a regularization term. The regularization term constrains the model by adding a penalty term to the loss function, which reduces the numerical range of the parameters and lowers the risk of overfitting.
In some embodiments, dropout (random inactivation) and the regularization term can be used in combination to improve the generalization ability of the model more effectively and alleviate overfitting.
Step 660, training a text classification model.
In some embodiments, the processing device 110 may train the text classification model based on the training samples in the merged dataset $D'$ (including the first type sample texts and/or the labeled second type sample texts).
For example, the processing module 320 may process the sample to-be-processed text and the sample prompt text in the training samples through the text classification model to obtain the conclusion type predicted values corresponding to the training samples. The parameter tuning module 330 may adjust parameters of the text classification model to reduce the difference between the conclusion type predicted values and the conclusion type labels corresponding to the training samples.
In some embodiments, the parameter tuning module 330 may perform multiple rounds of iterative training on the initial text classification model based on the merged dataset $D'$ to obtain the trained text classification model. The iterative training may include: computing the gradient of the text classification loss function and iteratively updating the parameters of the text classification model by gradient descent to reduce the difference between the conclusion type predicted value and the conclusion type label. The gradient descent method may include standard gradient descent, stochastic gradient descent, and the like. A variety of learning rate decay strategies may be employed in the iterative training, such as piecewise decay, inverse time decay, exponential decay, adaptive decay, and the like. The iterative training may end when an iteration termination condition is satisfied, such as the text classification loss function converging or falling below a preset threshold, or the number of iterations reaching a preset value.
In some embodiments, to further improve the accuracy and robustness of the model, the parameter tuning module 330 may adjust the learning rate according to experience and/or requirements to train the final text classification model.
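A compact sketch of the iterative training in this step: gradient descent with an exponential learning-rate decay and a loss-threshold stopping condition. The optimizer, learning rate, batch field names, and thresholds are assumptions, not values from this application; the loss function can be one like the combined loss sketched earlier.

```python
import torch

def train_text_classifier(model, data_loader, loss_fn, epochs=10,
                          lr=2e-5, loss_threshold=1e-3):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
    for epoch in range(epochs):
        for batch in data_loader:
            optimizer.zero_grad()
            conclusion_logits, domain_logits = model(batch["input_ids"],
                                                     batch["attention_mask"])
            loss = loss_fn(conclusion_logits, domain_logits,
                           batch["conclusion_label"], batch["domain_label"])
            loss.backward()          # gradient of the text classification loss
            optimizer.step()         # gradient-descent parameter update
        scheduler.step()             # exponential learning-rate decay
        if loss.item() < loss_threshold:   # iteration termination condition
            break
    return model
```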
FIG. 7 is a schematic diagram of a prompt classification model according to some embodiments of the present description.
As shown in fig. 7, the inputs of the prompt classification model are the text to be processed and the prompt text, and its outputs include the domain type of the text to be processed and the conclusion type of the text to be processed. The prompt classification model includes a natural language processing model, a linear layer, dropout (random inactivation), an activation function, and the like.
The natural language processing model may be any of various machine learning and deep learning models used for natural language processing tasks. In some embodiments, the natural language processing model may include a pre-trained language model, such as RoBERTa, BERT, GPT, or XLNet. In some embodiments, the natural language processing model may also include a bag-of-words model, word embedding models, a long short-term memory network (LSTM), etc.
The linear layer may multiply the input data by a weight matrix and add a bias vector. In some embodiments, the linear layer is connected to the natural language processing model and maps its output to the final task output space, such as sentiment classification or named entity recognition.
Dropout (random inactivation) refers to randomly discarding a portion of the neurons during network training, which reduces the dependency between neurons and improves generalization. By randomly discarding a portion of the neurons, the network is forced to learn the combinations of different features appearing in the samples, thereby reducing the dependency between neurons and preventing overfitting.
The activation function is a nonlinear transformation function. Common activation functions include the Sigmoid function, the ReLU function, the tanh function, the softmax function, etc. The activation function introduces nonlinear characteristics that enable the model to learn and represent complex data patterns and relationships, increasing the expressive power and nonlinear fitting ability of the model.
The main task of the prompt classification model is conclusion classification; for example, as shown on the right side of fig. 7, the conclusion type of the text to be processed can be obtained through the natural language processing model, the linear layer, dropout, and the activation function. However, to make the prompt classification model fully understand the domain knowledge, a domain classification subtask may be added; for example, as shown on the left side of fig. 7, through the natural language processing model, the linear layer, dropout, and the activation function, the prompt classification model can also output the domain type of the text to be processed.
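A minimal sketch of the two-headed structure shown in fig. 7: a shared language-model backbone, dropout, and two linear heads for conclusion and domain classification. The backbone checkpoint, dropout rate, and head sizes are assumptions; applying a softmax activation over the returned logits yields the two probability distributions.

```python
import torch.nn as nn
from transformers import AutoModel

class PromptClassifier(nn.Module):
    def __init__(self, backbone="roberta-base", n_conclusions=3, n_domains=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)       # NLP model
        hidden = self.encoder.config.hidden_size
        self.dropout = nn.Dropout(0.1)                           # random inactivation
        self.conclusion_head = nn.Linear(hidden, n_conclusions)  # linear layer
        self.domain_head = nn.Linear(hidden, n_domains)          # linear layer

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = self.dropout(out.last_hidden_state[:, 0])       # first-token vector
        # Return logits; a softmax activation over them gives the probabilities.
        return self.conclusion_head(pooled), self.domain_head(pooled)
```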
In some embodiments, the text classification model has the same model structure as the prompt classification model. In some embodiments, the processing device 110 may train a plurality of prompt classification models respectively according to different prompt templates, and use the trained prompt classification models to label conclusion types for sample to-be-processed texts (e.g., the sample to-be-processed texts in the second type sample texts). In some embodiments, the processing device 110 may train the text classification model using the sample texts labeled by the plurality of prompt classification models.
FIG. 8 is an exemplary schematic diagram of domain classification model training according to some embodiments of the present description.
As shown in fig. 8, in some embodiments, the training process of the domain classification model includes the steps of:
step 810, data preparation.
In some embodiments, the processing device 110 trains the domain classification model based on the first sample set D. Each sample (first type sample text) in the first sample set D has a corresponding domain type tag and conclusion type tag.
First, the processing device 110 performs data cleaning on the original dataset, including removal of special characters, whitespace normalization, label checking, and the like. Label checking refers to screening the label content for errors, including checking whether the characters of a label are wrong or missing; it does not check the logical relationship between the label and the sample. In some embodiments, the text content of the labels can be automatically screened by a script, and any problems found are finally confirmed by an annotator.
The cleaned first sample set $D$ can be expressed as the set $\{x_1, x_2, \ldots, x_n\}$, where $x_i$ denotes the $i$-th sentence in $D$ and $n$ is the size of the labeled dataset; the corresponding labels can be expressed as the set $\{(c_1, y_1), (c_2, y_2), \ldots, (c_n, y_n)\}$, where $c_i$ and $y_i$ respectively denote the domain type label and the conclusion type label corresponding to the $i$-th sentence.
Second, the processing device 110 randomly splits the first sample set $D$ into a training set and a test set, denoted $D_{train}$ and $D_{test}$; the training set is used to train the domain classification model, and the test set is used to evaluate the performance and generalization ability of the domain classification model. In some embodiments, the processing device 110 may split the first sample set $D$ in a ratio of 70:30 or 80:20, so that the training set is 70% or 80% of the total dataset and the test set is 30% or 20% of the labeled dataset. When the split ratio is 80:20, the sizes of $D_{train}$ and $D_{test}$ are $0.8n$ and $0.2n$, respectively.
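A small sketch of the random 80:20 split described above, using scikit-learn; the sample sentences and labels are hypothetical.

```python
from sklearn.model_selection import train_test_split

# Hypothetical cleaned first sample set D: (sentence, domain label, conclusion label)
D = [
    ("We also see price increase", "economy", "bullish"),
    ("Liquidity remains ample in the interbank market", "currency", "neutral"),
    ("The regulator tightened disclosure requirements", "policy", "bearish"),
    ("Core inflation stayed above the target range", "inflation", "bearish"),
    ("Industrial output grew faster than expected", "economy", "bullish"),
]

# Random 80:20 split into training and test sets (70:30 works the same way).
D_train, D_test = train_test_split(D, test_size=0.2, random_state=42)
print(len(D_train), len(D_test))
```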
Step 820, data input.
Since the domain classification model only involves domain classification and not conclusion classification, only the domain type labels $c_i$ of the training samples are used in its training.
The processing device 110 inputs the samples $x_i$ in the training set $D_{train}$ and the corresponding labels $c_i$ into the domain classification model.
Step 830, define a domain classification loss function.
In some embodiments, a cross entropy loss function, a mean square error loss function, or the like may be used in training of domain classification.
For example, for a sentence $x_i$, let the output of the domain classification model be $\hat{c}_i$, and let $\mathrm{CE}(\cdot, \cdot)$ denote the cross entropy loss function. The domain classification loss function is: $L_{domain} = \mathrm{CE}(\hat{c}_i, c_i)$.
In some embodiments, the domain classification loss function further includes a regularization term. The regularization term constrains the model by adding a penalty term to the loss function, which reduces the numerical range of the parameters and lowers the risk of overfitting.
In some embodiments, dropout (random inactivation) and the regularization term can be used in combination to improve the generalization ability of the model more effectively and alleviate overfitting.
Step 840, training a domain classification model.
In some embodiments, the processing device 110 may adjust parameters of the domain classification model based on the domain classification loss function to reduce the difference between $\hat{c}_i$ and $c_i$. For example, the domain classification loss function is reduced or minimized by continuously adjusting the parameters of the domain classification model.
In some embodiments, the processing device 110 may obtain the prompt text of the sample to-be-processed text, which includes the domain type of the sample to-be-processed text, by the same or a similar method as in step 420.
FIG. 9 is an exemplary diagram of prompt classification model training, shown in accordance with some embodiments of the present description.
In some embodiments, the training process of the prompt classification model includes: training the prompt classification model to predict the content of the masked part of the sample text, where the sample text includes the sample prompt text; the prompt text template includes a domain slot and a mask slot, and the mask slot corresponds to the conclusion type.
Step 910, a sample prompt text is obtained.
In some embodiments, the processing device 110 may obtain, by the same or similar method as step 420, a prompt text of the sample text, the prompt text including a domain type of the sample text, the mask slot in the prompt text corresponding to a conclusion type, i.e., the conclusion type is hidden.
Step 920, defining a hint class loss function.
In some embodiments, processing device 110 may construct a prompt classification loss function based on the domain classification loss function and the conclusion classification loss function. For example, the prompt classification loss function L may be expressed as L = α·L_1 + (1 − α)·L_2, where α is a controlling coefficient with 0 ≤ α ≤ 1. L_1 is the domain classification loss function, which can be expressed as L_1 = Σ_i CE(p_i^d, y_i^d), and L_2 is the conclusion classification loss function, which can be expressed as:
L_2 = Σ_i CE(p_i^c, y_i^c),
where CE represents the cross entropy loss function, p_i^d and p_i^c are the domain type and conclusion type outputs of the prompt classification model, y_i^d represents a domain type label, and y_i^c represents a conclusion type label.
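A minimal sketch of this combined loss is given below, under the assumption that the prompt classification model produces separate scores for the domain type and the masked conclusion type; all tensor names, shapes, and the value of the controlling coefficient are illustrative.

```python
# Hedged sketch of the prompt classification loss L = alpha * L1 + (1 - alpha) * L2.
import torch
import torch.nn as nn

cross_entropy = nn.CrossEntropyLoss()
alpha = 0.3                                    # controlling coefficient, 0 <= alpha <= 1 (assumed value)

domain_logits = torch.randn(8, 4)              # stand-in model outputs for the domain type
conclusion_logits = torch.randn(8, 3)          # stand-in model outputs for the masked conclusion type
domain_labels = torch.randint(0, 4, (8,))      # domain type labels
conclusion_labels = torch.randint(0, 3, (8,))  # conclusion type labels

l1 = cross_entropy(domain_logits, domain_labels)          # domain classification loss
l2 = cross_entropy(conclusion_logits, conclusion_labels)  # conclusion classification loss
prompt_loss = alpha * l1 + (1 - alpha) * l2
```

A larger α weights the domain term more heavily; the concrete value is a tuning choice rather than something fixed by the description.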
By taking the domain classification loss function as a part of the prompt classification loss function, the prompt classification model can draw on both the text content and the text's domain information during training, which helps ensure the accuracy with which the prompt classification model judges the domain of a sentence and, in turn, the accuracy of the conclusion type it outputs.
In some embodiments, the prompt classification loss function also includes a regularization term. The regularization term constrains the model by adding a penalty term to the loss function, which narrows the numerical range of the parameters and reduces the model's risk of overfitting.
In some embodiments, random inactivation (dropout) and the regularization term can be used in combination to improve the generalization capability of the model more effectively and alleviate overfitting.
Step 930, training a prompt classification model.
In some embodiments, processing device 110 may adjust the parameters of the prompt classification model based on the prompt classification loss function to reduce the difference between the predicted domain type and the domain type label, as well as the difference between the predicted conclusion type and the conclusion type label. For example, the prompt classification loss function is reduced or minimized by continually adjusting the parameters of the prompt classification model.
In some embodiments, processing device 110 may train multiple prompt classification models M_1, M_2, …, M_m separately according to different prompt templates, where m is a positive integer. In some embodiments, the trained prompt classification models may be used for conclusion prediction, for augmenting training samples, and the like.
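As a rough sketch of how several prompt classification models trained with different templates might be combined to produce a soft conclusion label for an unlabeled sentence (as used when augmenting training samples), consider the following; the templates, the model_predict stand-in, and the averaging scheme are assumptions for illustration.

```python
# Hedged sketch: m prompt classification models, each with its own template, vote on a soft label.
import torch

templates = [
    "This is a {domain} sentence. The conclusion is [MASK].",
    "Domain: {domain}. Overall conclusion: [MASK].",
]

def model_predict(model_id: int, prompt: str) -> torch.Tensor:
    # Stand-in for the i-th trained prompt classification model: returns a
    # probability distribution over conclusion types (3 types assumed here).
    torch.manual_seed(model_id + len(prompt))
    return torch.softmax(torch.randn(3), dim=0)

sentence, domain_type = "Bond yields fell after the announcement.", "rates"
probs = [model_predict(i, sentence + " " + t.format(domain=domain_type))
         for i, t in enumerate(templates)]
soft_label = torch.stack(probs).mean(dim=0)   # averaged distribution used as a soft label
```

Averaging the per-template distributions is one simple aggregation; the description does not prescribe this particular scheme here, so it is only an illustrative choice.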
It should be noted that the above description of the flow is provided for illustrative purposes only and is not intended to limit the scope of the present description. Various changes and modifications may be made by those of ordinary skill in the art in light of the description herein; however, such changes and modifications do not depart from the scope of the present specification. In some embodiments, the above-described processes may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. For example, the flow may be stored in a storage device (e.g., storage device 150, a memory unit of a system) in the form of a program or instructions that, when executed by processing device 110 and/or text classification system 200, implement the flow. In addition, the order of the operations of the flows shown in the figures and described above is not limiting.
Possible benefits of embodiments of the present description include, but are not limited to: (1) bringing domain knowledge to the model through prompt words and adding a domain classification loss function to the model, which strengthens the model's understanding of domain knowledge and makes text classification more accurate; (2) aligning the downstream classification task target with the pre-training task through a prompt template, which resolves the inconsistency between pre-training and fine-tuning and thereby improves the efficiency and effect of model training; (3) introducing new data with soft labels through semi-supervised learning, which improves the accuracy and robustness of the model.
It should be noted that different embodiments may produce different advantages; in a given embodiment, the advantages produced may be any one or a combination of those listed above, or any other advantages that may be obtained.
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly stated herein, various modifications, improvements, and adaptations to the present disclosure may occur to those skilled in the art. Such modifications, improvements, and adaptations are suggested within this specification and are therefore intended to fall within the spirit and scope of the exemplary embodiments of this specification.
Meanwhile, the specification uses specific words to describe the embodiments of the specification. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present description. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present description may be combined as suitable.
Furthermore, those skilled in the art will appreciate that the various aspects of the specification can be illustrated and described in terms of several patentable categories or circumstances, including any novel and useful procedures, machines, products, or materials, or any novel and useful modifications thereof. Accordingly, aspects of the present description may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the specification may take the form of a computer product, comprising computer-readable program code, embodied in one or more computer-readable media.
The computer storage medium may contain a propagated data signal with the computer program code embodied therein, for example, on a baseband or as part of a carrier wave. The propagated signal may take on a variety of forms, including electro-magnetic, optical, etc., or any suitable combination thereof. A computer storage medium may be any computer readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated through any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or a combination of any of the foregoing.
The computer program code necessary for the operation of portions of the present description may be written in any one or more programming languages, including object oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, or as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or processing device. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or the program may be used in a cloud computing environment as a service such as software as a service (SaaS).
Furthermore, the order in which the elements and sequences are processed, the use of numbers or letters, or the use of other designations in the description is not intended to limit the order of the processes and methods of the description unless explicitly recited in the claims. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments; on the contrary, the claims are intended to cover all modifications and equivalent arrangements that fall within the spirit and scope of the embodiments of the present disclosure. For example, while the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing processing device or mobile device.
Likewise, it should be noted that, in order to simplify the presentation disclosed in this specification and thereby aid in the understanding of one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as implying that the subject matter of this description requires more features than are recited in the claims. Indeed, claimed embodiments may lie in less than all of the features of a single embodiment disclosed above.
In some embodiments, numbers are used to describe quantities of components and attributes; it should be understood that such numbers used in describing the embodiments are, in some examples, modified by the terms "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the stated number allows for a variation of 20%. Accordingly, in some embodiments, the numerical parameters set forth in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameters should take into account the specified number of significant digits and employ ordinary rounding. Although the numerical ranges and parameters used to confirm the breadth of ranges in some embodiments of the present description are approximations, in particular embodiments such numerical values are set as precisely as practicable.
Each patent, patent application publication, and other material, such as articles, books, specifications, publications, and documents, referred to in this specification is incorporated herein by reference in its entirety, except for application history documents that are inconsistent with or conflict with the content of this specification, and except for documents (whether currently or later attached to this specification) that limit the broadest scope of the claims of this specification. It is noted that if the description, definition, and/or use of a term in material attached to this specification is inconsistent with or conflicts with what is described in this specification, the description, definition, and/or use of the term in this specification controls.
Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.

Claims (18)

1. A method of text classification, the method comprising:
acquiring the field type of a text to be processed;
acquiring a prompt text containing the field type;
and processing the text to be processed and the prompt text through a text classification model to obtain the conclusion type of the text to be processed.
2. The method of claim 1, wherein the obtaining the domain type of the text to be processed comprises:
and processing the text to be processed through a domain classification model to obtain the domain type of the text to be processed.
3. The method of claim 1, the obtaining the prompt text containing the domain type comprising:
acquiring a prompt text template, wherein the prompt text template comprises a field slot position;
and adding the domain type into the domain slot to obtain the prompt text.
4. The method of claim 1, wherein the training process of the text classification model comprises:
training the text classification model to predict the content of a masked portion in a sample text, wherein the sample text comprises a sample prompt text.
5. The method of claim 4, the obtaining the prompt text containing the domain type comprising:
acquiring a prompt text template, wherein the prompt text template comprises a field slot position and a mask slot position, and the mask slot position corresponds to a conclusion type;
and adding the domain type into the domain slot to obtain the prompt text.
6. The method according to claim 5, wherein the processing the text to be processed and the prompt text through the text classification model to obtain the conclusion type of the text to be processed comprises:
processing the text to be processed and the prompt text through a text classification model to obtain a prediction vector corresponding to the mask slot position;
and determining the conclusion type of the text to be processed based on the prediction vector.
7. A text classification model training method, the method comprising:
acquiring a first type sample text, wherein the first type sample text comprises a sample to-be-processed text, a sample prompt text, a field type tag and a conclusion type tag, and the sample prompt text comprises the field type of the sample to-be-processed text;
processing the sample to-be-processed text and the sample prompt text in the first type sample text through the text classification model to obtain a conclusion type predicted value corresponding to the first type sample text;
adjusting parameters of the text classification model to reduce a difference between the conclusion type predicted value corresponding to the first type sample text and the conclusion type label.
8. The method of claim 7, wherein the training process of the text classification model comprises:
training the text classification model to predict the content of a masked portion in a sample text, wherein the sample text comprises a sample prompt text.
9. The method of claim 7, obtaining a first type of sample text, comprising:
acquiring a prompt text template corresponding to a text classification model, wherein the prompt text template comprises a field slot position and a mask slot position, and the mask slot position corresponds to a conclusion type;
and adding the field type of the text to be processed of the corresponding sample in the field slot to obtain the sample prompt text of the first type sample text.
10. The method of claim 9, wherein the processing, by the text classification model, the sample to-be-processed text and the sample prompt text in the first type sample text to obtain a conclusion type predicted value corresponding to the first type sample text comprises:
processing the sample to-be-processed text and the sample prompt text in the first type sample text through the text classification model to obtain a prediction vector corresponding to the mask slot position;
and determining the conclusion type predicted value corresponding to the first type sample text based on the prediction vector.
11. The method of claim 9, wherein the adjusting parameters of the text classification model to reduce the difference between the conclusion type predicted value corresponding to the first type sample text and the conclusion type label comprises:
processing the sample to-be-processed text and the sample prompt text in the first type sample text through the text classification model to obtain a text prediction vector;
determining a domain type predicted value corresponding to the first type sample text based on the text prediction vector;
constructing a loss function, wherein a first term of the loss function reflects a difference between the domain type predicted value corresponding to the first type sample text and the domain type label, and a second term reflects a difference between the conclusion type predicted value corresponding to the first type sample text and the conclusion type label;
and adjusting parameters of the text classification model to reduce the first term and the second term.
12. The method of claim 7, further comprising:
obtaining a second type sample text, wherein the second type sample text comprises a sample to-be-processed text;
acquiring more than two prompt text templates respectively corresponding to more than two prompt classification models; the training process of the prompt classification model comprises the following steps: training the prompt classification model to predict the content of a part of the mask in a sample text, wherein the sample text comprises a sample prompt text; the prompt text template comprises a field slot position and a mask slot position, wherein the mask slot position corresponds to the conclusion type;
adding field types of the texts to be processed of the corresponding samples into the field slots of more than two prompt text templates respectively to obtain sample prompt texts of second-type sample texts corresponding to different prompt classification models;
respectively processing a sample to-be-processed text and a sample prompt text of a corresponding second type sample text by using the more than two prompt classification models to obtain a conclusion type soft label;
processing the sample to-be-processed text and the sample prompt text in the second type sample text through the text classification model to obtain a conclusion type predicted value corresponding to the second type sample text;
adjusting parameters of the text classification model to reduce a difference between the conclusion type predicted value corresponding to the second type sample text and the conclusion type soft label.
13. The method of claim 12, wherein the two or more prompt text templates corresponding to the two or more prompt classification models, respectively, have different keywords and/or orderings.
14. The method according to claim 7 or 12, wherein the domain type of the sample to-be-processed text in the first type sample text or in the second type sample text is obtained by processing the sample to-be-processed text in the first type sample text or in the second type sample text through a domain classification model.
15. A text classification system, the system comprising:
the first acquisition module is used for acquiring the field type of the text to be processed;
the second acquisition module is used for acquiring prompt texts containing the field types;
and the determining module is used for processing the text to be processed and the prompt text through a text classification model to obtain the conclusion type of the text to be processed.
16. A computer readable storage medium storing computer instructions which, when read by a computer, cause the computer to perform the text classification method according to any one of claims 1 to 6.
17. A text classification model training system, the system comprising:
The sample acquisition module is used for acquiring a first type of sample text, wherein the first type of sample text comprises a sample to-be-processed text, a sample prompt text and a conclusion type label, and the sample prompt text comprises the field type of the sample to-be-processed text;
the processing module is used for processing the sample to-be-processed text and the sample prompt text in the first type sample text through the text classification model to obtain a conclusion type predicted value corresponding to the first type sample text;
and the parameter adjusting module is used for adjusting parameters of the text classification model to reduce the difference between the conclusion type predicted value corresponding to the first type sample text and the conclusion type label.
18. A computer readable storage medium storing computer instructions which, when read by a computer, cause the computer to perform the text classification model training method according to any one of claims 7 to 14.
CN202311028049.3A 2023-08-16 2023-08-16 Text classification method, system and storage medium Active CN116738298B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311028049.3A CN116738298B (en) 2023-08-16 2023-08-16 Text classification method, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311028049.3A CN116738298B (en) 2023-08-16 2023-08-16 Text classification method, system and storage medium

Publications (2)

Publication Number Publication Date
CN116738298A true CN116738298A (en) 2023-09-12
CN116738298B CN116738298B (en) 2023-11-24

Family

ID=87919075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311028049.3A Active CN116738298B (en) 2023-08-16 2023-08-16 Text classification method, system and storage medium

Country Status (1)

Country Link
CN (1) CN116738298B (en)

Patent Citations (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101876985A (en) * 2009-11-26 2010-11-03 西北工业大学 WEB text sentiment theme recognizing method based on mixed model
JP2014056433A (en) * 2012-09-12 2014-03-27 Multi Solution Co Ltd Debate type web contribution program and system
CN108446813A (en) * 2017-12-19 2018-08-24 清华大学 A kind of method of electric business service quality overall merit
GB201803464D0 (en) * 2018-03-04 2018-04-18 Cp Connections Ltd Ability classification
JP2019160134A (en) * 2018-03-16 2019-09-19 株式会社日立製作所 Sentence processing device and sentence processing method
CN109543031A (en) * 2018-10-16 2019-03-29 华南理工大学 A kind of file classification method based on multitask confrontation study
CN111125354A (en) * 2018-10-31 2020-05-08 北京国双科技有限公司 Text classification method and device
CN109800418A (en) * 2018-12-17 2019-05-24 北京百度网讯科技有限公司 Text handling method, device and storage medium
CN110032736A (en) * 2019-03-22 2019-07-19 深兰科技(上海)有限公司 A kind of text analyzing method, apparatus and storage medium
CN112395414A (en) * 2019-08-16 2021-02-23 北京地平线机器人技术研发有限公司 Text classification method and training method, device, medium and equipment of classification model
CN111274823A (en) * 2020-01-06 2020-06-12 科大讯飞(苏州)科技有限公司 Text semantic understanding method and related device
CN111444709A (en) * 2020-03-09 2020-07-24 腾讯科技(深圳)有限公司 Text classification method, device, storage medium and equipment
CN111428510A (en) * 2020-03-10 2020-07-17 蚌埠学院 Public praise-based P2P platform risk analysis method
CN112395417A (en) * 2020-11-18 2021-02-23 长沙学院 Network public opinion evolution simulation method and system based on deep learning
CN112434166A (en) * 2020-12-17 2021-03-02 网易传媒科技(北京)有限公司 Text classification method, device and equipment based on timeliness and storage medium
CN113569001A (en) * 2021-01-29 2021-10-29 腾讯科技(深圳)有限公司 Text processing method and device, computer equipment and computer readable storage medium
CN113821590A (en) * 2021-06-15 2021-12-21 腾讯科技(深圳)有限公司 Text type determination method, related device and equipment
CN113672731A (en) * 2021-08-02 2021-11-19 北京中科闻歌科技股份有限公司 Emotion analysis method, device and equipment based on domain information and storage medium
CN113961705A (en) * 2021-10-29 2022-01-21 聚好看科技股份有限公司 Text classification method and server
CN114676255A (en) * 2022-03-29 2022-06-28 腾讯科技(深圳)有限公司 Text processing method, device, equipment, storage medium and computer program product
CN114942994A (en) * 2022-06-17 2022-08-26 平安科技(深圳)有限公司 Text classification method, text classification device, electronic equipment and storage medium
CN115080750A (en) * 2022-08-16 2022-09-20 之江实验室 Weak supervision text classification method, system and device based on fusion prompt sequence
CN115409017A (en) * 2022-09-02 2022-11-29 中国银行股份有限公司 Customer service communication text mining method and system, electronic equipment and storage medium
CN115310425A (en) * 2022-10-08 2022-11-08 浙江浙里信征信有限公司 Policy text analysis method based on policy text classification and key information identification
CN115495744A (en) * 2022-10-10 2022-12-20 北京天融信网络安全技术有限公司 Threat information classification method, device, electronic equipment and storage medium
CN115688414A (en) * 2022-10-27 2023-02-03 北京理工大学 False news detection method with theme embedded multi-mask prompt template
CN116304014A (en) * 2022-12-07 2023-06-23 阿里巴巴(中国)有限公司 Method for training entity type recognition model, entity type recognition method and device
CN116152840A (en) * 2023-03-10 2023-05-23 京东方科技集团股份有限公司 File classification method, apparatus, device and computer storage medium
CN116383382A (en) * 2023-03-15 2023-07-04 北京百度网讯科技有限公司 Sensitive information identification method and device, electronic equipment and storage medium
CN115994225A (en) * 2023-03-20 2023-04-21 北京百分点科技集团股份有限公司 Text classification method and device, storage medium and electronic equipment
CN116304717A (en) * 2023-05-09 2023-06-23 北京搜狐新媒体信息技术有限公司 Text classification method and device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yu Wen; Zhou Wuneng: "Sentiment Analysis of Product Reviews Based on LSTM" (基于LSTM的商品评论情感分析), Computer Systems & Applications (计算机系统应用), vol. 27, no. 08, pages 159-163 *

Also Published As

Publication number Publication date
CN116738298B (en) 2023-11-24

Similar Documents

Publication Publication Date Title
Sun et al. How to fine-tune bert for text classification?
Bhardwaj et al. Sentiment analysis for Indian stock market prediction using Sensex and nifty
AU2021322785B2 (en) Communication content tailoring
CN115310425B (en) Policy text analysis method based on policy text classification and key information identification
CN111177325B (en) Method and system for automatically generating answers
CN115956242A (en) Automatic knowledge graph construction
Song et al. Ada-boundary: accelerating DNN training via adaptive boundary batch selection
Hong et al. Knowledge-grounded dialogue modelling with dialogue-state tracking, domain tracking, and entity extraction
Tang et al. LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models
CN116738298B (en) Text classification method, system and storage medium
Blümel et al. Comparative analysis of classical and deep learning-based natural language processing for prioritizing customer complaints
Sisodia et al. Performance evaluation of learners for analyzing the hotel customer sentiments based on text reviews
Lamons et al. Python Deep Learning Projects: 9 projects demystifying neural network and deep learning models for building intelligent systems
Huang et al. Chatgpt: Inside and impact on business automation
de Oliveira et al. A model-agnostic and data-independent tabu search algorithm to generate counterfactuals for tabular, image, and text data
Goossens et al. Comparing the Performance of GPT-3 with BERT for Decision Requirements Modeling
Wang et al. Sentence compression with reinforcement learning
US11868714B2 (en) Facilitating generation of fillable document templates
LU504829B1 (en) Text classification method, computer readable storage medium and system
CN117708351B (en) Deep learning-based technical standard auxiliary review method, system and storage medium
US20230297965A1 (en) Automated credential processing system
CN111368526B (en) Sequence labeling method and system
US20220269858A1 (en) Learning Rules and Dictionaries with Neuro-Symbolic Artificial Intelligence
CN117291168A (en) Aspect emotion analysis method and system
Nugent Assesing Completeness of Solvency and Financial Condition Reports through the use of Machine Learning and Text Classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant