CN117668562A - Training and using method, device, equipment and medium of text classification model

Training and using method, device, equipment and medium of text classification model

Info

Publication number
CN117668562A
Authority
CN
China
Prior art keywords: text, classifier, data, result data, training
Legal status: Granted
Application number
CN202410130515.7A
Other languages
Chinese (zh)
Other versions
CN117668562B (en)
Inventor
何宇
汪翔
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202410130515.7A
Publication of CN117668562A
Application granted
Publication of CN117668562B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a training and using method, device, equipment and medium of a text classification model, and belongs to the field of data classification. The text classification model includes a first classifier, a second classifier, and an ensemble classifier, the method comprising: acquiring first sample data and second sample data, wherein the first sample data is used for representing a first text without keywords, and the second sample data is used for representing a second text with keywords; acquiring first result data through a first classifier based on the first sample data, and acquiring second result data through a second classifier based on the second sample data, wherein the first classifier is used for predicting the type of the first text, and the second classifier is used for predicting the type of the second text; based on the first result data and the second result data, n decision trees are constructed to form an integrated classifier. The embodiment of the application can be applied to various scenes such as cloud technology, artificial intelligence, intelligent traffic, auxiliary driving and the like.

Description

Training and using method, device, equipment and medium of text classification model
Technical Field
The embodiment of the application relates to the field of data classification, in particular to a training and using method, device, equipment and medium of a text classification model.
Background
In the current Internet, netizens express their own opinions by posting comments, and different comments carry their own emotional tendencies. However, in different events, the same or similar comments may have different emotional tendencies. For example, in a general event, "it is regrettable" expresses a sad, negative emotion, but in some special events, such as when a bad person encounters a bad thing, "it is regrettable that the bad person encountered such a bad thing" may be used ironically to express a happy, positive emotion. A conventional text classification model will uniformly classify "regrettable" as expressing a sad, negative emotion rather than a happy, positive emotion.
In the related art, the conventional text classification model is adjusted through an online learning algorithm, so that the adjusted conventional text classification model can make more accurate predictions for different events.
However, the adjusted conventional text classification model needs to be adjusted frequently for different events; even when it faces a general event again, it needs to be adjusted back to the original conventional text classification model. Its stability is therefore poor, frequent adjustment easily degrades the performance of the conventional text classification model, and problems such as overfitting of the adjusted model arise.
Disclosure of Invention
The application provides a training and using method, device, equipment and medium of a text classification model.
According to an aspect of an embodiment of the present application, there is provided a training method of a text classification model, the text classification model including a first classifier, a second classifier, and an integrated classifier, the method including:
acquiring first sample data and second sample data, wherein the first sample data is used for representing a first text without keywords, the second sample data is used for representing a second text with keywords, the keywords are characters affecting the classification result of a target character string, and the first text and the second text both comprise the target character string;
acquiring first result data through a first classifier based on the first sample data, and acquiring second result data through a second classifier based on the second sample data, wherein the first classifier is used for predicting the type of the first text, and the second classifier is used for predicting the type of the second text;
based on the first result data and the second result data, constructing n decision trees to form an integrated classifier, wherein the integrated classifier is used for predicting the type of at least one of the first text and the second text, and n is a positive integer greater than 1.
According to another aspect of an embodiment of the present application, there is provided a method for using a text classification model, the text classification model including a first classifier, a second classifier, and an ensemble classifier, the method including:
acquiring an input text;
acquiring third result data through a first classifier based on the input text, and acquiring fourth result data through a second classifier based on the input text;
based on the third result data and the fourth result data, outputting classification results through n decision trees in the integrated classifier, wherein n is a positive integer greater than 1.
According to another aspect of an embodiment of the present application, there is provided a training apparatus for a text classification model, the text classification model including a first classifier, a second classifier, and an integrated classifier, the apparatus comprising:
the system comprises an acquisition module, a first text generation module and a second text generation module, wherein the acquisition module is used for acquiring first sample data and second sample data, the first sample data is used for representing a first text without keywords, the second sample data is used for representing a second text with keywords, the keywords are characters affecting the classification result of target character strings, and the first text and the second text comprise target character strings;
the prediction module is used for acquiring first result data through a first classifier based on the first sample data and acquiring second result data through a second classifier based on the second sample data, wherein the first classifier is used for predicting the type of the first text, and the second classifier is used for predicting the type of the second text;
The construction module is used for constructing n decision trees based on the first result data and the second result data to form an integrated classifier, wherein the integrated classifier is used for predicting the type of at least one of the first text and the second text, and n is a positive integer greater than 1.
According to another aspect of an embodiment of the present application, there is provided an apparatus for using a text classification model, the text classification model including a first classifier, a second classifier, and an integrated classifier, the apparatus including:
the acquisition module is used for acquiring an input text;
the prediction module is used for acquiring third result data through the first classifier based on the input text and acquiring fourth result data through the second classifier based on the input text;
the output module is used for outputting classification results through n decision trees in the integrated classifier based on the third result data and the fourth result data, wherein n is a positive integer greater than 1.
According to another aspect of the embodiments of the present application, there is provided a computer device, comprising: a processor and a memory, wherein the memory stores at least one section of program; the processor is used for executing at least one section of program in the memory to realize the training method of the text classification model and the using method of the text classification model.
According to another aspect of the embodiments of the present application, there is provided a computer readable storage medium having at least one program stored therein, the at least one program being loaded and executed by a processor to implement the training method of the text classification model and the usage method of the text classification model.
According to another aspect of the embodiments of the present application, there is provided a computer program product or a computer program, the computer program product or the computer program including computer instructions, the computer instructions being stored in a computer readable storage medium, from which a processor obtains the computer instructions, the processor executing the computer instructions to implement the above-described training method of a text classification model and the use method of the text classification model.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects: the method comprises the steps of obtaining first sample data and second sample data, wherein the first sample data is used for representing a first text without keywords, the second sample data is used for representing a second text with keywords, the keywords are characters affecting the classification result of target character strings, and the first text and the second text comprise target character strings; acquiring first result data through a first classifier based on the first sample data, and acquiring second result data through a second classifier based on the second sample data, wherein the first classifier is used for predicting the type of the first text, and the second classifier is used for predicting the type of the second text; based on the first result data and the second result data, constructing n decision trees to form an integrated classifier, wherein the integrated classifier is used for predicting the type of at least one of the first text and the second text, and n is a positive integer greater than 1. Compared with a conventional text classification model, the text classification model uses the integrated classifier to predict, does not adjust the first classifier, and improves the stability of the text classification model. And in the case where the input text is the second text with the keyword, the type of the input text can be accurately predicted as well.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates a schematic diagram of a text classification system provided in an exemplary embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a training method for a text classification model provided in an exemplary embodiment of the present application;
FIG. 3 illustrates a flowchart of a method for training a text classification model provided in an exemplary embodiment of the present application;
FIG. 4 illustrates a schematic diagram of features carried by first and second result data provided by an exemplary embodiment of the present application;
FIG. 5 illustrates a flowchart of a method of using a text classification model provided in an exemplary embodiment of the present application;
FIG. 6 illustrates a flowchart of a training method for a text classification model provided in another exemplary embodiment of the present application;
FIG. 7 illustrates a flowchart of a method of training a text classification model provided in accordance with yet another exemplary embodiment of the present application;
FIG. 8 illustrates a flowchart of a method of training a text classification model provided in accordance with yet another exemplary embodiment of the present application;
FIG. 9 illustrates a flowchart of a method of using a text classification model provided in another exemplary embodiment of the present application;
FIG. 10 is a flow chart illustrating a method of using a text classification model provided in accordance with yet another exemplary embodiment of the present application;
FIG. 11 illustrates a block diagram of a training device for text classification models provided in an exemplary embodiment of the present application;
FIG. 12 illustrates a block diagram of an apparatus for using a text classification model provided in an exemplary embodiment of the present application;
fig. 13 shows a block diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be noted that, the object information (including, but not limited to, object device information, object personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) related to the present application are both information and data authorized by the object or sufficiently authorized by each party, and the collection, use, and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
It should be understood that, although the terms first, second, etc. may be used in this application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first parameter may also be referred to as a second parameter, and similarly, a second parameter may also be referred to as a first parameter, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
Terminal devices involved in the embodiments of the present application include, but are not limited to, mobile phones, computers, intelligent voice interaction devices, intelligent home appliances, vehicle terminals, aircrafts, and the like. The embodiments of the present application may be applied to various scenarios including, but not limited to, cloud technology, artificial intelligence, intelligent transportation, assisted driving, and the like.
First, the following description is made of the related matters related to the present application.
Pre-Training Model (PTM): also called a foundation model or large model, refers to a deep neural network (Deep Neural Network, DNN) with a large number of parameters that is trained on massive unlabeled data. The PTM uses the function-approximation capability of the large-parameter DNN to extract common features from the data, and is adapted to downstream tasks through techniques such as fine-tuning and parameter-efficient fine-tuning. Therefore, the pre-training model can achieve good results in small-sample or zero-sample scenarios. According to the data modality being processed, PTMs can be classified into language models, visual models, speech models, multi-modal models, and the like, where a multi-modal model refers to a model that builds a feature representation of two or more data modalities. The pre-training model is an important tool for artificial intelligence generated content, and can also serve as a general interface for connecting multiple specific task models.
Natural Language Processing (NLP): an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between people and computers in natural language. Natural language processing involves natural language, i.e., the language people use daily, and is closely related to linguistic research as well as computer science and mathematics. The pre-training model, an important technique for model training in the artificial intelligence field, was developed from the large language model (Large Language Model) in the NLP field. Through fine-tuning, a large language model can be widely applied to downstream tasks. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question-answering robots, knowledge graph techniques, and the like.
Machine Learning (ML): the method is a multi-field intersection subject, relates to a plurality of subjects such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like, and is used for specially researching how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, induction learning, teaching learning, and the like. The pre-training model is the latest development result of deep learning, and integrates the technology.
Artificial intelligence (Artificial Intelligence, AI): the system is a theory, a method, a technology and an application system which simulate, extend and extend human intelligence by using a digital computer or a machine controlled by the digital computer, sense environment, acquire knowledge and acquire an optimal result by using the knowledge. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include, for example, sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, pre-training model technologies, operation/interaction systems, mechatronics, and the like. The pre-training model is also called a large model and a basic model, and can be widely applied to all large-direction downstream tasks of artificial intelligence after fine adjustment. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
With research and progress of artificial intelligence technology, research and application of artificial intelligence technology are developed in various fields, such as common smart home, intelligent wearable equipment, virtual assistant, intelligent sound box, intelligent marketing, unmanned, automatic driving, unmanned aerial vehicle, digital twin, virtual person, robot, artificial intelligence generation content, conversational interaction, intelligent medical treatment, intelligent customer service, game AI and the like, and with development of technology, the artificial intelligence technology is applied in more fields and plays an increasingly important role.
Fig. 1 shows a schematic diagram of a text classification system 100 according to an exemplary embodiment of the present application, where the text classification system 100 includes: training device 110, classification device 120, and terminal device 130.
The training device 110 and the classifying device 120 are servers. In this application, a server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data, and artificial intelligence platforms, but is not limited thereto.
The training device 110 is configured to train a text classification model, and the classifying device 120 is configured to receive input data sent by the terminal device 130, determine a type of the input data using the model, and output a prediction result. Optionally, training device 110 and classification device 120 are the same device; alternatively, the training device 110 and the classifying device 120 are different devices, and in this embodiment of the present application, the training device 110 and the classifying device 120 are illustrated as different devices.
The terminal device 130 may be an electronic device such as a personal computer (Personal Computer, PC), a mobile phone, a tablet computer, an in-vehicle terminal (car machine), a wearable device, or the like.
Those skilled in the art will appreciate that the number of terminal devices 130 may be greater or lesser. For example, the number of the terminal devices 130 may be only one, or the number of the terminal devices 130 may be several tens or hundreds, or more. The number and device types of the terminal devices 130 are not limited in the embodiment of the present application.
In the current Internet, netizens express their own opinions by posting comments, and different comments carry their own emotional tendencies; however, in different events, the same or similar comments may have different emotional tendencies. For example, in a general event, "it is regrettable" expresses a sad, negative emotion, but in some special events, such as when a bad person encounters a bad thing, "it is regrettable that the bad person encountered such a bad thing" may be used ironically to express a happy, positive emotion. A conventional text classification model will uniformly classify "regrettable" as expressing a sad, negative emotion rather than a happy, positive emotion.
In the related art, the conventional text classification model is adjusted through an online learning algorithm, so that the adjusted conventional text classification model can make more accurate predictions for different events.
However, the adjusted conventional text classification model needs to be adjusted frequently for different events; even when it faces a general event again, it needs to be adjusted back to the original conventional text classification model. Its stability is therefore poor, frequent adjustment easily degrades the performance of the conventional text classification model, and problems such as overfitting of the adjusted model arise.
In order to solve the above problem, embodiments of the present application provide a text classification model that may classify a text, for example, by deriving a classification result as positive emotion or negative emotion or neutral emotion according to emotion in the text, or by deriving a classification result as written or spoken language according to a text style. The embodiment of the application does not limit specific classification tasks, and only text emotion classification tasks are taken as examples for illustration. FIG. 2 illustrates a schematic diagram of a training method for a text classification model provided in an exemplary embodiment of the present application.
In some embodiments, training of the text classification model is divided into three parts: the first classifier training part, the second classifier training part and the integrated classifier training part. Wherein the first classifier may also be referred to as a base classifier 210 and the second classifier may also be referred to as an event classifier 220.
In some embodiments, the base classifier 210 is used to train on the first sample data. The first sample data may also be referred to as sample base data, and includes, as samples, data of first texts (general texts) without keywords and the labels corresponding to the general texts. A general text is a text evaluating a general case without keywords; for example, in a general case without keywords, the label corresponding to "it is regrettable" is negative emotion.
The event classifier 220 is used to train on the second sample data. The second sample data may also be referred to as sample event data, and includes, as samples, data of second texts with keywords (event texts) and the labels corresponding to the event texts. An event text is a text with keywords evaluating a specific event; for example, for "it is regrettable that the bad person encountered such a bad thing", the keywords are "bad person" and "bad thing", and the corresponding label is positive emotion.
The integrated classifier 230 is used to train on the first result data output by the base classifier 210 and the second result data output by the event classifier 220. The first result data includes a first feature vector of the first sample data and the value of the label corresponding to the general text (first predicted value), and the second result data includes a second feature vector of the second sample data and the value of the label corresponding to the event text (second predicted value). For example, a general text is "it is regrettable", the corresponding label is negative emotion, and the first predicted value is 0.3; an event text is "it is regrettable that the bad person encountered such a bad thing", the corresponding label is positive emotion, and the second predicted value is 0.7.
In some embodiments, the integrated classifier comprises n decision trees, and the (i+1)-th decision tree is constructed based on the residual of the i-th decision tree, the first result data and the second result data, i being a positive integer with a starting value of 1 and less than n. The number n of decision trees is determined by a preset parameter num_round; for example, num_round=20 indicates that 20 decision trees need to be built. Each decision tree is used to predict the type of the input text and output a value of the predicted classification result; for example, the value of the classification result corresponding to the 1st decision tree is G1, the value of the classification result corresponding to the 2nd decision tree is G2, and the value of the finally output classification result is the sum of the classification results of the 20 decision trees, G = G1 + G2 + … + G20. Each decision tree is split gradually downward from the root node, and the depth of the decision tree increases gradually with node splitting until the depth reaches a certain threshold value.
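Written compactly (restating the description above, with x denoting the feature input assembled from the result data and y the true label value), the additive prediction of the integrated classifier and the residual fitted by each subsequent tree are:
$$G(x)=\sum_{k=1}^{n} G_k(x),\qquad r_i(x)=y-\sum_{k=1}^{i} G_k(x),\quad i=1,\ldots,n-1,$$
where n = num_round (20 in the example) and the (i+1)-th decision tree is fitted to the residual r_i(x).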
In some embodiments, the first classifier training portion (base classifier training portion) and the second classifier training portion (event classifier training portion) are performed simultaneously or sequentially, and embodiments of the present application are described with respect to the simultaneous performance of the two training portions.
FIG. 3 illustrates a flowchart of a method for training a text classification model provided in an exemplary embodiment of the present application. In some embodiments, the base classifier training portion includes at least one of the following steps.
Step 310: training the first pre-training model to generate a base classifier.
In some embodiments, first sample data is input into a first pre-training model, the first pre-training model is trained to generate a base classifier, and the first sample data is data representing general text.
In some embodiments, training the first pre-training model flow includes at least one of the following steps.
(1) The raw data is preprocessed.
In some embodiments, to obtain the first sample data, the raw data needs to be preprocessed.
Illustratively, data corresponding to 200,000 texts related to the text emotion classification task is collected as original data and subjected to de-duplication processing. The original data is manually labeled to obtain preprocessed data with labels, where the labels comprise 3 labels: positive emotion, negative emotion and neutral emotion. The label may be represented by a numerical value, for example, negative emotion ranging from 0 to 0.4 (including 0 and 0.4), neutral emotion ranging from 0.4 to 0.6 (excluding 0.4 and 0.6), and positive emotion ranging from 0.6 to 1 (including 0.6 and 1).
And delivering the marked preprocessed data to a third party for quality inspection. Taking the preprocessed data with accurate quality inspection results (i.e. consistent quality inspection results) as first sample data under the condition that the quality inspection accuracy is greater than a certain threshold value, such as 90%; and under the condition that the quality inspection accuracy is less than or equal to 90%, manually marking the original data again.
In some embodiments, the first sample data is subjected to data cleansing before training, and the cleansing means comprises at least one of the following: repeated-token cleaning, special character removal, stop-word removal and lexicon expansion. Repeated-token cleaning means that continuously repeated words or characters are removed from the text; special character removal means removing special characters such as punctuation marks and special symbols; stop-word removal means removing words that frequently appear in text but usually do not carry much semantic information, such as "etc."; lexicon expansion means expanding the text by using synonyms, antonyms, stem extraction, morphological transformations, and the like.
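As a minimal sketch of the cleansing steps and of the numeric label ranges given earlier in this step, the following Python fragment is illustrative only: the regular expressions, the stop-word list and the function names are assumptions and are not prescribed by this application.
```python
import re

STOP_WORDS = {"etc", "the", "a"}  # illustrative stop-word list

def clean_text(text: str) -> str:
    """Apply the cleansing means described above to one sample text."""
    text = re.sub(r"(.)\1{2,}", r"\1", text)   # repeated-token cleaning: collapse runs of a repeated character
    text = re.sub(r"[^\w\s]", "", text)        # remove special characters such as punctuation marks
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]  # remove stop words
    return " ".join(tokens)

def value_to_label(value: float) -> str:
    """Map a numeric label value to an emotion class using the ranges above."""
    if value <= 0.4:
        return "negative"   # [0, 0.4]
    if value < 0.6:
        return "neutral"    # (0.4, 0.6)
    return "positive"       # [0.6, 1]

print(clean_text("It is sooo regrettable!!!"), value_to_label(0.3))
```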
(2) A pre-training model is selected.
In some embodiments, a first pre-training model using dynamic mask processing is selected as the pre-training model to enhance the generalization ability of the base classifier.
The mask is used to randomly mask or replace any character or word in the text, and the first pre-trained model predicts the masked or replaced portion from the context.
The dynamic masking process means copying the original data of the input text T n times to obtain copies T1, T2, …, Tn. A random static mask process is performed on each copy, so that the processing results of the n copies are not all identical, or are even completely different from one another.
The static masking process means that the input text T is associated with a random seed that can convert a portion of the text in the input text T, thereby converting the input text T into the converted text M. The conversion mode comprises at least one of the following: converting part of the text into blank text; converting part of the text into other text; performing no conversion. When the random seed of the input text T is unchanged, the converted text M is unchanged.
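The following Python sketch illustrates the dynamic masking process described above. The mask token, the masking probability and the replacement alphabet are assumptions made for illustration; the description only specifies that each copy is processed with its own random static mask.
```python
import random

MASK_TOKEN = "[MASK]"  # assumed mask symbol

def static_mask(text: str, seed: int, mask_prob: float = 0.15) -> str:
    """Static masking: with a fixed random seed, convert part of the text T into blank
    (mask) text, convert part of it into other text, or leave it unchanged."""
    rng = random.Random(seed)  # the same seed always yields the same converted text M
    out = []
    for ch in text:
        r = rng.random()
        if r < mask_prob / 2:
            out.append(MASK_TOKEN)           # convert to blank/mask text
        elif r < mask_prob:
            out.append(rng.choice("abcde"))  # convert to other text
        else:
            out.append(ch)                   # no conversion
    return "".join(out)

def dynamic_mask(text: str, n: int) -> list[str]:
    """Dynamic masking: copy the input text n times and apply a static mask with a
    different seed to each copy, so the n masking results are not all identical."""
    return [static_mask(text, seed=i) for i in range(n)]

print(dynamic_mask("it is regrettable", n=4))
```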
(3) And training a base classifier.
In some embodiments, parameters in the first pre-training model are set. Illustratively, the learning rate is set to 1×10⁻⁵, the activation function selects the normalized exponential function (softmax), the loss function selects the categorical cross entropy function (categorical cross entropy), epoch=64, batch_size=64, max_length=128, num_labels=3, and wakeup_ratio=0.06. Here, epoch=64 means that training will be performed for 64 rounds (epochs) to update the parameters of the model; each round includes multiple steps, and in each step the data of one batch is processed. batch_size=64 means that the number of samples used at each parameter update is 64. max_length=128 indicates that the maximum length of the input data is 128 characters. num_labels=3 indicates that the number of predicted types is 3. wakeup_ratio=0.06 means that the learning rate increases by 0.06 every time one round of training is completed.
In some embodiments, the data enhancement training is performed by performing random insert/delete/replace processing on a word in the text; or extracting part of characters from the long text as input, for example, extracting the first 128 characters from the long text including more than 128 characters as input, thereby avoiding information loss. Through the processing mode, the robustness of the base classifier is improved.
In some embodiments, the first pre-training model is adjusted, and the adjustment procedure includes the following steps.
1) The parameters are imported, and a first pre-training model is set, including an activation function and a loss function. A fully connected layer is included in the first pre-training model for providing classification functionality.
2) The input data is segmented through a word segmentation device associated with the first pre-training model, and information such as training related marks (token), mask information, identifiers (IDs) and the like is obtained. In natural language processing, word segmentation is a process of segmenting continuous text (such as sentences or paragraphs) into words according to certain rules. In the first pre-training model, a corresponding function library comprises a word segmentation device for carrying out word segmentation processing on the text.
3) And (3) training is started, training indexes such as loss (loss) and accuracy (accuracy) of the current model are measured every 5 steps, and output and display are performed.
4) The loss is calculated for each step by a categorical cross entropy function (categorical cross-entropy). The classification cross entropy function formula is shown below.
$$\mathrm{loss}_i = -\sum_{j=1}^{\text{output\_size}} w_j \, y_{ij} \log\left(\hat{y}_{ij}\right)$$
where $y_{ij}$ represents the true value of the j-th category of the i-th sample, $\hat{y}_{ij}$ represents the corresponding predicted value, output_size represents the number of categories of output labels, and $w_j$ represents the weight coefficient corresponding to the j-th category. For example, assume that output_size=2, the true value y_true=[0,1], and the predicted value y_pred=[0.4,0.6], with all weight coefficients equal to 1; then loss = -(0×log(0.4)+1×log(0.6)) = 0.5108256.
5) An early stopping (early stop) mechanism is added, with the patience set to 10: model training is stopped when the loss has not decreased for 10 consecutive training periods, so that a trained base classifier is obtained and the overfitting problem is prevented. A sketch of the fine-tuning procedure in steps 1) to 5) is given below.
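The fragment below sketches the fine-tuning procedure of steps 1) to 5) using the Hugging Face transformers and datasets libraries, which this application does not name; the checkpoint, the toy data and the reading of wakeup_ratio as a learning-rate warm-up ratio are all assumptions. The sequence-classification head applies softmax with a categorical cross entropy loss, matching the parameters listed above.
```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

MODEL_NAME = "hfl/chinese-roberta-wwm-ext"   # assumed pre-trained checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)

# Toy stand-ins for the first sample data; labels: 0 = negative, 1 = neutral, 2 = positive.
train_ds = Dataset.from_dict({"text": ["it is regrettable", "it is fine", "great news"],
                              "label": [0, 1, 2]})
eval_ds = Dataset.from_dict({"text": ["so sad"], "label": [0]})

def tokenize(batch):
    # The tokenizer associated with the pre-trained model segments the input text and
    # produces token IDs and mask information, truncated to max_length = 128.
    return tokenizer(batch["text"], truncation=True, max_length=128, padding="max_length")

args = TrainingArguments(
    output_dir="base_classifier",
    learning_rate=1e-5,
    num_train_epochs=64,
    per_device_train_batch_size=64,
    warmup_ratio=0.06,                # assumed reading of wakeup_ratio = 0.06
    logging_steps=5,                  # report loss and accuracy every 5 steps
    evaluation_strategy="steps",
    eval_steps=5,
    save_strategy="steps",
    save_steps=5,
    load_best_model_at_end=True,      # required by the early-stopping callback
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds.map(tokenize, batched=True),
    eval_dataset=eval_ds.map(tokenize, batched=True),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=10)],  # stop when the loss stops decreasing
)
trainer.train()
```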
Step 320: the base classifier outputs first result data.
In some embodiments, the base classifier outputs first result data based on the input first sample data. The first result data at least comprises a first feature vector of the first sample data and a predicted value of a label corresponding to the first sample data.
In some embodiments, the event classifier training portion includes at least one of the following steps.
Step 330: and training the second pre-training model to generate an event classifier.
In some embodiments, the second sample data is input into a second pre-training model, and the second pre-training model is trained to generate an event classifier. The second sample data is data corresponding to second texts with keywords as samples. A keyword is a character that affects the classification result of the target character string, and the first text and the second text each include the target character string; for example, the first text is "it is regrettable" and the second text is "it is regrettable that the bad person encountered such a bad thing", which, in addition to the keywords "bad person" and "bad thing", also includes the target character string "regrettable".
In some embodiments, the keywords are characters that affect the result of the text classification. If a first text is classified as a first classification result, the second text with keywords is classified as a second classification result. For example, the first text is "it is regrettable" and is classified as negative emotion, while the second text is "it is regrettable that the bad person encountered such a bad thing" and, since it carries the keywords "bad person" and "bad thing", is classified as positive emotion.
In some embodiments, the keywords are characters that directly or indirectly reflect the classification propensity of the text. For the second text with keywords, there is a classification tendency to classify in a certain fixed direction. For example, for a second text with the keyword "fine" the classification trend is classified as positive emotion.
In some embodiments, the keywords include at least one of the following: time-class keywords, place-class keywords, person-class keywords, behavior-class keywords, emotion-class keywords, numerical-class keywords, and viewpoint-class keywords.
The time-class keywords include a specific time or holiday related to the event, such as "January 2024" or "Spring Festival"; the place-class keywords include a specific country name associated with the event; the person-class keywords include a specific person name or person information related to the event, such as "Zhang San" or "a certain engineer"; the behavior-class keywords include behaviors or actions related to the event, such as "signing an agreement" or "giving a speech"; the emotion-class keywords include emotions associated with the event, such as "anger" or "sadness"; the numerical-class keywords include specific numerical values associated with the event, such as "100 people" or "3%"; and the viewpoint-class keywords include opinions or views related to the event, such as "endorsement" or "criticism".
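The following sketch shows how a keyword hit can be detected in order to decide whether a text is a second text (event text); the keyword lists themselves are illustrative assumptions, since this application does not enumerate a lexicon.
```python
# Illustrative keyword lists for some of the categories above.
EVENT_KEYWORDS = {
    "time":      ["January 2024", "Spring Festival"],
    "person":    ["bad person", "Zhang San"],
    "behavior":  ["signing an agreement"],
    "emotion":   ["anger", "sadness"],
    "numerical": ["100 people", "3%"],
    "viewpoint": ["endorsement", "criticism"],
}

def hit_keywords(text: str) -> list[str]:
    """Return the keywords found in the text; a non-empty result marks the text as a
    second text (event text) rather than a first text (general text)."""
    return [kw for kws in EVENT_KEYWORDS.values() for kw in kws if kw in text]

print(hit_keywords("it is regrettable that the bad person encountered such a bad thing"))
# -> ['bad person']
```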
The principle of generating the event classifier refers to step 310 of generating the base classifier, which is not described in detail herein.
Step 340: the event classifier outputs second result data.
In some embodiments, the event classifier outputs second result data based on the second sample data input. The second result data at least comprises a second feature vector of the second sample data and a predicted value of the label corresponding to the second sample data.
Step 350: and training an integrated classifier.
In some embodiments, the first result data and the second result data are input into an integrated classifier.
In some embodiments, a feature set is obtained, the feature set comprising at least one of: the emotion keyword hits the marker bit feature, the user emotion portraits feature, the positive text similarity feature, the negative text similarity feature, the neutral text similarity feature, and the heat feature.
In some embodiments, the above-described features are carried in the first sample data and/or the second sample data.
In some embodiments, the above-described features are carried in the first result data and/or the second result data. As shown in fig. 4, first result data 410 carries a base classifier score feature 411, a sentiment key hit flag bit feature 412, a user sentiment image feature 413, a positive text similarity feature 414, a negative text similarity feature 415, a neutral text similarity feature 416; the second result data 420 carries an event classifier score feature 421 and a heat feature 422, for a total of 8-dimensional features.
The base classifier score feature 411 comprises a first feature vector representing the first sample data and the first predicted value; the emotion keyword hit flag bit feature 412 indicates whether an emotion keyword, such as "happy" or "sad", exists in the text corresponding to the first sample data, where 1 generally indicates existence and 0 indicates non-existence; the user emotion portrait feature 413 is a feature corresponding to a user emotion portrait, which is a user portrait established for the emotional tendency of a user based on data such as the user's historical behaviors and comments; the positive text similarity feature 414, negative text similarity feature 415 and neutral text similarity feature 416 are calculated based on text similarity and are used to measure the similarity of the text corresponding to the first sample data to positive, negative or neutral text; the event classifier score feature 421 comprises a second feature vector representing the second sample data and the second predicted value; and the heat feature 422 is used to represent the popularity of the text corresponding to the second sample data, such as the number of shares and the number of likes on social media.
In some embodiments, in addition to the first result data necessarily carrying the base classifier score feature and the second result data necessarily carrying the event classifier score feature, other features may be freely carried in the first result data and/or the second result data. For example, the first result data carries a base classifier score feature, a emotion keyword hit flag bit feature, a user emotion portrait feature, a positive text similarity feature, a negative text similarity feature, a neutral text similarity feature; the second result data carries event classifier score features, emotion keyword hit flag bit features and heat features.
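A sketch of how the 8-dimensional feature input of the integrated classifier can be assembled from the first and second result data of FIG. 4 is shown below; the field names and numeric values are illustrative assumptions.
```python
from dataclasses import dataclass

@dataclass
class ResultData:
    score: float                 # classifier score feature, summarized here as the predicted label value
    keyword_hit: int = 0         # emotion keyword hit flag bit feature (1 = present, 0 = absent)
    user_portrait: float = 0.0   # user emotion portrait feature
    pos_sim: float = 0.0         # positive text similarity feature
    neg_sim: float = 0.0         # negative text similarity feature
    neu_sim: float = 0.0         # neutral text similarity feature
    heat: float = 0.0            # heat (popularity) feature

def build_feature_vector(first: ResultData, second: ResultData) -> list[float]:
    """Concatenate the 6 features carried by the first result data and the 2 features
    carried by the second result data into the 8-dimensional input of FIG. 4."""
    return [first.score, float(first.keyword_hit), first.user_portrait,
            first.pos_sim, first.neg_sim, first.neu_sim,
            second.score, second.heat]

x = build_feature_vector(ResultData(score=0.3, keyword_hit=1, user_portrait=0.5,
                                    pos_sim=0.2, neg_sim=0.7, neu_sim=0.1),
                         ResultData(score=0.7, heat=0.9))
print(x)  # fed to the decision trees of the integrated classifier
```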
In some embodiments, an XGBoost algorithm is selected to train the integrated classifier. The XGBoost algorithm needs to sequentially construct n decision trees, and the (i+1)-th decision tree is constructed based on the residual of the i-th decision tree, the first result data and the second result data, where i is a positive integer with a starting value of 1 and less than n. The number n of decision trees is determined by the preset parameter num_round; for example, num_round=20 indicates that 20 decision trees need to be built. Each decision tree is used to predict the type of the input text and output a value of the predicted classification result; for example, the value of the classification result corresponding to the 1st decision tree is G1, the value of the classification result corresponding to the 2nd decision tree is G2, and the value of the finally output classification result is the sum of the classification results of the 20 decision trees, G = G1 + G2 + … + G20.
Each decision tree in the XGBoost algorithm is divided from the root node downwards gradually, and the depth of the decision tree is increased gradually along with node classification until the depth of the decision tree reaches a certain threshold value. In the embodiment of the present application, according to the set parameter max_depth=6, the depth of the decision tree is determined to be 6.
The node splitting of each decision tree is determined based on the features of the samples. By selecting the feature f with the greatest splitting value and a corresponding constant H, two child nodes are split: {x | x_f ≥ H} and {x | x_f < H}, where {x | x_f ≥ H} denotes the samples x whose value of feature f is greater than or equal to H, and {x | x_f < H} denotes the samples x whose value of feature f is less than H. The splitting value is measured by the objective function Obj:
$$\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right), \qquad \hat{y}_i = \sum_{k=1}^{K} f_k\left(x_i\right)$$
where K represents the total number of decision trees, n is the number of samples, $y_i$ is the true value of the i-th sample, $\hat{y}_i$ is the predicted value of the i-th sample, $x_i$ represents the feature value of the i-th sample, $f_k(x_i)$ represents the predicted value of the k-th decision tree for the i-th sample, $l$ is the loss function used to calculate the difference between the true value and the predicted value, and $\Omega$ is the regularization term representing the complexity of the decision tree: the more complex the decision tree, the higher the value of the regularization term.
In some embodiments, the integrated classifier includes a plurality of decision trees and the training process of the integrated classifier includes the following steps.
1) And calculating candidate division points for each one of the 8-dimensional features by adopting a division point algorithm.
In some embodiments, the quantile algorithm takes quantiles from the distribution of the feature. A quantile is a numerical point used to divide the distribution of the feature into several equal parts; for example, the median is a numerical point dividing the distribution of the feature into two equal parts, and the quartiles are numerical points dividing the distribution of the feature into four equal parts. The quantiles are used as dividing points to divide the whole feature interval into a plurality of segments, and every feature value has a corresponding segment, i.e., each feature value is located in exactly one of the segments, and the segments do not overlap each other. The true feature value is replaced by its dividing point; the essence of this is the piecewise discretization of continuous feature values, which is used to calculate the candidate division points.
2) And constructing a 1 st decision tree and splitting nodes.
In some embodiments, among the 8-dimensional features, part of the features are extracted to participate in splitting; for example, according to the set parameter colsample_bytree=0.8, it is determined that 8×80% = 6.4 ≈ 6, i.e., 6 feature dimensions are extracted to participate in splitting. The number of features is a positive integer, and rounding modes include rounding to the nearest integer, rounding up, rounding down, and the like; the embodiments of the present application are not limited thereto and are described by taking rounding to the nearest integer as an example.
In some embodiments, among the texts corresponding to all the sample data, part of the samples are extracted to participate in splitting; for example, according to the set parameter subsample=0.6, it is determined that 60% of the samples are extracted to participate in splitting.
In some embodiments, according to the objective function value corresponding to the candidate segmentation point of each dimension feature, selecting the candidate segmentation point with the largest objective function value, and performing subsequent node splitting.
3) According to the set parameter max_depth=6, determining the depth of the split decision tree as 6, calculating the residual error of the predicted value and the true value of each child node, and delivering to the 2 nd decision tree for calculation, wherein the construction step of the 2 nd decision tree is the same as that of the 1 st decision tree.
4) And so on, until 20 decision trees are built according to the set parameter num_round=20. Each decision tree corresponds to the value of one classification result; for example, the value of the classification result corresponding to the 1st decision tree is G1, the value of the classification result corresponding to the 2nd decision tree is G2, and the value of the finally output classification result is the sum of the classification results of the 20 decision trees, i.e., the finally output classification result G = G1 + G2 + … + G20. A parameter-level sketch of this construction process is given below.
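The following sketch of steps 1) to 4) uses the open-source XGBoost library; the random stand-in data and the squared-error objective are assumptions, since this application only gives the generic form of Obj.
```python
import numpy as np
import xgboost as xgb

# X: 8-dimensional feature vectors built from the first and second result data;
# y: label values of the samples. Random data stands in for the real training set.
rng = np.random.default_rng(0)
X, y = rng.random((1000, 8)), rng.random(1000)

params = {
    "max_depth": 6,                   # depth of each decision tree
    "colsample_bytree": 0.8,          # extract 8 x 80% ~ 6 of the 8 features for each tree
    "subsample": 0.6,                 # extract 60% of the samples for each tree
    "tree_method": "hist",            # quantile/histogram-based candidate division points
    "objective": "reg:squarederror",  # assumed loss l; the application does not fix it
}

# num_boost_round = num_round = 20: the trees are built one after another, each
# fitted to the residual left by the trees before it.
booster = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=20)

# The final output is the sum of the 20 trees' outputs, G = G1 + G2 + ... + G20.
print(float(booster.predict(xgb.DMatrix(X[:1]))[0]))
```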
FIG. 5 is a flow chart illustrating a method of using a text classification model according to an exemplary embodiment of the present application, the text classification model comprising: a first classifier (base classifier), a second classifier (event classifier) and an integrated classifier, the method comprising at least one of the following steps.
Step 510: and inputting a base classifier.
In some embodiments, the input text is input to the base classifier, and the data of the input text (simply referred to as input data) includes offline data, which is data stored in a memory, and online data, which is data generated by a specified website or application over a period of time, e.g., the online data is data corresponding to all comments in the last 5 minutes of website a.
Step 520: the base classifier outputs third result data.
In some embodiments, the base classifier outputs third result data to the integrated classifier based on the input text.
In some embodiments, the third result data includes a third feature vector representing the input text and a third predicted value of a tag corresponding to the input text.
In some embodiments, steps 510 to 520 and steps 530 to 550 are performed simultaneously or sequentially; the embodiments of the present application are described by taking simultaneous execution of the two as an example.
Step 530: judging whether the first concentration reaches the standard.
In some embodiments, the text classification model determines whether the first concentration of the input data is greater than a first threshold, i.e., whether the first concentration meets a criterion. The first density is a ratio of the data amount of the first data to the data amount of the input data, and the first data is data of the second text with the keyword.
In some embodiments, when the first concentration is greater than or equal to a first threshold, step 544 is performed; when the first concentration is less than the first threshold, step 542 is performed.
Illustratively, the first threshold is 10%. When the first concentration is 20%, which is greater than 10%, step 544 is performed to turn on the event classifier; when the first concentration is 1%, which is less than 10%, step 542 is performed and the event classifier is not turned on. (A sketch of this decision is given after step 544 below.)
Step 542: the event classifier is not turned on.
In some embodiments, the default event classifier is an unopened state, and the event classifier is maintained in an unopened state when the first concentration is less than the first threshold.
In some embodiments, after the event classifier is turned on, the event classifier is turned off when the first concentration is less than a first threshold.
Step 544: the event classifier is turned on.
In some embodiments, the event classifier is turned on when the first concentration is greater than or equal to a first threshold.
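A minimal sketch of the decision in steps 530 to 544 is shown below; it reuses the hypothetical hit_keywords() helper sketched after the keyword-category list in the training method, and the 10% threshold follows the example above.
```python
FIRST_THRESHOLD = 0.10  # first threshold of 10%

def first_concentration(input_texts: list[str]) -> float:
    """Ratio of the data amount of first data (second texts with keywords) to the
    data amount of the whole input data."""
    if not input_texts:
        return 0.0
    with_keywords = sum(1 for t in input_texts if hit_keywords(t))
    return with_keywords / len(input_texts)

def event_classifier_enabled(input_texts: list[str]) -> bool:
    """Turn the event classifier on only when the first concentration reaches the standard."""
    return first_concentration(input_texts) >= FIRST_THRESHOLD
```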
Step 550: the event classifier outputs fourth result data.
In some embodiments, the event classifier outputs fourth result data to the integrated classifier based on the input text.
In some embodiments, the data of the input text is expanded by expansion operations such as text similarity expansion, text context expansion, variant word replacement, synonym replacement, homonym replacement and the like, so that the data amount of the input text is enlarged.
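As an illustration of one of the expansion operations (synonym replacement), the sketch below generates additional variants of the input text; the synonym table is an assumption.
```python
SYNONYMS = {"regrettable": ["unfortunate", "a pity"]}  # illustrative synonym table

def expand_by_synonyms(text: str) -> list[str]:
    """Synonym-replacement expansion: generate variants of the input text so that the
    event classifier receives a larger amount of input data."""
    variants = [text]
    for word, substitutes in SYNONYMS.items():
        if word in text:
            variants.extend(text.replace(word, sub) for sub in substitutes)
    return variants

print(expand_by_synonyms("it is regrettable that the bad person encountered such a bad thing"))
```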
In some embodiments, the fourth result data includes a third feature vector representing the input text and a fourth predicted value of a tag corresponding to the input text.
Step 560: and the integrated classifier outputs classification results.
In some embodiments, the integrated classifier outputs a classification result based on the third result data and the fourth result data.
Illustratively, for the input text "it is regrettable that the bad person encountered such a bad thing", the predicted value of the label included in the third result data is 0.3, i.e., a negative label, and the third result data also includes other features such as the emotion keyword hit flag bit feature and the user emotion portrait feature; the predicted value of the label included in the fourth result data is 0.8, i.e., a positive label, and the fourth result data also includes other features such as the heat feature. The integrated classifier outputs a classification result of 0.7 according to the third result data and the fourth result data, i.e., a positive label, indicating that the text expresses a positive emotion.
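Putting the earlier sketches together, step 560 can be illustrated as follows; the third and fourth result data are reduced to the hypothetical ResultData fields, and booster, build_feature_vector() and value_to_label() are the objects sketched earlier, so the concrete numbers are assumptions consistent with the example above.
```python
third = ResultData(score=0.3, keyword_hit=1, user_portrait=0.5,
                   pos_sim=0.1, neg_sim=0.8, neu_sim=0.1)   # base classifier output
fourth = ResultData(score=0.8, heat=0.9)                    # event classifier output

features = np.asarray([build_feature_vector(third, fourth)])
G = float(booster.predict(xgb.DMatrix(features))[0])        # sum of the 20 trees' outputs
print(G, value_to_label(G))  # e.g. a value around 0.7 maps to the positive label
```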
In summary, compared with the conventional text classification model, the text classification model in the method provided by the embodiment uses the integrated classifier to predict, does not adjust the first classifier, and improves the stability of the text classification model. And in the case where the input text is the second text with the keyword, the type of the input text can be accurately predicted as well.
The method provided by the embodiment also judges whether to start the second classifier according to the first concentration, starts the second classifier when the first concentration is larger than or equal to the first threshold value, so that the prediction of the type of the input text is more accurate, does not start the second classifier when the first concentration is smaller than the first threshold value, does not influence the first classifier, and predicts the type of the input text according to the predicted value of the first classifier.
Fig. 6 shows a flowchart of a training method for a text classification model according to another exemplary embodiment of the present application, the method being performed by a training device, the method comprising at least one of the following steps.
Step 610: first sample data and second sample data are acquired.
In some embodiments, the first sample data is used to represent a first text without keywords, the second sample data is used to represent a second text with keywords, the keywords being characters affecting the classification result of the target string, the first text and the second text each comprising the target string.
In some embodiments, the keywords are characters that affect the emotional classification results of the target string. The emotion classification result includes at least one of: positive emotion, negative emotion, neutral emotion.
Illustratively, the first text is "i lost wallet is regrettable", the second text is "the bad person is regrettable by the bad person", except the keywords "the bad person", the first text is a negative emotion as the emotion classification result, the second text is provided with the keywords, the emotion classification result of the target character string is influenced, and the emotion classification result of the second text is a positive emotion.
In some embodiments, the keywords are characters that affect the semantic classification results of the target string. The second text has keywords, so that the semantic classification result of the second text is different from the semantic classification result of the first text.
Illustratively, the first text is "pineapple", the second text is "pineapple phone", and the target string "pineapple" is typically used to represent pineapple in fruit, but the second text affects the semantic classification result of the target string with the keyword "phone", and the "pineapple" in the second text is used to represent a phone brand.
In some embodiments, the keywords are characters that affect the text category classification result of the target string. The second text has keywords, and the text category classification result of the second text is different from the text category classification result of the first text.
Illustratively, the first text is "raining today", the second text is "i think raining today", the first text is a fact-type text, the second text carries the keyword "i think", and affects the text category classification result of the target character string, and the second text is a view-type text.
The above classification results are merely examples, and other classification results may be included, which is not limited in this embodiment; the following description takes the case where the classification result is an emotion classification result as an example.
In some embodiments, if a first text is classified into a first classification result, a second text with keywords is classified into a second classification result. For example, the first text "how regrettable" is classified as a negative emotion, while the second text "this bad person encountering this bad thing is regrettable" is classified as a positive emotion, since the second text carries the keywords "this bad person" and "this bad thing".
In some embodiments, the keywords are characters that directly or indirectly reflect the classification tendency of the text. A second text with keywords tends to be classified in a certain fixed direction. For example, for a second text with the keyword "fine", the classification tendency is toward positive emotion.
Step 622: based on the first sample data, first result data is obtained by a first classifier.
In some embodiments, a first classifier is used to predict the type of the first text, which may also be referred to as a base classifier.
Illustratively, in the case where the first text is "too regrettable", the first classifier predicts that the emotion represented by the first text is a negative emotion with a first predicted value of 0.3, and the first result data includes the first predicted value.
Step 624: based on the second sample data, second result data is obtained by a second classifier.
In some embodiments, a second classifier is used to predict the type of the second text, which may also be referred to as an event classifier.
Illustratively, in the case where the second text is "this bad person encountering this bad thing is regrettable", the second classifier predicts that the emotion represented by the second text is a positive emotion with a second predicted value of 0.7, and the second result data includes the second predicted value.
Step 630: based on the first result data and the second result data, n decision trees are constructed to form an integrated classifier.
In some embodiments, the integrated classifier is configured to predict a type of at least one of the first text and the second text, and n is a positive integer greater than 1.
In some embodiments, the n decision trees are constructed one after another in sequence, and the (i+1)-th decision tree is constructed based on the residual of the i-th decision tree, the first result data and the second result data, where i is a positive integer starting at 1 and less than n. The number n of decision trees is determined by a preset parameter num_round; for example, num_round = 20 indicates that 20 decision trees need to be built. Each decision tree predicts the type of the input text and outputs a numerical value for the predicted classification result; for example, the value output by the 1st decision tree is G1 and the value output by the 2nd decision tree is G2, and the value of the finally output classification result is the sum of the outputs of the 20 decision trees, G = G1 + G2 + ... + G20.
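As a minimal sketch of this sequential construction (a generic gradient-boosting loop in Python, using scikit-learn's DecisionTreeRegressor as the tree learner; the learning-rate scaling and the synthetic data are assumptions for illustration, not the patent's implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def build_ensemble(features, labels, num_round=20, max_depth=6, learning_rate=0.1):
    """Sequentially fit num_round trees; tree i+1 fits the residual left by trees 1..i."""
    trees = []
    prediction = np.zeros(len(labels))          # running sum G1 + G2 + ... + Gi
    for _ in range(num_round):
        residual = labels - prediction          # what the previous trees failed to explain
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(features, residual)
        trees.append(tree)
        prediction += learning_rate * tree.predict(features)
    return trees

def ensemble_predict(trees, features, learning_rate=0.1):
    """Final score is the (scaled) sum of every tree's output, as in G = G1 + ... + Gn."""
    return sum(learning_rate * tree.predict(features) for tree in trees)

# Toy usage: 8-dimensional feature rows standing in for the first/second result data features
rng = np.random.default_rng(0)
X = rng.random((100, 8))
y = (X[:, 0] * 0.5 + X[:, 6] * 0.5 > 0.5).astype(float)   # synthetic labels in [0, 1]
trees = build_ensemble(X, y)
print(ensemble_predict(trees, X[:3]))
```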
In summary, compared with the conventional text classification model, the text classification model in the method provided by the embodiment does not adjust the first classifier, and improves the stability of the text classification model. And in the case where the input text is the second text with the keyword, the type of the input text can be accurately predicted as well.
In some embodiments, for a flow of building n decision trees, fig. 7 shows a flowchart of a training method of a text classification model provided in a further exemplary embodiment of the present application, the method being performed by a training device, the method comprising at least one of the following steps.
Step 610: first sample data and second sample data are acquired.
In some embodiments, the fact that the first text and the second text each include the target character string covers at least one of the following cases: the partial text in the second text is the same as the first text; the partial text in the second text is the same as the partial text in the first text; the semantics of the partial text in the second text are the same as the semantics of the partial text in the first text.
In some embodiments, the partial text in the second text represents all text in the second text except for the keywords, or represents partial text in the second text except for the keywords; the partial text in the first text represents the partial text in the first text.
Illustratively, the keywords are "this bad person", "this bad thing", and in the case where the second text is "this bad person is regretted with this bad thing", the first text is "regretted", the local text other than the keywords in the second text is "regretted" the same as the first text;
in the case that the second text is "the bad person encounters the bad thing is regrettable", and the first text is "regrettable", the local text except the keyword in the second text is "regrettable" and the local text in the first text is "regrettable" the same;
in the case where the second text is "this bad person is regrettably hit to this bad thing", and the first text is "truly offensive", the local text "regrettably" except for the keyword in the second text is the same semantic meaning as the local text "offensive" in the first text.
In some embodiments, the type of the first text and the type of the second text include at least one of:
text with positive emotion; text of negative emotion; text of neutral emotion;
the text type of positive emotion corresponds to the numerical range of the first interval, the text type of neutral emotion corresponds to the numerical range of the second interval, the text type of negative emotion corresponds to the numerical range of the third interval, and the first interval, the second interval and the third interval are different from each other.
Illustratively, for the positive emotion text "happy", the first interval is 0.6 to 1 (including 0.6 and 1); for the neutral emotion text "I don't call", the second interval is 0.4 to 0.6 (excluding 0.4 and 0.6); for the negative emotion text "sad", the third interval is 0 to 0.4 (including 0 and 0.4). The first interval, the second interval and the third interval are pairwise different and have no intersection with each other.
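A small helper illustrating this interval-to-label mapping (the boundary values simply follow the example intervals above and are illustrative):

```python
def label_from_score(score: float) -> str:
    """Map an ensemble output in [0, 1] to an emotion label using the example intervals."""
    if score >= 0.6:          # first interval: [0.6, 1] -> positive
        return "positive"
    if score > 0.4:           # second interval: (0.4, 0.6) -> neutral
        return "neutral"
    return "negative"         # third interval: [0, 0.4] -> negative

print(label_from_score(0.7))  # positive
print(label_from_score(0.3))  # negative
```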
In some embodiments, the keywords include at least one of: time-class keywords; place class keywords; character keywords; behavior-type keywords; emotion keywords; numerical class keywords; view class keywords.
Wherein the time-class keywords include a specific time or holiday related to the event, such as "January 2024" or "Spring Festival"; the place-class keywords include specific place names related to the event, such as country names; the character keywords include specific person names or person information related to the event, such as "Zhang San" or "a certain engineer"; the behavior-class keywords include behaviors or actions related to the event, such as "sign an agreement" or "give a speech"; the emotion keywords include emotions related to the event, such as "anger" or "sadness"; the numerical-class keywords include specific numerical values related to the event, such as "100 people" or "3%"; the view-class keywords include opinions or views related to the event, such as "endorsement" or "criticism".
Step 622: based on the first sample data, first result data is obtained by a first classifier.
Details of the specific implementation refer to step 622 of the embodiment of fig. 6, and are not described herein.
Step 624: based on the second sample data, second result data is obtained by a second classifier.
Details of the specific implementation refer to step 624 of the embodiment of fig. 6, and are not described herein.
Step 632: based on the first result data and the second result data, constructing n decision trees with the same structure to form an integrated classifier.
In some embodiments, residuals of predicted values and true values of all child nodes of an ith decision tree in the n decision trees need to be input into the (i+1) th decision tree, n is a preset value, i is a positive integer with a starting value of 1 and less than n, the first result data comprises a first feature vector and a first predicted value of first sample data, the second result data comprises a second feature vector and a second predicted value of second sample data, and the sum of values output by the n decision trees is used for representing the type of input text, and the input text comprises at least one of the first text and the second text.
In some embodiments, each decision tree is split progressively downward from the root node, and the depth of the decision tree increases gradually as nodes are split, until the depth of the decision tree reaches a certain threshold. For example, according to the preset parameter max_depth = 6, the depth of the decision tree is determined to be 6.
In some embodiments, constructing an ith decision tree based on the first result data and the second result data; calculating the residual error of the predicted value and the true value of each child node of the ith decision tree; constructing an i+1th decision tree based on the first result data, the second result data and the residual error, wherein the construction step of the i+1th decision tree is the same as the construction step of the i decision tree; repeating the steps until n decision trees are constructed to form an integrated classifier.
In some embodiments, constructing the ith decision tree based on the first result data and the second result data comprises: calculating candidate division points respectively corresponding to the first sample data and the second sample data by adopting a division point algorithm; node splitting is carried out on the lowest layer node of the ith decision tree based on the candidate partition points; and under the condition that the number of layers of splitting of the ith decision tree reaches a preset number of layers, constructing to obtain the ith decision tree.
The division point algorithm selects division points according to the distribution of the feature values; the division points cut the whole feature interval into several segments, so that every feature value falls into a corresponding segment and the division points can stand in for the real feature values. In essence, continuous feature values are discretized into segments, and the candidate division points are obtained in this way.
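One way to realize this discretization is quantile binning, sketched below; the patent does not name a specific algorithm, so the quantile choice and the num_bins parameter are assumptions:

```python
import numpy as np

def candidate_split_points(values: np.ndarray, num_bins: int = 10) -> np.ndarray:
    """Return quantile boundaries that cut the feature's value range into num_bins segments.

    Every raw value falls into exactly one segment, so the boundaries can stand in for
    the continuous values when evaluating node splits.
    """
    quantiles = np.linspace(0, 1, num_bins + 1)[1:-1]       # interior cut points only
    return np.unique(np.quantile(values, quantiles))

feature = np.random.default_rng(1).random(1000)
print(candidate_split_points(feature, num_bins=5))
```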
In some embodiments, the node splitting of each decision tree is determined based on the characteristics of the samples. By selecting the feature f with the greatest split value and a corresponding constant H, a node is split into two child nodes: {x | x_f ≥ H} and {x | x_f < H}, where {x | x_f ≥ H} denotes the samples x whose value of feature f is greater than or equal to H, and {x | x_f < H} denotes the samples x whose value of feature f is less than H. The split value is measured by an objective function Obj, which can be written as

Obj = Σ_{i=1}^{n} l(y_i, ŷ_i) + Σ_{k=1}^{K} Ω(f_k),  where  ŷ_i = Σ_{k=1}^{K} f_k(x_i),

K is the total number of decision trees, n is the number of samples, y_i is the true value of the i-th sample, ŷ_i is the predicted value of the i-th sample, x_i denotes the i-th sample, f_k(x_i) is the prediction of the k-th decision tree for the i-th sample, l(·, ·) is the loss function used to calculate the difference between the true value and the predicted value, and Ω(f_k) is the regularization term representing the complexity of the decision tree; the more complex the tree, the larger the value of the regularization term.
In some embodiments, the ith decision tree is constructed under the condition that the number of split layers of the ith decision tree reaches a preset number of layers. For example, the preset parameter max_depth=6, which indicates that the preset layer number is 6, and the layer number of the splitting of the ith decision tree is determined to be 6.
In some embodiments, prior to constructing the n decision trees, the method further comprises: acquiring a feature set, wherein the feature set comprises at least one of the following: the emotion keyword hit flag bit feature, the user emotion portrait feature, the positive text similarity feature, the negative text similarity feature, the neutral text similarity feature, and the heat feature.
In some embodiments, n decision trees are constructed to form an integrated classifier based on the first result data, the second result data, and the feature set.
In some embodiments, the above-described features are carried in the first sample data and/or the second sample data.
In some embodiments, the above-described features are carried in the first result data and/or the second result data. As shown in fig. 4, first result data 410 carries a base classifier score feature 411, a sentiment key hit flag bit feature 412, a user sentiment image feature 413, a positive text similarity feature 414, a negative text similarity feature 415, a neutral text similarity feature 416; the second result data 420 carries an event classifier score feature 421 and a heat feature 422, for a total of 8-dimensional features.
Wherein the base classifier score feature 411 comprises a first feature vector representing first sample data and a first predictor; emotion keyword hit flag bit feature 412 indicates whether an emotion keyword exists in the text corresponding to the first sample data, such as "happy", "sad", etc., and generally indicates the existence by 1, and the nonexistence by 0; the user emotion figure feature 413 is a feature corresponding to a user emotion figure, and the user emotion figure is a user figure established for emotion tendencies of a user based on data such as historical behaviors and comments of the user; a positive text similarity feature 414, a negative text similarity feature 415, a neutral text similarity feature 416, which are text similarity-based calculations for measuring the similarity of the text corresponding to the first sample data to the positive, negative or neutral text; the event classifier score feature 421 includes a second feature vector representing second sample data and a second predictor; the popularity feature 422 is used to represent popularity of text corresponding to the second sample data, such as a number of forwarding on social media, a number corresponding to a praise, and the like.
In some embodiments, in addition to the first result data necessarily carrying the base classifier score feature and the second result data necessarily carrying the event classifier score feature, other features may be freely carried in the first result data and/or the second result data. For example, the first result data carries a base classifier score feature, a emotion keyword hit flag bit feature, a user emotion portrait feature, a positive text similarity feature, a negative text similarity feature, a neutral text similarity feature; the second result data carries event classifier score features, emotion keyword hit flag bit features and heat features.
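For clarity, the 8-dimensional feature layout described above can be written out as a small structure; the field names are illustrative, and which optional features are carried remains a design choice as noted:

```python
from dataclasses import dataclass

@dataclass
class EnsembleFeatures:
    base_score: float      # first classifier (base classifier) predicted value
    keyword_hit: int       # 1 if an emotion keyword appears in the text, else 0
    user_portrait: float   # user emotion portrait feature
    positive_sim: float    # similarity to positive reference texts
    negative_sim: float    # similarity to negative reference texts
    neutral_sim: float     # similarity to neutral reference texts
    event_score: float     # second classifier (event classifier) predicted value
    heat: float            # popularity, e.g. normalised forward/like counts

    def as_vector(self) -> list:
        return [self.base_score, self.keyword_hit, self.user_portrait,
                self.positive_sim, self.negative_sim, self.neutral_sim,
                self.event_score, self.heat]

row = EnsembleFeatures(0.3, 1, 0.5, 0.2, 0.7, 0.1, 0.8, 0.9)
print(row.as_vector())
```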
In summary, compared with the conventional text classification model, the text classification model in the method provided by the embodiment uses the integrated classifier to predict, does not adjust the first classifier, and improves the stability of the text classification model. And in the case where the input text is the second text with the keyword, the type of the input text can be accurately predicted as well.
The method provided by the embodiment further improves the accuracy of text classification model prediction by constructing n decision trees to form an integrated classifier, compared with the method for predicting by using only the first classifier or the second classifier.
In some embodiments, the first classifier is generated by training a first pre-training model and the second classifier is generated by training a second pre-training model. Fig. 8 shows a flowchart of a training method for a text classification model according to a further exemplary embodiment of the present application, the method being performed by a training device, the method comprising at least one of the following steps.
Step 610: first sample data and second sample data are acquired.
Details of the specific implementation refer to step 610 of the embodiment of fig. 6, and are not described herein.
Step 615: the first pre-training model is trained based on the first sample data, generating a first classifier.
In some embodiments, the first pre-training model is a pre-training model with data classification capabilities.
In some embodiments, training the first pre-training model based on the first sample data with the first loss not decreasing as a training end target during p training periods, generating a first classifier; the first loss is an error between a real value corresponding to the first sample data and a predicted value, and p is a positive integer.
In some embodiments, training the first pre-training model flow includes at least one of the following steps.
(1) The raw data is preprocessed.
In some embodiments, to obtain the first sample data, the raw data needs to be preprocessed.
Illustratively, data corresponding to 200,000 texts related to the text emotion classification task is collected as the original data and deduplicated. The original data is manually labeled to obtain preprocessed data with labels, where the labels include 3 classes: positive emotion, negative emotion and neutral emotion.
And delivering the marked preprocessed data to a third party for quality inspection. Taking the preprocessed data with accurate quality inspection results (i.e. consistent quality inspection results) as first sample data under the condition that the quality inspection accuracy is greater than a certain threshold value, such as 90%; and under the condition that the quality inspection accuracy is less than or equal to 90%, manually marking the original data again.
In some embodiments, the first sample data is cleaned before training, and the cleaning means include at least one of: repeated-mark cleaning, special character removal, stop word removal, and word stock expansion. Repeated-mark cleaning means removing continuously repeated words or characters from the text; special character removal means removing special characters such as punctuation marks and special symbols; stop word removal means removing words that appear frequently in text but usually carry little semantic information, such as "etc."; word stock expansion means expanding the text by using synonyms, antonyms, stemming, morphological transformation and the like.
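A minimal sketch of these cleaning steps, assuming simple regular expressions and a placeholder stop-word list (real systems would use curated resources):

```python
import re

STOP_WORDS = {"the", "a", "etc"}   # placeholder stop-word list

def clean_text(text: str) -> str:
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # collapse runs of the same character
    text = re.sub(r"[^\w\s]", " ", text)         # drop punctuation / special symbols
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    return " ".join(tokens)

print(clean_text("Too regrettable!!!!! etc"))    # -> "Too regrettable"
```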
(2) A pre-training model is selected.
In some embodiments, a first pre-training model using dynamic mask processing is selected as the pre-training model to enhance the generalization ability of the first classifier.
The mask is used to randomly mask or replace any word or character in the text, and the first pre-trained model predicts the masked or replaced portion from the context.
The dynamic masking process means copying the original data of the input text T n times to obtain copies T1, T2, ..., Tn. A random static mask is applied to each copy separately, so that the masking results of the n copies are not all identical (and may even be completely different from one another).
The static masking process assumes that the input text T is associated with a random seed that transforms a portion of the text in T, turning the input text T into a transformed text M. The transformation modes include at least one of: converting part of the text into blank text; converting part of the text into other text; performing no transformation. When the random seed of the input text T is unchanged, the transformed text M remains unchanged.
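A sketch of the dynamic/static masking idea: with a fixed seed the masked positions are fixed (static masking), and dynamic masking simply gives each of the n copies its own seed. The [MASK] token, word-level masking and the 15% ratio are assumptions:

```python
import random

def static_mask(words, seed, mask_token="[MASK]", ratio=0.15):
    """With a fixed seed the same positions are masked every time (static masking)."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < ratio else w for w in words]

def dynamic_mask(text, n_copies):
    """Dynamic masking: n copies of the text, each masked with a different random seed."""
    words = text.split()
    return [" ".join(static_mask(words, seed=i)) for i in range(n_copies)]

for copy in dynamic_mask("this bad person met this bad thing how regrettable", 3):
    print(copy)
```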
(3) The first classifier is trained.
In some embodiments, parameters in the first pre-training model are set. Illustratively, the learning rate is set to 1×10⁻⁵, the activation function is the normalized exponential function (softmax), the loss function is the categorical cross entropy function (categorical cross-entropy), and epoch = 64, batch_size = 64, max_length = 128, num_labels = 3, warmup_ratio = 0.06. Here, epoch = 64 means that training runs for 64 rounds (epochs) to update the parameters of the model, and each round contains multiple steps, each of which processes one batch of data. batch_size = 64 indicates that 64 samples are used at each parameter update. max_length = 128 indicates that the maximum length of the input data is 128 characters. num_labels = 3 indicates that the number of predicted types is 3. warmup_ratio = 0.06 indicates that the learning rate is gradually warmed up over the first 6% of the training steps.
In some embodiments, the data enhancement training is performed by performing random insert/delete/replace processing on a word in the text; or extracting part of characters from the long text as input, for example, extracting the first 128 characters from the long text including more than 128 characters as input, thereby avoiding information loss. Through the processing mode, the robustness of the first classifier is improved.
In some embodiments, the first pre-training model is adapted, the adaptation procedure comprising at least one of the following steps.
1) The parameters are imported, and a first pre-training model is set, including an activation function and a loss function. A fully connected layer is included in the first pre-training model for providing classification functionality.
2) The input data is segmented through a word segmentation device associated with the first pre-training model, and information such as training related marks (token), mask information, identifiers (IDs) and the like is obtained. In natural language processing, word segmentation is a process of segmenting continuous text (such as sentences or paragraphs) into words according to certain rules. In the first pre-training model, a corresponding function library comprises a word segmentation device for carrying out word segmentation processing on the text.
3) Training starts; every 5 steps, training metrics such as the first loss and the accuracy of the current model are measured and displayed.
4) The first loss (loss) is calculated for each step by a categorical cross entropy function (categorical cross-entropy). The classification cross entropy function formula is shown below.
loss = -Σ_{j=1}^{output_size} w_j · y_{ij} · log(ŷ_{ij})

where y_{ij} represents the true value of the i-th sample on the j-th label category, ŷ_{ij} represents the corresponding predicted value, output_size represents the number of categories of output labels, and w_j represents the weight coefficient corresponding to the j-th category (1 when the categories are unweighted). For example, assume output_size = 2, the true values y_true = [0, 1] and the predicted values y_pred = [0.4, 0.6]; then loss = -(0×log(0.4) + 1×log(0.6)) = 0.5108256.
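The worked example can be checked directly (the value 0.5108256 implies the natural logarithm is used):

```python
import math

y_true = [0, 1]
y_pred = [0.4, 0.6]
loss = -sum(t * math.log(p) for t, p in zip(y_true, y_pred))
print(round(loss, 7))   # 0.5108256
```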
5) An early stop mechanism is added with a patience of p = 10 training periods: when the first loss does not decrease for 10 consecutive training periods, model training is stopped, yielding the trained first classifier and preventing over-fitting.
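A sketch of this early-stop mechanism with patience p = 10; `train_one_epoch` is a hypothetical callback standing in for whatever training step the framework actually provides:

```python
def train_with_early_stop(train_one_epoch, max_epochs=64, patience=10):
    """Stop when the first loss has not decreased for `patience` consecutive training periods."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        loss = train_one_epoch(epoch)                 # hypothetical: runs one epoch, returns loss
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:    # 10 periods with no decrease -> stop
            break
    return best_loss

# Toy usage with a scripted loss curve
fake_losses = iter([0.9, 0.7, 0.6] + [0.6] * 15)
print(train_with_early_stop(lambda e: next(fake_losses)))   # 0.6
```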
Step 617: training a second pre-training model based on the second sample data to generate a second classifier.
In some embodiments, the second pre-training model is a pre-training model with data classification capabilities.
In some embodiments, training the second pre-training model based on the second sample data with the second loss not decreasing as a training end target for none of the q training periods, generating a second classifier; the second loss is an error between a real value corresponding to the second sample data and a predicted value, and q is a positive integer.
The specific implementation principle is the same as that of step 615, and will not be described here again.
Step 622: based on the first sample data, first result data is obtained by a first classifier.
Details of the specific implementation refer to step 622 of the embodiment of fig. 6, and are not described herein.
Step 624: based on the second sample data, second result data is obtained by a second classifier.
Details of the specific implementation refer to step 624 of the embodiment of fig. 6, and are not described herein.
Step 630: based on the first result data and the second result data, n decision trees are constructed to form an integrated classifier.
Details of the specific implementation refer to step 630 of the embodiment of fig. 6, and are not described herein.
In summary, the method provided in this embodiment trains the first pre-training model to generate the first classifier, trains the second pre-training model to generate the second classifier, so that the first classifier and the second classifier are more suitable for text classification tasks, and accuracy of prediction is improved.
Fig. 9 shows a flowchart of a method for using a text classification model according to another exemplary embodiment of the present application, the method being performed by a classification device, the method comprising at least one of the following steps.
Step 910: input text is obtained.
In some embodiments, the data of the input text (input data for short) includes offline data and online data. Offline data is data stored in a memory; online data is data generated by a specified website or application within a period of time, such as the online data corresponding to all comments on website A in the last 5 minutes.
In some embodiments, the data of the input text is expanded by operations such as text similarity expansion, text context expansion, variant word replacement, synonym replacement and homophone replacement, so that the amount of input text data is enlarged.
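As an illustration of one expansion operation, a toy synonym-replacement sketch follows; the synonym table is a placeholder, not the patent's lexicon:

```python
SYNONYMS = {"regrettable": ["unfortunate", "a pity"], "happy": ["glad", "delighted"]}

def expand_by_synonyms(text: str) -> list:
    """Produce extra variants of the input text by swapping in synonyms for known words."""
    variants = []
    for word, replacements in SYNONYMS.items():
        if word in text:
            variants.extend(text.replace(word, r) for r in replacements)
    return variants

print(expand_by_synonyms("this bad person met this bad thing, how regrettable"))
```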
Step 922: third result data is obtained by the first classifier based on the input text.
In some embodiments, the first classifier outputs third result data to the integrated classifier based on the input text.
In some embodiments, the third result data includes a third feature vector representing the input text and a third predicted value of a tag corresponding to the input text.
Step 924: fourth result data is obtained by the second classifier based on the input text.
In some embodiments, the second classifier outputs fourth result data to the integrated classifier based on the input text.
In some embodiments, the fourth result data includes a third feature vector representing the input text and a fourth predicted value of a tag corresponding to the input text.
Step 930: based on the third result data and the fourth result data, outputting classification results through n decision trees in the integrated classifier.
In some embodiments, based on the third result data and the fourth result data, outputting classification results through n decision trees in the integrated classifier; the sum of the values output by the n decision trees is used for representing the classification result, and n is a positive integer greater than 1.
Illustratively, for the input text of "this bad person is regrettably suffering from this bad person", the third result data includes a third predictive value of 0.3, i.e., a negative label; the fourth result data comprises a fourth predicted value of 0.8, namely a front label; the integrated classifier outputs a classification result of 0.7 according to the third result data and the fourth result data, namely a positive label, and the classification result is used for expressing that the text expresses positive emotion.
In summary, in the method provided in this embodiment, based on the third result data and the fourth result data, the classification result is output through n decision trees in the integrated classifier, and compared with the conventional text classification model, the integrated classifier is used for prediction, the first classifier is not adjusted, and the text classification model is more stable. And the type of the input text can be accurately predicted in the face of different input texts.
In some embodiments, whether to turn on the second classifier is determined according to the first concentration, and fig. 10 shows a flowchart of a method for using a text classification model according to another exemplary embodiment of the present application, where the method is performed by a classification device, and the method includes at least one of the following steps.
Step 910: input text is obtained.
Details of the specific implementation refer to step 910 of the embodiment of fig. 9, which is not described herein.
Step 915: under the condition that the first concentration is smaller than a first threshold value, the second classifier is not started; and opening the second classifier under the condition that the first concentration is greater than or equal to a first threshold value.
In some embodiments, the first concentration is a ratio of an amount of data of the first data to an amount of data of the input text, the first data being data corresponding to the second text with the keyword.
Illustratively, the first threshold is 10%. When the first concentration is 20%, which is greater than 10%, the event classifier is turned on; when the first concentration is 1%, which is less than 10%, the event classifier is not turned on.
In some embodiments, the default second classifier is an unopened state, and the second classifier is maintained in an unopened state when the first concentration is less than the first threshold.
In some embodiments, after the second classifier is turned on, the second classifier is turned off when the first concentration is less than a first threshold.
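A sketch of this concentration-based gating; keyword matching stands in for however the first data is actually identified, and both the keyword list and the 10% threshold are example values:

```python
KEYWORDS = ["this bad person", "this bad thing"]   # example keywords marking the second text

def first_concentration(texts: list) -> float:
    """Proportion of the input texts that carry keywords (the 'first data')."""
    if not texts:
        return 0.0
    hits = sum(1 for t in texts if any(k in t for k in KEYWORDS))
    return hits / len(texts)

def should_open_event_classifier(texts: list, threshold: float = 0.10) -> bool:
    return first_concentration(texts) >= threshold

batch = ["this bad person met this bad thing, how regrettable",
         "it is raining today",
         "I lost my wallet, how regrettable"]
print(should_open_event_classifier(batch))   # True: concentration 1/3 is above 10%
```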
Step 922: third result data is obtained by the first classifier based on the input text.
Details of the specific implementation refer to step 922 of the embodiment of fig. 9, and are not described herein.
Step 932: and under the condition that the second classifier is not started, outputting classification results through n decision trees in the integrated classifier based on the third result data.
Illustratively, for the "regrettably" piece of input text, the third result data includes a third predicted value of 0.3, i.e., a negative label; the integrated classifier based on the third result data, the final output classification result is 0.3, namely a negative label, which is used for indicating that the text expresses negative emotion.
In some embodiments, steps 924 and 930 in the embodiment of fig. 9 are implemented when the first concentration is greater than or equal to the first threshold, i.e., the second classifier is turned on, which is not described herein.
In summary, the method provided in this embodiment determines whether to turn on the second classifier according to the first concentration, and turns on the second classifier when the first concentration is greater than or equal to the first threshold, so that the prediction of the type of the input text is more accurate; and under the condition that the first concentration is smaller than a first threshold value, the second classifier is not started, so that the second classifier does not influence the first classifier, and the type of the input text is predicted according to the predicted value of the first classifier.
In the above embodiment, the steps having the same sequence number may be regarded as the same step. The embodiment corresponding to fig. 6, the embodiment corresponding to fig. 7, the embodiment corresponding to fig. 8, the embodiment corresponding to fig. 9, and the embodiment corresponding to fig. 10 may be implemented alone or in combination, and the present application is not limited thereto.
FIG. 11 illustrates a block diagram of a training apparatus for text classification models provided in an exemplary embodiment of the present application, which may be implemented as a computer device, or as part of a computer device, by software or hardware, or a combination of both, the apparatus comprising:
an obtaining module 1110, configured to obtain first sample data and second sample data, where the first sample data is used to represent a first text without keywords, the second sample data is used to represent a second text with keywords, and the keywords are characters that affect a classification result of a target character string, and the first text and the second text each include the target character string;
a prediction module 1120, configured to obtain first result data through a first classifier based on the first sample data, and obtain second result data through a second classifier based on the second sample data, where the first classifier is used for predicting a type of the first text, and the second classifier is used for predicting a type of the second text;
A construction module 1130, configured to construct n decision trees based on the first result data and the second result data to form an integrated classifier, where the integrated classifier is used to predict a type of at least one of the first text and the second text, and n is a positive integer greater than 1.
In one possible design of this embodiment, a construction module 1130 is configured to construct n decision trees with the same structure based on the first result data and the second result data to form an integrated classifier;
the method comprises the steps that residuals of predicted values and true values of all child nodes of an ith decision tree in n decision trees are required to be input into an (i+1) th decision tree, n is a preset value, i is a positive integer with a starting value of 1 and smaller than n, first result data comprise a first feature vector and a first predicted value of first sample data, second result data comprise a second feature vector and a second predicted value of second sample data, the sum of numerical values output by the n decision trees is used for representing the type of input text, and the input text comprises at least one of the first text and the second text.
In one possible design of this embodiment, a construction module 1130 is configured to construct an ith decision tree based on the first result data and the second result data;
Calculating the residual error of the predicted value and the true value of each child node of the ith decision tree;
constructing an i+1th decision tree based on the first result data, the second result data and the residual error, wherein the construction step of the i+1th decision tree is the same as the construction step of the i decision tree;
repeating the steps until n decision trees are constructed to form an integrated classifier.
In one possible design of this embodiment, a construction module 1130 is configured to calculate candidate division points corresponding to the first sample data and the second sample data respectively using a division point algorithm;
node splitting is carried out on the lowest layer node of the ith decision tree based on the candidate partition points;
and under the condition that the number of layers of splitting of the ith decision tree reaches a preset number of layers, constructing to obtain the ith decision tree.
In one possible design of this embodiment, the training module 1140 is configured to train a first pre-training model based on the first sample data to generate a first classifier, and train a second pre-training model based on the second sample data to generate a second classifier, where the first pre-training model and the second pre-training model are pre-training models with data classification capabilities.
In one possible design of this embodiment, the training module 1140 is configured to train the first pre-training model to generate a first classifier based on the first sample data with the first loss not decreasing as a training end target in p training periods;
The first loss is an error between a real value corresponding to the first sample data and a predicted value, and p is a positive integer.
In one possible design of this embodiment, the training module 1140 is configured to train the second pre-training model to generate a second classifier based on the second sample data with the second loss not decreasing as a training end target in q training periods;
the second loss is an error between a real value corresponding to the second sample data and a predicted value, and q is a positive integer.
In this embodiment, the acquisition module 1110 may be split into a plurality of acquisition modules, such as a first acquisition module, a second acquisition module, and a third acquisition module. The first acquisition module is used for acquiring first sample data, the second acquisition module is used for acquiring second sample data, and the third acquisition module is used for acquiring a feature set; or the first acquisition module is used for acquiring second sample data, the second acquisition module is used for acquiring a feature set, and the third acquisition module is used for acquiring the first sample data; or the first acquiring module is used for acquiring the feature set, the second acquiring module is used for acquiring the first sample data, and the third acquiring module is used for acquiring the second sample data, and the functions of the different acquiring modules are not limited in this embodiment.
The embodiment is illustrated with one acquisition module 1110, and the number of acquisition modules 1110 is not limited.
The functional description of the acquisition module 1110 may refer to the content of step 610 in the embodiment of fig. 6.
The functional description of the prediction module 1120 may refer to the contents of steps 622 and 624 in the embodiment of fig. 6.
The functional description of building block 1130 may refer to the contents of step 630 in the embodiment of fig. 6 and step 632 in the embodiment of fig. 7.
The functional description of the training module 1140 may refer to the contents of steps 615 and 617 in the embodiment of fig. 8.
FIG. 12 illustrates a block diagram of an apparatus for using a text classification model provided in an exemplary embodiment of the present application, which may be implemented as a computer device, or as part of a computer device, by software or hardware, or a combination of both, the apparatus comprising:
an acquisition module 1210 for acquiring an input text;
a prediction module 1220 configured to obtain third result data through the first classifier based on the input text, and obtain fourth result data through the second classifier based on the input text;
the output module 1230 is configured to output a classification result through n decision trees in the integrated classifier based on the third result data and the fourth result data, where n is a positive integer greater than 1.
In one possible design of this embodiment, the output module 1230 is configured to output the classification result through n decision trees in the integrated classifier based on the third result data and the fourth result data;
wherein the sum of the values output by the n decision trees is used for representing the classification result.
In one possible design of the present embodiment, the switch module 1240 is configured to not turn on the second classifier if the first concentration is less than the first threshold; opening the second classifier when the first concentration is greater than or equal to a first threshold;
the first density is the proportion of the data volume of the first data to the data volume of the input text, and the first data is the data corresponding to the second text with the keywords.
In one possible design of this embodiment, the third result data includes a third feature vector for representing the input text, and a third predicted value of the tag corresponding to the input text; the fourth result data includes a third feature vector representing the input text and a fourth predicted value of a label corresponding to the input text.
The functional description of the acquisition module 1210 may refer to the contents of step 910 in the embodiment of fig. 9.
The functional description of the prediction module 1220 may refer to the contents of step 922 and step 924 in the embodiment of fig. 9.
The functional description of the output module 1230 may refer to the contents of step 930 in the embodiment of fig. 9 and step 932 in the embodiment of fig. 10.
The function of the switch module 1240 may be described with reference to the contents of step 915 in the embodiment of fig. 10.
The embodiment of the application also provides a computer device, which comprises: a processor and a memory, wherein the memory stores at least one section of program; the processor is configured to execute at least one program in the memory to implement the training method of the text classification model and the usage method of the text classification model provided in the above method embodiments.
Fig. 13 shows a block diagram of a computer device 1300 provided in an exemplary embodiment of the present application. In general, the computer device 1300 includes: a processor 1301, and a memory 1302.
Processor 1301 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. Processor 1301 may be implemented in hardware in at least one of the following forms: digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), programmable logic array (Programmable Logic Array, PLA). Processor 1301 may also include a main processor and a coprocessor; the main processor is a processor for processing data in an awake state, also referred to as a central processor (Central Processing Unit, CPU), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, processor 1301 may include an image processor (Graphics Processing Unit, GPU) responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1301 may also include an artificial intelligence (Artificial Intelligence, AI) processor for processing computing operations related to machine learning.
Memory 1302 may include one or more computer-readable storage media, which may be non-transitory. Memory 1302 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1302 is used to store at least one instruction for execution by processor 1301 to implement the training method of the text classification model and the method of using the text classification model provided by the method embodiments in the present application.
In some embodiments, the computer device 1300 may further optionally include: an input interface 1303 and an output interface 1304. The processor 1301, the memory 1302, the input interface 1303 and the output interface 1304 may be connected by buses or signal lines. The respective peripheral devices may be connected to the input interface 1303, the output interface 1304 through buses, signal lines, or a circuit board. Input interface 1303, output interface 1304 may be used to connect at least one input/output related peripheral device to processor 1301 and memory 1302. In some embodiments, the processor 1301, the memory 1302, and the input interface 1303, the output interface 1304 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 1301, the memory 1302, and the input interface 1303, output interface 1304 may be implemented on a separate chip or circuit board, which is not limited by the embodiments of the present application.
It will be appreciated by those skilled in the art that the structures shown above are not limiting of the computer device 1300, and may include more or fewer components than shown, or may combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a chip is also provided, the chip including programmable logic and/or program instructions for implementing the training method of the text classification model and the usage method of the text classification model provided in the method embodiment of the present application when the chip is run on the computer device 1300.
In an exemplary embodiment, there is also provided a computer program product including computer instructions stored in a computer-readable storage medium, the computer instructions being obtained from the computer-readable storage medium by a processor, the computer instructions being executed by the processor to implement the training method of the text classification model and the usage method of the text classification model provided by the method embodiments in the present application.
In an exemplary embodiment, there is also provided a computer readable storage medium having at least one section of program stored therein, the at least one section of program being loaded and executed by a processor to implement the training method of the text classification model and the usage method of the text classification model provided in the method embodiment of the present application.
Those of ordinary skill in the art will appreciate that all or a portion of the steps implementing the above embodiments may be implemented by hardware, or may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the computer readable storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable storage medium. Computer-readable storage media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The foregoing is illustrative of the present invention and is not to be construed as limiting thereof, but rather as being included within the spirit and principles of the present invention.

Claims (14)

1. A method of training a text classification model, the text classification model comprising a first classifier, a second classifier, and an ensemble classifier, the method comprising:
acquiring first sample data and second sample data, wherein the first sample data is used for representing a first text without keywords, the second sample data is used for representing a second text with the keywords, the keywords are characters affecting the classification result of a target character string, and the first text and the second text both comprise the target character string;
acquiring first result data by the first classifier based on the first sample data, and acquiring second result data by the second classifier based on the second sample data, the first classifier being used for predicting the type of the first text, the second classifier being used for predicting the type of the second text;
based on the first result data and the second result data, constructing n decision trees to form the integrated classifier, wherein the integrated classifier is used for predicting the type of at least one of the first text and the second text, and n is a positive integer greater than 1.
2. The method of claim 1, wherein constructing n decision trees to form the integrated classifier based on the first result data and the second result data comprises:
constructing n decision trees with the same structure based on the first result data and the second result data to form the integrated classifier;
the residual errors of the predicted value and the true value of each child node of the ith decision tree in the n decision trees need to be input into the (i+1) th decision tree, n is a preset value, i is a positive integer with a starting value of 1 and less than n, the first result data comprise a first feature vector and a first predicted value of the first sample data, the second result data comprise a second feature vector and a second predicted value of the second sample data, and the sum of the numerical values output by the n decision trees is used for representing the type of input text, and the input text comprises at least one of the first text and the second text.
3. The method of claim 2, wherein constructing n decision trees of identical structure based on the first result data and the second result data to form the integrated classifier comprises:
Constructing the ith decision tree based on the first result data and the second result data;
calculating the residual error of the predicted value and the true value of each child node of the ith decision tree;
constructing the (i+1) th decision tree based on the first result data, the second result data and the residual error, wherein the construction step of the (i+1) th decision tree is the same as the construction step of the (i) th decision tree;
repeating the steps until the n decision trees are constructed to form the integrated classifier.
4. The method of claim 2, wherein constructing the ith decision tree based on the first result data and the second result data comprises:
calculating candidate division points respectively corresponding to the first sample data and the second sample data by adopting a division point algorithm;
node splitting is carried out on the lowest layer node of the ith decision tree based on the candidate segmentation points;
and under the condition that the number of layers of splitting of the ith decision tree reaches a preset number of layers, constructing to obtain the ith decision tree.
5. The method according to any one of claims 1 to 4, further comprising:
Training a first pre-training model based on the first sample data, generating the first classifier, and training a second pre-training model based on the second sample data, generating the second classifier, wherein the first pre-training model and the second pre-training model are pre-training models with data classification capability.
6. The method of claim 5, wherein training a first pre-training model based on the first sample data generates the first classifier, comprising:
based on the first sample data, training the first pre-training model by taking the fact that the first loss is not reduced in p training periods as a training ending target, and generating the first classifier;
training a second pre-training model based on the second sample data to generate the second classifier, including:
based on the second sample data, training the second pre-training model by taking the fact that the second loss is not reduced in q training periods as a training ending target, and generating the second classifier;
wherein the first loss is an error between a real value and a predicted value corresponding to the first sample data, and p is a positive integer; the second loss is an error between a true value and a predicted value corresponding to the second sample data, and q is a positive integer.
7. A method of using a text classification model, the text classification model comprising a first classifier, a second classifier, and an integrated classifier, the integrated classifier comprising n decision trees, the method comprising:
acquiring an input text;
acquiring third result data through the first classifier based on the input text, and acquiring fourth result data through the second classifier based on the input text;
based on the third result data and the fourth result data, outputting classification results through the n decision trees in the integrated classifier, wherein n is a positive integer greater than 1.
8. The method of claim 7, wherein the outputting classification results through the n decision trees in the integrated classifier based on the third result data and the fourth result data comprises:
outputting the classification result through the n decision trees in the integrated classifier based on the third result data and the fourth result data;
wherein the sum of the values output by the n decision trees is used for representing the classification result.
9. The method according to claim 7 or 8, characterized in that the method further comprises:
Under the condition that the first concentration is smaller than a first threshold value, the second classifier is not started;
opening the second classifier if the first concentration is greater than or equal to the first threshold;
the first concentration is the proportion of the data quantity of first data to the data quantity of the input text, and the first data is data corresponding to a second text with keywords.
10. The method according to claim 7, wherein
the third result data comprises a third eigenvector used for representing the input text and a third predicted value of a label corresponding to the input text; the fourth result data includes the third feature vector for representing the input text and a fourth predicted value of a label corresponding to the input text.
11. A training apparatus for a text classification model, the text classification model comprising a first classifier, a second classifier, and an integrated classifier, the apparatus comprising:
an acquisition module, configured to acquire first sample data and second sample data, wherein the first sample data is used to represent a first text without a keyword, the second sample data is used to represent a second text with the keyword, the keyword is a character affecting a classification result of a target character string, and the first text and the second text both include the target character string;
a prediction module, configured to obtain first result data through the first classifier based on the first sample data, and obtain second result data through the second classifier based on the second sample data, wherein the first classifier is used for predicting a type of the first text, and the second classifier is used for predicting a type of the second text;
a construction module, configured to construct n decision trees based on the first result data and the second result data to form the integrated classifier, wherein the integrated classifier is used for predicting the type of at least one of the first text and the second text, and n is a positive integer greater than 1.
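By way of illustration, the construction module's step of building n decision trees from the two classifiers' result data could look like the sketch below. How the first and second result data are assembled into a single feature matrix is not spelled out in the claims, so that step is left to the caller; using scikit-learn's GradientBoostingClassifier is an assumption consistent with the summing of tree outputs in claim 8, not the patent's stated implementation.

```python
from sklearn.ensemble import GradientBoostingClassifier

def build_integrated_classifier(features, labels, n):
    # features: matrix assembled from the first and second result data (assembly assumed).
    # labels: the corresponding text-type labels.
    # n: number of decision trees in the integrated classifier.
    ensemble = GradientBoostingClassifier(n_estimators=n)
    ensemble.fit(features, labels)
    return ensemble
```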
12. An apparatus for using a text classification model, wherein the text classification model comprises a first classifier, a second classifier, and an integrated classifier, the integrated classifier comprising n decision trees, the apparatus comprising:
an acquisition module, configured to acquire an input text;
a prediction module, configured to acquire third result data through the first classifier based on the input text, and acquire fourth result data through the second classifier based on the input text;
an output module, configured to output classification results through the n decision trees in the integrated classifier based on the third result data and the fourth result data, wherein n is a positive integer greater than 1.
13. A computer device, the computer device comprising: a processor and a memory, wherein at least one program is stored in the memory; the processor is configured to execute the at least one program in the memory to implement the training method of the text classification model according to any one of claims 1 to 6, or the method of using the text classification model according to any one of claims 7 to 10.
14. A computer-readable storage medium, wherein at least one program is stored in the computer-readable storage medium, and the at least one program is loaded and executed by a processor to implement the training method of the text classification model according to any one of claims 1 to 6, or the method of using the text classification model according to any one of claims 7 to 10.
CN202410130515.7A 2024-01-31 2024-01-31 Training and using method, device, equipment and medium of text classification model Active CN117668562B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410130515.7A CN117668562B (en) 2024-01-31 2024-01-31 Training and using method, device, equipment and medium of text classification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410130515.7A CN117668562B (en) 2024-01-31 2024-01-31 Training and using method, device, equipment and medium of text classification model

Publications (2)

Publication Number Publication Date
CN117668562A true CN117668562A (en) 2024-03-08
CN117668562B CN117668562B (en) 2024-04-19

Family

ID=90079241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410130515.7A Active CN117668562B (en) 2024-01-31 2024-01-31 Training and using method, device, equipment and medium of text classification model

Country Status (1)

Country Link
CN (1) CN117668562B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103728551A (en) * 2013-01-30 2014-04-16 中国人民解放军海军航空工程学院 Analog circuit fault diagnosis method based on cascade connection integrated classifier
US20180068231A1 (en) * 2016-09-08 2018-03-08 Conduent Business Services, Llc Method and system for training a target domain classifier to label text segments
CN111859978A (en) * 2020-06-11 2020-10-30 南京邮电大学 Emotion text generation method based on deep learning
WO2020244066A1 (en) * 2019-06-04 2020-12-10 平安科技(深圳)有限公司 Text classification method, apparatus, device, and storage medium
CN116580272A (en) * 2022-01-30 2023-08-11 北京华航无线电测量研究所 Radar target classification method and system based on model fusion reasoning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103728551A (en) * 2013-01-30 2014-04-16 中国人民解放军海军航空工程学院 Analog circuit fault diagnosis method based on cascade connection integrated classifier
US20180068231A1 (en) * 2016-09-08 2018-03-08 Conduent Business Services, Llc Method and system for training a target domain classifier to label text segments
WO2020244066A1 (en) * 2019-06-04 2020-12-10 平安科技(深圳)有限公司 Text classification method, apparatus, device, and storage medium
CN111859978A (en) * 2020-06-11 2020-10-30 南京邮电大学 Emotion text generation method based on deep learning
CN116580272A (en) * 2022-01-30 2023-08-11 北京华航无线电测量研究所 Radar target classification method and system based on model fusion reasoning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王瑞琪 et al.: "Semi-supervised fake recruitment advertisement detection model based on consistency training", 《计算机应用》 (Journal of Computer Applications), 10 September 2023 (2023-09-10), pages 2932 - 2939 *

Also Published As

Publication number Publication date
CN117668562B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN110442718B (en) Statement processing method and device, server and storage medium
CN109992780B (en) Specific target emotion classification method based on deep neural network
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN112818698B (en) Fine-grained user comment sentiment analysis method based on dual-channel model
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
Bokka et al. Deep Learning for Natural Language Processing: Solve your natural language processing problems with smart deep neural networks
CN113254637B (en) Grammar-fused aspect-level text emotion classification method and system
CN109271636B (en) Training method and device for word embedding model
CN114358201A (en) Text-based emotion classification method and device, computer equipment and storage medium
Liu et al. Open intent discovery through unsupervised semantic clustering and dependency parsing
CN112101042A (en) Text emotion recognition method and device, terminal device and storage medium
CN113934835B (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN112131345A (en) Text quality identification method, device, equipment and storage medium
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN113486174A (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN112349294A (en) Voice processing method and device, computer readable medium and electronic equipment
CN110377753B (en) Relation extraction method and device based on relation trigger word and GRU model
CN116757195A (en) Implicit emotion recognition method based on prompt learning
Ermatita et al. Sentiment Analysis of COVID-19 using Multimodal Fusion Neural Networks.
CN116881689A (en) Knowledge-enhanced user multi-mode online comment quality evaluation method and system
CN117668562B (en) Training and using method, device, equipment and medium of text classification model
Huang Research on chat robot based on Seq2seq model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant