CN112685539A - Text classification model training method and device based on multi-task fusion

Info

Publication number: CN112685539A (granted publication: CN112685539B)
Application number: CN202011622948.2A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 伍文成, 朱永强
Applicant and current assignee: Chengdu Wangan Technology Development Co ltd
Legal status: Granted, Active
Classification landscape: Information Retrieval, Db Structures And Fs Structures Therefor

Abstract

The application provides a text classification model training method and device based on multi-task fusion, and relates to the technical field of text classification. In the application, a first characterization vector and a second characterization vector are obtained from the two sample texts included in a sample text pair; a first category prediction vector and a second category prediction vector are obtained from the first characterization vector and the second characterization vector; a spliced characterization vector obtained from the first characterization vector and the second characterization vector is processed to obtain a binary similarity probability result; a prediction loss value is obtained from the first category prediction vector, the second category prediction vector and the binary similarity probability result; parameters of a target fusion model are updated based on the prediction loss value to obtain an updated target fusion model; and a text classification model is constructed based on the network parameters included in the updated target fusion model. On this basis, the problem of poor classification effect in existing text classification technology can be alleviated.

Description

Text classification model training method and device based on multi-task fusion
Technical Field
The application relates to the technical field of text classification, in particular to a text classification model training method and device based on multi-task fusion.
Background
Text classification is an important module in text processing with a wide range of applications, such as spam filtering, news classification, part-of-speech tagging and sentiment analysis. Text classification refers to classifying and labeling a set of texts (or other entities or objects) according to a certain classification system or standard. For example, a relationship model between document features and document categories is learned from a labeled training document set, and the learned relationship model is then used to judge the category of a new document.
The current mainstream text classification approaches are statistical machine learning and deep learning. Statistical machine learning generally involves text preprocessing, feature extraction, text representation and classifier training. Deep learning differs from machine learning in that features do not have to be selected manually, which is an advantage: a deep learning network only needs the raw features of the text as input, learns the text features automatically and outputs a classification result.
However, the inventors have found through research that, in the prior art, the classification effect is poor when texts are classified based on a deep learning network.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method and an apparatus for training a text classification model based on multi-task fusion, so as to solve the problem of poor classification effect in the existing text classification technology.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
a text classification model training method based on multi-task fusion comprises the following steps:
obtaining a sample text pair, wherein the sample text pair comprises a forward sample text pair or a reverse sample text pair, two sample texts included in the forward sample text pair are mutually similar texts, and two sample texts included in the reverse sample text pair are mutually dissimilar sample texts;
processing the two sample texts included in the sample text pair respectively through a vector conversion network layer, a feature extraction network layer and a pooling network layer in sequence to obtain a first characterization vector and a second characterization vector corresponding to the two sample texts respectively;
processing the first characterization vector and the second characterization vector through a first full-connection network layer respectively to obtain a corresponding first category prediction vector and a corresponding second category prediction vector;
splicing the first characterization vector and the second characterization vector through a feature splicing network layer to obtain a spliced characterization vector, and processing the spliced characterization vector through a second fully-connected network layer to obtain a corresponding binary similarity probability result;
calculating the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a preset loss function to obtain a prediction loss value;
updating parameters of a pre-constructed target fusion model based on the prediction loss value to obtain an updated target fusion model, wherein the target fusion model comprises the vector conversion network layer, the feature extraction network layer, the pooling network layer, the first fully-connected network layer, the feature splicing network layer and the second fully-connected network layer;
and constructing a text classification model based on network parameters included by the vector conversion network layer, the feature extraction network layer, the pooling network layer and the first full-connection network layer in the updated target fusion model, wherein the text classification model is used for classifying texts to be classified.
In a preferred option of the embodiment of the present application, in the text classification model training method based on multi-task fusion, the step of obtaining a sample text pair includes:
obtaining a target sample text, and determining a similar text of the target sample text based on a first preset rule or determining a non-similar text of the target sample text based on a second preset rule;
forming a sample text pair based on the target sample text and the similar text or based on the target sample text and the non-similar text.
In a preferred option of the embodiment of the present application, in the text classification model training method based on multi-task fusion, the step of determining the similar text of the target sample text based on the first preset rule or determining the dissimilar text of the target sample text based on the second preset rule includes:
calculating the weight of each word in the target sample text to obtain a first weight value;
determining at least one keyword in the target sample text based on the first weight value, wherein the first weight value of the keyword meets a first preset condition;
and in the target sample text, performing near-synonym replacement on the at least one keyword to obtain a similar text of the target sample text.
In a preferred option of the embodiment of the present application, in the text classification model training method based on multi-task fusion, the step of determining the similar text of the target sample text based on the first preset rule or determining the dissimilar text of the target sample text based on the second preset rule includes:
calculating the weight of each word in the target sample text to obtain a first weight value;
determining the weight of each statement in the target sample text based on the first weight value to obtain a second weight value;
determining at least one key sentence in the target sample text based on the second weight value, wherein the second weight value of the key sentence meets a second preset condition;
and deleting at least one other sentence except the at least one key sentence in the target sample text to obtain a similar text of the target sample text.
In a preferred option of the embodiment of the present application, in the text classification model training method based on multi-task fusion, the step of determining the similar text of the target sample text based on the first preset rule or determining the dissimilar text of the target sample text based on the second preset rule includes:
determining the sequence of each sentence in the target sample text;
and in the target sample text, replacing the sequence of at least one sentence to obtain a similar text of the target sample text.
In a preferred option of the embodiment of the present application, in the text classification model training method based on multi-task fusion, the step of determining the similar text of the target sample text based on the first preset rule or determining the dissimilar text of the target sample text based on the second preset rule includes:
determining category information and at least one key word of the target sample text, wherein the key word is determined based on the weight of each word in the target sample text;
determining, in a sample text database, a plurality of other sample texts having the same category as the target sample text based on the category information;
and in the determined plurality of other sample texts, taking the other sample texts with at least one key word as similar texts of the target sample text.
In a preferred option of the embodiment of the present application, in the text classification model training method based on multi-task fusion, the step of determining the similar text of the target sample text based on the first preset rule or determining the dissimilar text of the target sample text based on the second preset rule includes:
determining category information and at least one key word of the target sample text, wherein the key word is determined based on the weight of each word in the target sample text;
determining, in a sample text database, a plurality of other sample texts having different classes from the target sample text based on the class information;
and in the determined plurality of other sample texts, taking other sample texts without at least one key word as non-similar texts of the target sample text.
In a preferred option of the embodiment of the present application, in the text classification model training method based on multi-task fusion, the step of calculating the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a preset loss function to obtain a prediction loss value includes:
respectively calculating the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a cross entropy loss function to obtain a corresponding first prediction error value, a corresponding second prediction error value and a corresponding similarity probability prediction error value;
performing weighted summation calculation on the first prediction error value and the second prediction error value based on a preset first weight coefficient and a preset second weight coefficient to obtain a category prediction error value;
and based on a preset first gradient smoothing coefficient and a preset second gradient smoothing coefficient, performing weighted summation calculation on the category prediction error value and the similarity probability prediction error value to obtain a prediction loss value.
In a preferred option of the embodiment of the present application, in the text classification model training method based on multi-task fusion, before the step of updating the parameters of the pre-constructed target fusion model based on the predicted loss values is performed, the method further includes a step of determining the first gradient smoothing coefficient and the second gradient smoothing coefficient, where the step includes:
constructing a text classification network model comprising a vector conversion network, a feature extraction network, a pooling network and a full-connection network;
constructing a text similarity classification network model comprising a vector conversion network, a feature extraction network, a pooling network, a feature splicing network and a full-connection network;
updating parameters of the text classification network model based on the obtained multiple first training texts to obtain a converged first loss value;
updating parameters of the text similarity classification network model based on the obtained second training texts to obtain a converged second loss value;
determining the first gradient smoothing coefficient and the second gradient smoothing coefficient based on the first penalty value and the second penalty value.
The embodiment of the present application further provides a text classification model training device based on multitask fusion, including:
the text obtaining module is used for obtaining a sample text pair, wherein the sample text pair comprises a forward sample text pair or a reverse sample text pair, two sample texts included in the forward sample text pair are similar texts, and two sample texts included in the reverse sample text pair are non-similar sample texts;
the text conversion module is used for processing the two sample texts included in the sample text pair respectively through the vector conversion network layer, the feature extraction network layer and the pooling network layer in sequence to obtain a first characterization vector and a second characterization vector corresponding to the two sample texts respectively;
the vector prediction module is used for processing the first characterization vector and the second characterization vector through a first fully-connected network layer respectively to obtain a corresponding first class prediction vector and a corresponding second class prediction vector;
the probability prediction module is used for splicing the first characterization vector and the second characterization vector through a feature splicing network layer to obtain a spliced characterization vector, and processing the spliced characterization vector through a second fully-connected network layer to obtain a corresponding two-classification similarity probability result;
the loss value calculation module is used for calculating the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a preset loss function to obtain a prediction loss value;
the model updating module is used for updating parameters of a pre-constructed target fusion model based on the prediction loss value to obtain an updated target fusion model, wherein the target fusion model comprises the vector conversion network layer, the feature extraction network layer, the pooling network layer, the first fully-connected network layer, the feature splicing network layer and the second fully-connected network layer;
and the model building module is used for building a text classification model based on network parameters included by the vector conversion network layer, the feature extraction network layer, the pooling network layer and the first full-connection network layer in the updated target fusion model, wherein the text classification model is used for classifying texts to be classified.
According to the text classification model training method and device based on multi-task fusion provided by the application, two sample texts are processed respectively to obtain a first characterization vector and a second characterization vector. On the one hand, the first characterization vector and the second characterization vector can be processed respectively to obtain a first category prediction vector and a second category prediction vector; on the other hand, a spliced characterization vector obtained by splicing the first characterization vector and the second characterization vector can be processed to obtain a binary similarity probability result. The first category prediction vector, the second category prediction vector and the binary similarity probability result can then be combined to obtain a prediction loss value, so that the parameters of the target fusion model can be updated based on the prediction loss value to obtain an updated target fusion model, and finally a text classification model is constructed based on the network parameters included in the updated target fusion model. On this basis, because the network parameters of the constructed text classification model are trained (i.e., updated) with a prediction loss value derived from the first category prediction vector, the second category prediction vector and the binary similarity probability result, the similarity between texts is taken into account during training; since the categories of similar texts are generally also similar, the constructed text classification model has better anti-interference performance, which improves the accuracy of the classification result and alleviates the problem of poor classification effect in the existing text classification technology.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Fig. 2 is a schematic flowchart of a text classification model training method based on multi-task fusion according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating sub-steps included in step S110 in fig. 2.
Fig. 4 is a flowchart illustrating sub-steps included in step S150 in fig. 2.
Fig. 5 is a schematic network structure diagram of a target fusion model provided in the embodiment of the present application.
Fig. 6 is a flowchart illustrating steps of determining a gradient smoothing coefficient according to an embodiment of the present application.
Fig. 7 is a block diagram illustrating a text classification model training apparatus based on multi-task fusion according to an embodiment of the present application.
Icon: 10-an electronic device; 12-a memory; 14-a processor; 100-a text classification model training device based on multi-task fusion; 110-a text obtaining module; 120-a text conversion module; 130-a vector prediction module; 140-a probability prediction module; 150-loss value calculation module; 160-model update module; 170-model building module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As shown in fig. 1, an electronic device according to an embodiment of the present application may include a memory 12, a processor 14, and a text classification model training apparatus 100 based on multi-task fusion.
The memory 12 and the processor 14 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction. For example, they may be electrically connected via one or more communication buses or signal lines. The text classification model training apparatus 100 based on multi-task fusion includes at least one software functional module that can be stored in the memory 12 in the form of software or firmware. The processor 14 is configured to execute the executable computer programs stored in the memory 12, such as the software functional modules and computer programs included in the text classification model training apparatus 100 based on multi-task fusion, so as to implement the text classification model training method based on multi-task fusion provided by the embodiment of the present application.
Alternatively, the memory 12 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The Processor 14 may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), a System on Chip (SoC), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components.
It will be appreciated that the configuration shown in fig. 1 is merely illustrative, and that the electronic device may include more or fewer components than those shown in fig. 1, or have a different configuration from that shown in fig. 1; for example, it may further include a communication unit for information interaction with other devices (e.g., a background server).
With reference to fig. 2, an embodiment of the present application further provides a text classification model training method based on multi-task fusion, which is applicable to the electronic device. The method steps defined by the flow of the text classification model training method based on multi-task fusion can be implemented by the electronic device. The specific process shown in fig. 2 is described in detail below.
Step S110, a sample text pair is obtained.
In this embodiment, the electronic device may obtain sample text pairs.
The sample text pairs comprise forward sample text pairs or reverse sample text pairs, the two sample texts of the forward sample text pairs are similar texts, and the two sample texts of the reverse sample text pairs are non-similar sample texts.
And step S120, processing the two sample texts included in the sample text pair respectively through the vector conversion network layer, the feature extraction network layer and the pooling network layer in sequence to obtain a first characterization vector and a second characterization vector corresponding to the two sample texts respectively.
In this embodiment, after obtaining the sample text pair based on step S110, the electronic device may sequentially process one sample text in the sample text pair through the vector conversion network layer, the feature extraction network layer, and the pooling network layer, and sequentially process the other sample text in the sample text pair through the vector conversion network layer, the feature extraction network layer, and the pooling network layer, so that a first token vector corresponding to the one sample text and a second token vector corresponding to the other sample text may be obtained respectively.
Step S130, the first token vector and the second token vector are processed through a first full-connection network layer, respectively, to obtain a corresponding first class prediction vector and a corresponding second class prediction vector.
In this embodiment, after obtaining the first token vector and the second token vector based on step S120, the electronic device may process the first token vector through a first fully-connected network layer and process the second token vector through the first fully-connected network layer, so that a first class prediction vector corresponding to the first token vector and a second class prediction vector corresponding to the second token vector may be obtained.
And step S140, splicing the first characterization vector and the second characterization vector through a feature splicing network layer to obtain a spliced characterization vector, and processing the spliced characterization vector through a second fully-connected network layer to obtain a corresponding binary similarity probability result.
In this embodiment, after obtaining the first characterization vector and the second characterization vector based on step S120, the electronic device may first splice the first characterization vector and the second characterization vector through a feature splicing network layer, and may then process the spliced characterization vector obtained through splicing through a second fully-connected network layer, so that the binary similarity probability result (i.e., the similarity probability of the two sample texts) corresponding to the spliced characterization vector may be obtained.
Step S150, calculating the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a preset loss function to obtain a prediction loss value.
In this embodiment, after obtaining the first category prediction vector and the second category prediction vector based on step S130 and obtaining the binary similarity probability result based on step S140, the electronic device may calculate the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a preset loss function, so that a prediction loss value may be obtained.
And step S160, updating parameters of a pre-constructed target fusion model based on the predicted loss value to obtain an updated target fusion model.
In this embodiment, after obtaining the predicted loss value based on step S150, the electronic device may perform parameter update on a pre-constructed target fusion model based on the predicted loss value, so that an updated target fusion model may be obtained.
Wherein the target fusion model comprises the vector conversion network layer, the feature extraction network layer, the pooling network layer, the first fully-connected network layer, the feature splicing network layer, and the second fully-connected network layer. That is, the target fusion model may be trained based on the sample text pair, resulting in an updated target fusion model.
Step S170, constructing a text classification model based on network parameters included by the vector conversion network layer, the feature extraction network layer, the pooling network layer and the first full-connection network layer in the updated target fusion model.
In this embodiment, after obtaining the updated target fusion model based on step S160, the electronic device may construct a text classification model based on network parameters included in the vector conversion network layer, the feature extraction network layer, the pooling network layer, and the first fully-connected network layer in the updated target fusion model.
That is to say, the constructed text classification model may include a vector conversion network, a feature extraction network, a pooling network, and a full-connection network, which correspond to the vector conversion network layer, the feature extraction network layer, the pooling network layer, and the first full-connection network layer, respectively, so that the text classification model may be used to classify the text to be classified to obtain a classification result.
Based on this method, because the network parameters of the constructed text classification model are trained (i.e., updated) with a prediction loss value derived from the first category prediction vector, the second category prediction vector and the binary similarity probability result, the similarity between texts is taken into account during training. Since the categories of similar texts are generally also similar, the constructed text classification model has better anti-interference performance, which improves the accuracy of the classification result and alleviates the problem of poor classification effect in the existing text classification technology.
In the first aspect, it should be noted that, in step S110, a specific manner of obtaining the sample text pair is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, a sample text pair that has been formed by combining two sample texts may be obtained directly, and thus, the efficiency of model training may be improved.
For another example, in another alternative example, in order to adapt to different training requirements, in conjunction with fig. 3, step S110 may include step S111 and step S112, which are described in detail below.
Step S111, obtaining a target sample text, and determining a similar text of the target sample text based on a first preset rule or determining a non-similar text of the target sample text based on a second preset rule.
In this embodiment, when the sample text pair needs to be obtained, a target sample text may be obtained first, and then a similar text of the target sample text is determined based on a first preset rule, or a non-similar text of the target sample text may be determined based on a second preset rule.
Step S112, forming a sample text pair based on the target sample text and the similar text or based on the target sample text and the dissimilar text.
In this embodiment, after obtaining the similar text or the non-similar text based on step S111, a sample text pair may be formed based on the similar text and the target sample text, in which case the sample text pair is a forward sample text pair; alternatively, a sample text pair may be formed based on the non-similar text and the target sample text, in which case the sample text pair is a reverse sample text pair.
Optionally, in the above example, the specific manner of determining the similar text based on step S111 is not limited, and may be selected according to the actual application requirement.
In a first alternative example, similar text may be determined based on the following steps:
firstly, calculating the weight of each word in the target sample text to obtain a first weight value; secondly, determining at least one keyword in the target sample text based on the first weight value, wherein the first weight value of the keyword meets a first preset condition; then, in the target sample text, performing near-synonym replacement on the at least one keyword to obtain a similar text of the target sample text.
It is to be understood that, in the above example, the specific manner of calculating the weight of a word is not limited; for example, the calculation may be based on the TF-IDF (term frequency-inverse document frequency) algorithm.
In the above example, the specific content of the first preset condition is not limited, for example, it may be determined whether a first weight value of each word is greater than a preset value, and a word corresponding to each first weight value that is greater than the preset value is used as a keyword; alternatively, a target number of words with the largest first weight value may be determined, and the target number of words may be used as the keyword. And, the preset value and the specific value of the target number may be generated based on a configuration operation of a user.
In the above example, the specific manner of performing the near-synonym replacement on the keyword is not limited; for example, a near-synonym of each keyword may be looked up in a synonym lexicon (such as a synonym forest), and the keyword is then replaced with the corresponding near-synonym.
In the above example, in order to effectively identify each word and calculate its weight, the target sample text may be preprocessed, for example, by word segmentation, stop-word removal, and the like.
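As a minimal illustrative sketch of this first rule, assuming jieba word segmentation, word weights given by TF-IDF scores estimated over a small corpus of sample texts, and a placeholder synonym dictionary standing in for a real resource such as a synonym forest:

```python
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer

def build_similar_text(target_text, corpus, synonym_map, top_k=3):
    """Replace the top_k highest-weight words of target_text with near-synonyms."""
    # First weight value: TF-IDF score of each word (idf estimated over the corpus).
    vectorizer = TfidfVectorizer(tokenizer=jieba.lcut, lowercase=False)
    vectorizer.fit(corpus + [target_text])
    weights = vectorizer.transform([target_text]).toarray()[0]
    index_to_word = {idx: word for word, idx in vectorizer.vocabulary_.items()}
    # First preset condition (assumed here): the top_k words with the largest weight.
    keywords = {index_to_word[i] for i in weights.argsort()[::-1][:top_k] if weights[i] > 0}
    # Near-synonym replacement applied to the keywords only.
    tokens = jieba.lcut(target_text)
    return "".join(synonym_map.get(t, t) if t in keywords else t for t in tokens)
```

Here, corpus would typically be the other available sample texts used to estimate inverse document frequencies, and synonym_map is a purely hypothetical word-to-near-synonym dictionary.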
In a second alternative example, similar text may be determined based on the following steps:
firstly, calculating the weight of each word in the target sample text to obtain a first weight value; secondly, determining the weight of each statement in the target sample text based on the first weight value to obtain a second weight value; then, determining at least one key sentence in the target sample text based on the second weight value, wherein the second weight value of the key sentence meets a second preset condition; and finally, deleting at least one other sentence except the at least one key sentence in the target sample text to obtain a similar text of the target sample text.
It is to be understood that, in the above example, the specific manner of determining the weight of each sentence is not limited, for example, an average value of the first weight values of each word or each keyword included in one sentence may be calculated, and the average value may be used as the second weight value of the sentence. The specific way of calculating the weights of the words may refer to the related description above.
In the above example, the specific content of the second preset condition is not limited, for example, it may be determined whether the second weight value of each sentence is greater than a preset threshold, and the sentence corresponding to each second weight value that is greater than the preset threshold is determined as a key sentence; alternatively, a predetermined number of sentences having the largest second weight value may be determined, and the predetermined number of sentences may be determined as the key sentences.
In the above example, the manner of selecting the at least one other sentence is not limited; for example, one or more other sentences may be deleted at random from the target sample text.
In a third alternative example, similar text may be determined based on the following steps:
firstly, determining the sequence of each sentence in the target sample text; and secondly, in the target sample text, replacing the sequence of at least one sentence to obtain a similar text of the target sample text.
That is, in the above example, similar texts can be obtained by disordering the order of sentences in the target sample text (e.g., randomly or according to a certain regular order).
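The two sentence-level rules above can be sketched as follows; splitting sentences on common punctuation, scoring a sentence by the mean TF-IDF weight of its words (one of the options described for the second weight value) and the keep ratio are all assumptions:

```python
import random
import re
import jieba

def split_sentences(text):
    # Naive split on common sentence delimiters (an assumption, not prescribed by the text).
    return [s for s in re.split(r"(?<=[。！？.!?])", text) if s.strip()]

def drop_low_weight_sentences(text, word_weights, keep_ratio=0.8):
    """Second rule: score each sentence by the mean weight of its words (second
    weight value), keep the highest-scoring sentences and delete the rest."""
    sentences = split_sentences(text)
    def sentence_weight(sentence):
        weights = [word_weights.get(w, 0.0) for w in jieba.lcut(sentence)]
        return sum(weights) / len(weights) if weights else 0.0
    keep = max(1, int(len(sentences) * keep_ratio))
    top = set(sorted(sentences, key=sentence_weight, reverse=True)[:keep])
    return "".join(s for s in sentences if s in top)   # original order preserved

def shuffle_sentences(text, seed=None):
    """Third rule: obtain a similar text by changing the order of at least one sentence."""
    sentences = split_sentences(text)
    random.Random(seed).shuffle(sentences)
    return "".join(sentences)
```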
In a fourth alternative example, similar text may be determined based on the following steps:
firstly, determining category information and at least one key word of the target sample text, wherein the key word is determined based on the weight of each word in the target sample text; secondly, determining a plurality of other sample texts with the same category as the target sample text based on the category information in a sample text database; then, among the determined plurality of other sample texts, the other sample texts having at least one of the key words are taken as similar texts of the target sample text.
It is to be understood that, in the above example, the category information of the target sample text may be obtained by labeling the target sample text with a category in advance, such as official-document text, non-official text, emotion text, news text, finance text, sports text, and the like. The specific manner of determining the keywords may refer to the related description above.
In the above example, the sample text database may be a database formed based on a plurality of sample texts obtained in advance and a classification label performed on each sample text, and the target sample text may also be obtained in the sample text database.
In the above example, the number of keywords determined is not limited. For example, in a specific application example, when the target sample text contains 20 or more keywords, 20 keywords may be taken; when it contains fewer than 20 keywords, all of them may be taken. On this basis, when determining similar texts, other sample texts sharing 5 or more of these keywords may be taken as similar texts, so that a similar text and the target sample text have at least 5 keywords in common.
Optionally, in the above example, the specific manner of determining the non-similar text based on step S111 is not limited, and may be selected according to actual application requirements.
For example, in one alternative example, non-similar text may be determined based on the following steps:
firstly, determining category information and at least one key word of a target sample text, wherein the key word is determined based on the weight of each word in the target sample text; secondly, determining a plurality of other sample texts with different classes from the target sample text based on the class information in a sample text database; then, among the determined plurality of other sample texts, the other sample texts not having at least one of the key words are used as non-similar texts of the target sample text.
It is to be understood that, in the above example, the determination of the category information and the key words of the target sample text, and the determination of other sample texts of different categories, may refer to the related description above.
In the above example, the specific manner of determining non-similar texts based on the keywords is not limited; for example, in combination with the foregoing example, a non-similar text may be required to share fewer than 5 keywords with the target sample text.
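A compact sketch of these two database-based rules is given below; the sample text database is modeled as a list of (text, category, keyword set) records, and the thresholds of 20 keywords and 5 shared keywords from the examples above are kept as configurable assumptions:

```python
def select_paired_texts(target_keywords, target_category, database,
                        max_keywords=20, overlap_threshold=5):
    """Return (similar_texts, non_similar_texts) for a target sample text.

    database: iterable of (text, category, keywords) tuples, where `keywords`
    is the set of top-weighted words of that sample text.
    """
    target = set(list(target_keywords)[:max_keywords])
    similar, non_similar = [], []
    for text, category, keywords in database:
        shared = len(target & set(keywords))
        if category == target_category and shared >= overlap_threshold:
            similar.append(text)            # same category, enough shared keywords
        elif category != target_category and shared < overlap_threshold:
            non_similar.append(text)        # different category, few shared keywords
    return similar, non_similar
```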
On the basis of the above example, it should be further explained for step S110 that the number of the obtained sample text pairs is not limited, and can be selected according to the actual application requirements.
For example, in an alternative example, the higher the classification accuracy required of the constructed model, the more sample text pairs may be obtained, so that the training basis is more adequate.
Also, in forming different sample text pairs, they may be formed in different ways in the above-described examples, or may be formed in the same way in the above-described examples.
In the second aspect, it should be noted that, for step S120, the vector conversion network layer may be an Embedding layer, which converts text into word vectors. The feature extraction network layer may consist of three CNN (convolutional) network layers, which perform feature extraction on the word vectors to obtain feature vectors. The pooling network layer may be a POOL layer, which down-samples the obtained feature vectors to obtain characterization vectors of reduced size (such as the first characterization vector and the second characterization vector described above), reducing the data size without affecting the classification result.
In the third aspect, it should be noted that, in step S130, the first fully-connected network layer may be a Dense layer, which classifies the features in the characterization vectors (such as the first characterization vector and the second characterization vector described above) to obtain category prediction vectors (such as the first category prediction vector and the second category prediction vector described above).
The specific configuration of the first fully-connected network layer is not limited; for example, it may include one fully-connected network layer or multiple fully-connected network layers. In this embodiment, in order to give the first category prediction vector and the second category prediction vector higher accuracy, the first fully-connected network layer may include two fully-connected network layers.
Based on this, the first characterization vector may sequentially pass through the two fully-connected network layers to obtain a corresponding first class prediction vector; the second characterization vector may sequentially pass through the two fully-connected network layers to obtain a corresponding second class prediction vector.
In the fourth aspect, it should be noted that, in step S140, the feature splicing network layer may be a Concat layer, which concatenates several characterization vectors (such as the first characterization vector and the second characterization vector described above) to obtain a corresponding spliced vector (such as the spliced characterization vector described above). The second fully-connected network layer may be a Dense layer, which classifies the features in the spliced characterization vector so as to determine a classification result for the two corresponding sample texts; this classification result is also a category prediction vector, and because it is a two-class classification it may also be referred to as a binary similarity probability result (i.e., the similarity probability of the two sample texts).
The specific configuration of the second fully-connected network layer is not limited; for example, it may include one fully-connected network layer or multiple fully-connected network layers. In this embodiment, to make model training more efficient, the second fully-connected network layer may include one fully-connected network layer.
In the fifth aspect, it should be noted that, in step S150, a specific manner of calculating the predicted loss value is not limited, that is, specific content of the preset loss function is not limited.
For example, in an alternative example, gradient smoothing coefficients are used to smooth the gradient magnitudes of the two branch networks formed by the first fully-connected network layer and the second fully-connected network layer, preventing the resulting model from being biased toward one branch (task) because the gradients differ in order of magnitude; this ensures that after the learning task of one branch converges to the vicinity of its optimal point, the learning task of the other branch can continue learning, so that the learning results of both branches are optimal. In this embodiment, with reference to fig. 4, step S150 may include step S151, step S152 and step S153, which are described in detail below.
Step S151, calculating the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a cross entropy loss function to obtain a corresponding first prediction error value, a corresponding second prediction error value and a corresponding similarity probability prediction error value.
In this embodiment, after obtaining the first category prediction vector and the second category prediction vector based on step S130 and obtaining the binary similarity probability result based on step S140, the first category prediction vector may be evaluated with a cross entropy loss function to obtain a corresponding first prediction error value (i.e., the error between the predicted category and the true category of the first of the two sample texts); the second category prediction vector may be evaluated with the cross entropy loss function to obtain a corresponding second prediction error value (i.e., the error between the predicted category and the true category of the second of the two sample texts); and the binary similarity probability result may be evaluated with the cross entropy loss function to obtain a corresponding similarity probability prediction error value (i.e., the error between the predicted similarity probability and the true similarity of the two sample texts).
Step S152, performing weighted summation calculation on the first prediction error value and the second prediction error value based on a preset first weight coefficient and a preset second weight coefficient to obtain a category prediction error value.
In this embodiment, after the first prediction error value and the second prediction error value are obtained based on step S151, a weighted summation calculation may be performed on the first prediction error value and the second prediction error value based on a preset first weight coefficient and a preset second weight coefficient, so that a corresponding category prediction error value may be obtained.
The weighting coefficient corresponding to the first prediction error value is the first weighting coefficient, and the weighting coefficient corresponding to the second prediction error value is the second weighting coefficient.
Step S153, based on the preset first gradient smoothing coefficient and second gradient smoothing coefficient, performing weighted summation calculation on the category prediction error value and the similarity probability prediction error value to obtain a prediction loss value.
In this embodiment, after obtaining the similarity probability prediction error value based on step S151 and the category prediction error value based on step S152, a weighted summation calculation may be performed on the similarity probability prediction error value and the category prediction error value based on a preset first gradient smoothing coefficient and a preset second gradient smoothing coefficient, so that a corresponding prediction loss value may be obtained.
Wherein the smoothing coefficient corresponding to the class prediction error value is the first gradient smoothing coefficient, and the smoothing coefficient corresponding to the similarity probability prediction error value is the second gradient smoothing coefficient.
Optionally, in the above example, specific values of the first weight coefficient and the second weight coefficient are not limited, and may be selected according to actual application requirements.
For example, in an alternative example, considering that the two sample texts included in a sample text pair are of equal training importance, the first weight coefficient and the second weight coefficient may be the same, for example both 0.5.
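For concreteness, the fused loss of steps S151 to S153 can be sketched as below; the use of sparse categorical cross entropy, the tensor shapes and the default coefficient values are assumptions, since the embodiment only fixes the overall structure (two weighted class errors, then a weighted combination with the similarity error):

```python
import tensorflow as tf

def prediction_loss(y_cls_a, p_cls_a, y_cls_b, p_cls_b, y_sim, p_sim,
                    w1=0.5, w2=0.5, g1=1.0, g2=1.0):
    """Fused multi-task loss.
    w1, w2: first and second weight coefficients for the two class-prediction errors.
    g1, g2: first and second gradient smoothing coefficients (see steps S181-S185).
    y_* are integer labels, p_* are softmax outputs of the fusion model."""
    ce = tf.keras.losses.sparse_categorical_crossentropy
    first_error = tf.reduce_mean(ce(y_cls_a, p_cls_a))       # first prediction error value
    second_error = tf.reduce_mean(ce(y_cls_b, p_cls_b))      # second prediction error value
    similarity_error = tf.reduce_mean(ce(y_sim, p_sim))      # similarity probability prediction error value
    category_error = w1 * first_error + w2 * second_error    # category prediction error value (S152)
    return g1 * category_error + g2 * similarity_error       # prediction loss value (S153)
```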
In the sixth aspect, it should be noted that, in step S160, a specific configuration of the target fusion model is not limited, and may be selected according to actual application requirements.
For example, in combination with the foregoing example, in a specific application example, in combination with fig. 5, the target fusion model may include a vector conversion network layer, a three-layer convolution network layer, a pooling network layer, a first fully-connected network layer composed of two fully-connected network layers, a feature splicing network layer, and a second fully-connected network layer composed of one fully-connected layer.
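As a non-authoritative sketch of the structure in fig. 5, the Keras model below wires a shared encoder (Embedding, three convolution layers, pooling) applied to both texts of a pair, a shared two-layer Dense head producing the per-text category prediction vectors, and a Concat layer followed by a single Dense layer producing the binary similarity probability; all layer sizes, kernel widths and activations are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_target_fusion_model(vocab_size, num_classes, seq_len=256, emb_dim=128):
    text_a = layers.Input(shape=(seq_len,), name="text_a")
    text_b = layers.Input(shape=(seq_len,), name="text_b")

    # Shared encoder: vector conversion (Embedding) -> three CNN layers -> pooling.
    embedding = layers.Embedding(vocab_size, emb_dim)
    convs = [layers.Conv1D(128, 3, padding="same", activation="relu") for _ in range(3)]
    pooling = layers.GlobalMaxPooling1D()

    def encode(x):
        h = embedding(x)
        for conv in convs:
            h = conv(h)
        return pooling(h)            # characterization vector

    vec_a, vec_b = encode(text_a), encode(text_b)

    # First fully-connected network layer (two Dense layers), shared by both texts.
    first_fc = tf.keras.Sequential(
        [layers.Dense(128, activation="relu"),
         layers.Dense(num_classes, activation="softmax")], name="first_fc")
    cls_a = first_fc(vec_a)          # first category prediction vector
    cls_b = first_fc(vec_b)          # second category prediction vector

    # Feature splicing network layer + second fully-connected network layer.
    spliced = layers.Concatenate(name="concat")([vec_a, vec_b])
    similarity = layers.Dense(2, activation="softmax", name="similarity")(spliced)

    return Model([text_a, text_b], [cls_a, cls_b, similarity])
```

Trained with the fused loss sketched earlier (or compiled with per-output cross entropy losses whose loss_weights play the role of the weight and gradient smoothing coefficients), this model can afterwards be reduced to the single-task classifier of step S170.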
In the seventh aspect, it should be noted that, in step S170, a specific manner for constructing the text classification model is not limited, and may be selected according to actual application requirements.
For example, in an alternative example, a model including a vector conversion network, a feature extraction network (a three-layer convolutional network), a pooling network and two fully-connected networks may be constructed, and the network parameters of the vector conversion network layer, the feature extraction network layer, the pooling network layer and the first fully-connected network layer in the updated target fusion model may then be loaded into the corresponding networks of that model, so that the text classification model is obtained.
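Assuming the build_target_fusion_model sketch above, one way to realize step S170 is to take the single-input sub-graph of the trained fusion model, which reuses (shares) the trained embedding, convolution, pooling and first Dense parameters:

```python
from tensorflow.keras import Model

def extract_text_classifier(fusion_model):
    # Sub-graph text_a -> first category prediction vector; because the layers are
    # shared objects, the trained network parameters are carried over unchanged.
    return Model(fusion_model.inputs[0], fusion_model.outputs[0])

# Hypothetical usage: classifier = extract_text_classifier(trained_fusion_model)
#                     probs = classifier.predict(token_ids_of_text_to_classify)
```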
On the basis of the above example, in order to ensure that the determined first gradient smoothing coefficient and the second gradient smoothing coefficient have higher reliability, the text classification model training method based on multi-task fusion may further include a step of determining the first gradient smoothing coefficient and the second gradient smoothing coefficient, which includes step S181, step S182, step S183, step S184, and step S185, in conjunction with fig. 6, and the details are as follows.
And step S181, constructing a text classification network model including a vector conversion network, a feature extraction network, a pooling network and a full-connection network.
In this embodiment, when the first gradient smoothing coefficient needs to be determined, a text classification network model including a vector transformation network, a feature extraction network, a pooling network, and a full-connection network may be constructed first. The vector conversion network, the feature extraction network and the pooling network may refer to the related descriptions of the vector conversion network, the feature extraction network layer and the pooling network layer, and the fully-connected network may refer to the related description of the first fully-connected network layer.
And step S182, constructing text similarity classification network models comprising a vector conversion network, a feature extraction network, a pooling network, a feature splicing network and a full-connection network.
In this embodiment, when the second gradient smoothing coefficient needs to be determined, a text similarity classification network model including a vector transformation network, a feature extraction network, a pooling network, a feature splicing network, and a full-connection network may be constructed first. The vector conversion network, the feature extraction network, the pooling network and the feature splicing network may refer to the above-mentioned description related to the vector conversion network, the feature extraction network layer, the pooling network layer and the feature splicing network layer, and the fully-connected network may refer to the above-mentioned description related to the second fully-connected network layer.
And step S183, updating parameters of the text classification network model based on the obtained multiple first training texts to obtain a converged first loss value.
In this embodiment, after the text classification network model is constructed based on step S181, its parameters may be updated based on the obtained multiple first training texts (for example, category prediction is performed on the different first training texts in turn, a loss value is computed from the predicted value and the true value, and the parameters of the text classification network model are then updated based on that loss value) until the loss value converges, so that a converged first loss value is obtained.
Step S184, based on the obtained multiple second training texts, performing parameter updating on the text similarity classification network model to obtain a converged second loss value.
In this embodiment, after the text similarity classification network model is constructed based on step S182, its parameters may be updated based on the obtained multiple second training texts (for example, text similarity prediction is performed on the different second training texts in turn, a loss value is computed from the predicted value and the true value, and the parameters of the text similarity classification network model are then updated based on that loss value) until the loss value converges, so that a converged second loss value is obtained.
Each second training text may be a training text pair that includes two similar training texts or two dissimilar training texts, such as the aforementioned sample text pair.
Step S185 of determining the first gradient smoothing coefficient and the second gradient smoothing coefficient based on the first loss value and the second loss value.
In this embodiment, after obtaining the first loss value based on step S183 and obtaining the second loss value based on step S184, the first gradient smoothing coefficient and the second gradient smoothing coefficient may be determined based on the first loss value and the second loss value.
Alternatively, in the above example, the specific manner of determining the first gradient smoothing coefficient and the second gradient smoothing coefficient based on step S185 is not limited, and may be selected according to the actual application requirement.
For example, in an alternative example, a first gradient smoothing coefficient and a second gradient smoothing coefficient may be determined based on a ratio between the first loss value and the second loss value.
For another example, in another alternative example, the inverse of the first loss value may be used as a first gradient smoothing coefficient, and the inverse of the second loss value may be used as a second gradient smoothing coefficient.
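As a small worked illustration of the second option, the reciprocal mapping can be written as below; the example numbers in the comment are made up:

```python
def gradient_smoothing_coefficients(first_loss, second_loss):
    """Reciprocal option from step S185: with the converged single-task losses,
    e.g. first_loss = 0.05 and second_loss = 0.20, this gives g1 = 20.0 and
    g2 = 5.0, so the two weighted loss terms sit on comparable scales."""
    return 1.0 / first_loss, 1.0 / second_loss
```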
With reference to fig. 7, an embodiment of the present application further provides a text classification model training apparatus 100 based on multi-task fusion, which is applicable to the electronic device. The device 100 for training the text classification model based on multi-task fusion may include a text obtaining module 110, a text converting module 120, a vector predicting module 130, a probability predicting module 140, a loss value calculating module 150, a model updating module 160, and a model constructing module 170.
The text obtaining module 110 may be configured to obtain a sample text pair, where the sample text pair includes a forward sample text pair or a reverse sample text pair, two sample texts included in the forward sample text pair are similar texts, and two sample texts included in the reverse sample text pair are non-similar sample texts. In this embodiment, the text obtaining module 110 may be configured to execute step S110 shown in fig. 2, and reference may be made to the foregoing description of step S110 for relevant contents of the text obtaining module 110.
The text conversion module 120 may be configured to process the two sample texts included in the sample text pair respectively through the vector conversion network layer, the feature extraction network layer, and the pooling network layer in sequence, so as to obtain a first characterization vector and a second characterization vector corresponding to the two sample texts, respectively. In this embodiment, the text conversion module 120 may be configured to perform step S120 shown in fig. 2, and reference may be made to the foregoing description of step S120 for relevant contents of the text conversion module 120.
The vector prediction module 130 may be configured to process the first token vector and the second token vector through a first fully-connected network layer, respectively, to obtain a corresponding first class prediction vector and a corresponding second class prediction vector. In this embodiment, the vector prediction module 130 may be configured to perform step S130 shown in fig. 2, and reference may be made to the description of step S130 for relevant contents of the vector prediction module 130.
The probability prediction module 140 may be configured to splice the first characterization vector and the second characterization vector through a feature splicing network layer to obtain a spliced characterization vector, and process the spliced characterization vector through a second fully-connected network layer to obtain a corresponding binary similarity probability result. In this embodiment, the probability prediction module 140 may be configured to execute step S140 shown in fig. 2, and reference may be made to the foregoing description of step S140 for relevant contents of the probability prediction module 140.
The loss value calculating module 150 may be configured to calculate the first category prediction vector, the second category prediction vector, and the binary similarity probability result based on a preset loss function, so as to obtain a predicted loss value. In this embodiment, the loss value calculating module 150 may be configured to execute step S150 shown in fig. 2, and reference may be made to the foregoing description of step S150 for relevant contents of the loss value calculating module 150.
The model updating module 160 may be configured to perform parameter updating on a pre-constructed target fusion model based on the predicted loss value to obtain an updated target fusion model, where the target fusion model includes the vector conversion network layer, the feature extraction network layer, the pooling network layer, the first fully-connected network layer, the feature splicing network layer, and the second fully-connected network layer. In this embodiment, the model updating module 160 may be configured to execute step S160 shown in fig. 2, and reference may be made to the foregoing description of step S160 for relevant contents of the model updating module 160.
The model building module 170 may be configured to build a text classification model based on network parameters included in the vector conversion network layer, the feature extraction network layer, the pooling network layer, and the first fully-connected network layer in the updated target fusion model, where the text classification model is used to classify a text to be classified. In this embodiment, the model building module 170 may be configured to perform step S170 shown in fig. 2, and reference may be made to the description of step S170 in relation to the relevant content of the model building module 170.
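To make the data flow through these modules concrete, the following PyTorch sketch outlines one possible structure of the target fusion model. The specific layer types (an embedding lookup for vector conversion, a bidirectional LSTM for feature extraction, mean pooling) are assumptions for illustration only and are not prescribed by this embodiment.

```python
# A minimal sketch of the target fusion model, assuming an embedding layer,
# a BiLSTM and mean pooling; the patent does not fix these layer types.
import torch
import torch.nn as nn

class TargetFusionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        # vector conversion network layer
        self.vector_conversion = nn.Embedding(vocab_size, embed_dim)
        # feature extraction network layer
        self.feature_extraction = nn.LSTM(embed_dim, hidden_dim,
                                          batch_first=True, bidirectional=True)
        # first fully-connected network layer (category prediction head)
        self.first_fc = nn.Linear(2 * hidden_dim, num_classes)
        # second fully-connected network layer (binary similarity head)
        self.second_fc = nn.Linear(4 * hidden_dim, 2)

    def encode(self, token_ids):
        # vector conversion -> feature extraction -> pooling network layer
        features, _ = self.feature_extraction(self.vector_conversion(token_ids))
        return features.mean(dim=1)  # mean pooling over the token dimension

    def forward(self, text_a_ids, text_b_ids):
        rep_a = self.encode(text_a_ids)   # first characterization vector
        rep_b = self.encode(text_b_ids)   # second characterization vector
        pred_a = self.first_fc(rep_a)     # first category prediction vector
        pred_b = self.first_fc(rep_b)     # second category prediction vector
        # feature splicing network layer followed by the binary similarity head
        spliced = torch.cat([rep_a, rep_b], dim=-1)
        similarity_logits = self.second_fc(spliced)
        return pred_a, pred_b, similarity_logits
```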
In summary, according to the text classification model training method and device based on multi-task fusion provided by the application, the two sample texts are processed respectively to obtain a first characterization vector and a second characterization vector. On one hand, the first characterization vector and the second characterization vector can be processed respectively to obtain a first category prediction vector and a second category prediction vector; on the other hand, a spliced characterization vector obtained by splicing the first characterization vector and the second characterization vector can be processed to obtain a binary similarity probability result. Then, the first category prediction vector, the second category prediction vector and the binary similarity probability result can be combined to obtain a prediction loss value, so that the target fusion model can be subjected to parameter updating based on the prediction loss value to obtain an updated target fusion model, and finally a text classification model can be constructed based on the network parameters included in the updated target fusion model. Based on this, because the network parameters of the constructed text classification model are trained (parameter-updated) with prediction loss values derived from the first category prediction vector, the second category prediction vector and the binary similarity probability result, the similarity between texts is taken into account during training. Since similar texts generally belong to similar categories, the constructed text classification model has better anti-interference performance, which improves the accuracy of the classification result and thereby alleviates the problem of poor classification effect in the existing text classification technology.
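As a further illustration of how the prediction loss value may be assembled, the following sketch follows the cross-entropy formulation and the weighted summation described in this application; the specific weight and coefficient values shown are placeholders, not values prescribed by the embodiment.

```python
# Hedged sketch of the combined prediction loss: cross-entropy per task,
# category errors weighted by w1/w2, then combined with the similarity error
# using the gradient smoothing coefficients alpha/beta (placeholder values).
import torch
import torch.nn.functional as F

def prediction_loss(pred_a, pred_b, similarity_logits,
                    label_a, label_b, similarity_label,
                    w1=0.5, w2=0.5, alpha=1.0, beta=1.0):
    first_error = F.cross_entropy(pred_a, label_a)        # first prediction error value
    second_error = F.cross_entropy(pred_b, label_b)       # second prediction error value
    similarity_error = F.cross_entropy(similarity_logits, similarity_label)  # similarity probability prediction error value
    category_error = w1 * first_error + w2 * second_error  # category prediction error value
    return alpha * category_error + beta * similarity_error  # prediction loss value
```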
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A text classification model training method based on multi-task fusion is characterized by comprising the following steps:
obtaining a sample text pair, wherein the sample text pair comprises a forward sample text pair or a reverse sample text pair, two sample texts included in the forward sample text pair are mutually similar texts, and two sample texts included in the reverse sample text pair are mutually dissimilar sample texts;
processing the two sample texts included in the sample text pair respectively through a vector conversion network layer, a feature extraction network layer and a pooling network layer in sequence to obtain a first characterization vector and a second characterization vector corresponding to the two sample texts respectively;
processing the first characterization vector and the second characterization vector through a first full-connection network layer respectively to obtain a corresponding first category prediction vector and a corresponding second category prediction vector;
splicing the first characterization vector and the second characterization vector through a feature splicing network layer to obtain a spliced characterization vector, and processing the spliced characterization vector through a second fully-connected network layer to obtain a corresponding binary similarity probability result;
calculating the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a preset loss function to obtain a prediction loss value;
updating parameters of a pre-constructed target fusion model based on the prediction loss value to obtain an updated target fusion model, wherein the target fusion model comprises the vector conversion network layer, the feature extraction network layer, the pooling network layer, the first fully-connected network layer, the feature splicing network layer and the second fully-connected network layer;
and constructing a text classification model based on network parameters included by the vector conversion network layer, the feature extraction network layer, the pooling network layer and the first full-connection network layer in the updated target fusion model, wherein the text classification model is used for classifying texts to be classified.
2. The method for training the text classification model based on the multitask fusion according to claim 1, wherein the step of obtaining the sample text pair comprises the following steps:
obtaining a target sample text, and determining a similar text of the target sample text based on a first preset rule or determining a non-similar text of the target sample text based on a second preset rule;
forming a sample text pair based on the target sample text and the similar text or based on the target sample text and the non-similar text.
3. The method for training the text classification model based on the multitask fusion according to claim 2, wherein the step of determining the similar text of the target sample text based on a first preset rule or determining the non-similar text of the target sample text based on a second preset rule comprises:
calculating the weight of each word in the target sample text to obtain a first weight value;
determining at least one keyword in the target sample text based on the first weight value, wherein the first weight value of the keyword meets a first preset condition;
and in the target sample text, performing near-synonym replacement on the at least one keyword to obtain a similar text of the target sample text.
4. The method for training the text classification model based on the multitask fusion according to claim 2, wherein the step of determining the similar text of the target sample text based on a first preset rule or determining the non-similar text of the target sample text based on a second preset rule comprises:
calculating the weight of each word in the target sample text to obtain a first weight value;
determining the weight of each statement in the target sample text based on the first weight value to obtain a second weight value;
determining at least one key sentence in the target sample text based on the second weight value, wherein the second weight value of the key sentence meets a second preset condition;
and deleting at least one other sentence except the at least one key sentence in the target sample text to obtain a similar text of the target sample text.
5. The method for training the text classification model based on the multitask fusion according to claim 2, wherein the step of determining the similar text of the target sample text based on a first preset rule or determining the non-similar text of the target sample text based on a second preset rule comprises:
determining the sequential position of each sentence in the target sample text;
and, in the target sample text, changing the sequential position of at least one sentence to obtain a similar text of the target sample text.
6. The method for training the text classification model based on the multitask fusion according to claim 2, wherein the step of determining the similar text of the target sample text based on a first preset rule or determining the non-similar text of the target sample text based on a second preset rule comprises:
determining category information and at least one keyword of the target sample text, wherein the keyword is determined based on the weight of each word in the target sample text;
determining, in a sample text database, a plurality of other sample texts having the same category as the target sample text based on the category information;
and in the determined plurality of other sample texts, taking the other sample texts with at least one keyword as similar texts of the target sample text.
7. The method for training the text classification model based on the multitask fusion according to claim 2, wherein the step of determining the similar text of the target sample text based on a first preset rule or determining the non-similar text of the target sample text based on a second preset rule comprises:
determining category information and at least one keyword of the target sample text, wherein the keyword is determined based on the weight of each word in the target sample text;
determining, in a sample text database, a plurality of other sample texts having different classes from the target sample text based on the class information;
and in the determined plurality of other sample texts, taking other sample texts without at least one keyword as non-similar texts of the target sample text.
8. The method for training the text classification model based on the multitask fusion according to any one of claims 1-7, wherein the step of calculating the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a preset loss function to obtain a prediction loss value comprises the following steps:
respectively calculating the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a cross-entropy loss function to obtain a corresponding first prediction error value, a corresponding second prediction error value and a corresponding similarity probability prediction error value;
performing weighted summation calculation on the first prediction error value and the second prediction error value based on a preset first weight coefficient and a preset second weight coefficient to obtain a category prediction error value;
and based on a preset first gradient smoothing coefficient and a preset second gradient smoothing coefficient, carrying out weighted summation calculation on the category prediction error value and the similarity probability prediction error value to obtain a prediction loss value.
9. The method for training the text classification model based on the multitask fusion according to claim 8, wherein before the step of updating the parameters of the pre-constructed target fusion model based on the prediction loss value, the method further comprises the step of determining the first gradient smoothing coefficient and the second gradient smoothing coefficient, the step comprising:
constructing a text classification network model comprising a vector conversion network, a feature extraction network, a pooling network and a full-connection network;
constructing a text similarity classification network model comprising a vector conversion network, a feature extraction network, a pooling network, a feature splicing network and a full-connection network;
updating parameters of the text classification network model based on the obtained multiple first training texts to obtain a converged first loss value;
updating parameters of the text similarity classification network model based on the obtained second training texts to obtain a converged second loss value;
determining the first gradient smoothing coefficient and the second gradient smoothing coefficient based on the first loss value and the second loss value.
10. A text classification model training device based on multi-task fusion is characterized by comprising the following components:
the text obtaining module is used for obtaining a sample text pair, wherein the sample text pair comprises a forward sample text pair or a reverse sample text pair, two sample texts included in the forward sample text pair are similar texts, and two sample texts included in the reverse sample text pair are non-similar sample texts;
the text conversion module is used for processing the two sample texts included in the sample text pair respectively through the vector conversion network layer, the feature extraction network layer and the pooling network layer in sequence to obtain a first characterization vector and a second characterization vector corresponding to the two sample texts respectively;
the vector prediction module is used for processing the first characterization vector and the second characterization vector through a first fully-connected network layer respectively to obtain a corresponding first class prediction vector and a corresponding second class prediction vector;
the probability prediction module is used for splicing the first characterization vector and the second characterization vector through a feature splicing network layer to obtain a spliced characterization vector, and processing the spliced characterization vector through a second fully-connected network layer to obtain a corresponding binary similarity probability result;
the loss value calculation module is used for calculating the first category prediction vector, the second category prediction vector and the binary similarity probability result based on a preset loss function to obtain a prediction loss value;
the model updating module is used for updating parameters of a pre-constructed target fusion model based on the prediction loss value to obtain an updated target fusion model, wherein the target fusion model comprises the vector conversion network layer, the feature extraction network layer, the pooling network layer, the first fully-connected network layer, the feature splicing network layer and the second fully-connected network layer;
and the model building module is used for building a text classification model based on network parameters included by the vector conversion network layer, the feature extraction network layer, the pooling network layer and the first full-connection network layer in the updated target fusion model, wherein the text classification model is used for classifying texts to be classified.