CN117493548A - Text classification method, training method and training device for model

Info

Publication number: CN117493548A
Application number: CN202310356023.5A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: text, classification, segments, semantic, segment
Legal status: Pending
Inventors: 白安琪, 蒋宁, 夏粉, 吴海英, 肖冰
Assignee (original and current): Mashang Xiaofei Finance Co Ltd
Application filed by Mashang Xiaofei Finance Co Ltd; priority to CN202310356023.5A; publication of CN117493548A

Classifications

    • G06F16/353 — Information retrieval of unstructured textual data; clustering; classification into predefined classes
    • G06F40/289 — Natural language analysis; phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 — Handling natural language data; semantic analysis
    • G06F40/58 — Processing or translation of natural language; use of machine translation, e.g. for multi-lingual retrieval, server-side translation for client devices or real-time translation
    • G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods


Abstract

The application discloses a text classification method, a training method for a model, and a training device for a model, which are used to improve text classification accuracy. The scheme comprises the following steps: acquiring a plurality of initial text segments and translated text segments having translation relations with the initial text segments; constructing a plurality of semantic text groups based on the initial text segments and the translated text segments, wherein any semantic text group comprises a plurality of text segments with translation relations, and the text segments belonging to the same semantic text group are expressed in a plurality of different preset languages; acquiring classification labels respectively corresponding to the semantic text groups, wherein each classification label comprises the emotion labels semantically expressed by the text segments in the corresponding semantic text group; and training an initial text classification model according to labeling labels and training samples to obtain a text classification model, wherein the training samples comprise the text segments in the semantic text groups, and each labeling label is the classification label corresponding to the semantic text group to which the training sample belongs.

Description

Text classification method, training method and training device for model
Technical Field
The present disclosure relates to the field of model training, and in particular, to a text classification method, a training method for a model, and a training device for a model.
Background
At present, natural language processing (Natural Language Processing, NLP) tasks are generally grounded in understanding the basic semantics of language. The main tasks include semantic similarity, text classification, natural language reasoning, and the like. Natural language reasoning is the process of inferring unknown information from existing text.
In verbal communication, human expression varies with the situation. For example, the same person expresses differently in different periods and under different cognition. As another example, the same person may express differently under the same cognition but in different states of mind. Because of this variability in human language expression, there is no one-to-one correspondence between human features and language features, which makes human natural language difficult to classify.
How to improve the accuracy of text classification is a technical problem to be solved by the application.
Disclosure of Invention
The embodiment of the application aims to provide a text classification method, a training method of a model and a training device of the model, which are used for improving the text classification accuracy.
In a first aspect, a training method for a text classification model is provided, including:
acquiring a plurality of initial text segments and translation text segments with translation relations with the initial text segments, wherein the translation text segments are text segments obtained by translating the initial text segments into a plurality of preset languages;
Constructing a plurality of semantic text groups based on the initial text segments and the translated text segments, wherein any semantic text group comprises a plurality of text segments with translation relations, and the text segments belonging to the same semantic text group are expressed by a plurality of different preset languages;
acquiring classification labels respectively corresponding to the plurality of semantic text groups, wherein the classification labels comprise emotion labels of semantic expressions of all text segments in the corresponding semantic text groups;
training an initial text classification model according to a labeling label and a training sample to obtain the text classification model, wherein the training sample comprises each text segment in the plurality of semantic text groups, and the labeling label is a classification label corresponding to the semantic text group to which the training sample belongs.
In a second aspect, a text classification method is provided, including:
acquiring text segments to be classified;
generating a plurality of to-be-classified translation text segments expressed by a plurality of preset languages based on the to-be-classified text segments, wherein the emotion of the semantic expression of the plurality of to-be-classified translation text segments is the same as the emotion of the semantic expression of the to-be-classified text segments;
inputting the text segment to be classified and the plurality of translated text segments to be classified into a text classification model to obtain a classification prediction result of the text segment to be classified, wherein the text classification model is obtained by training an initial text classification model according to a labeling label and a training sample, the training sample comprises each text segment in a plurality of semantic text groups, the labeling label is a classification label corresponding to the semantic text group to which the training sample belongs, the plurality of semantic text groups are constructed based on a plurality of initial text segments and a plurality of translated text segments, any one semantic text group comprises a plurality of text segments with translation relations, the plurality of translated text segments are text segments obtained by translating the initial text segments into a plurality of preset languages, the plurality of text segments belonging to the same semantic text group are expressed by a plurality of different preset languages, and the classification label comprises emotion labels of semantic expression of each text segment in the corresponding semantic text group.
In a third aspect, there is provided a text classification apparatus comprising:
the acquisition module acquires text segments to be classified;
the generation module is used for generating a plurality of to-be-classified translation text segments expressed by a plurality of preset languages based on the to-be-classified text segments, wherein the emotion of the semantic expression of the plurality of to-be-classified translation text segments is the same as that of the semantic expression of the to-be-classified text segments;
the classification module is used for inputting the text segment to be classified and the plurality of translated text segments to be classified into a text classification model to obtain a classification prediction result of the text segment to be classified, wherein the text classification model is obtained by training an initial text classification model according to a labeling label and a training sample, the training sample comprises each text segment in a plurality of semantic text groups, the labeling label is a classification label corresponding to the semantic text group to which the training sample belongs, the plurality of semantic text groups are constructed based on a plurality of initial text segments and a plurality of translated text segments, any one semantic text group comprises a plurality of text segments with translation relations, the plurality of translated text segments are text segments obtained by translating the initial text segments into a plurality of preset languages, the plurality of text segments belonging to the same semantic text group are expressed in a plurality of different preset languages, and the classification label comprises the emotion labels semantically expressed by the text segments in the corresponding semantic text group.
In a fourth aspect, a training device for a text classification model is provided, including:
the text obtaining module is used for obtaining a plurality of initial text segments and translation text segments with translation relations with the initial text segments, wherein the translation text segments are text segments obtained by translating the initial text segments into a plurality of preset languages;
the construction module is used for constructing a plurality of semantic text groups based on a plurality of initial text segments and a plurality of translation text segments, wherein any semantic text group comprises a plurality of text segments with translation relations, and the plurality of text segments belonging to the same semantic text group are expressed by a plurality of different preset languages;
the label acquisition module acquires classification labels respectively corresponding to the plurality of semantic text groups, wherein the classification labels comprise emotion labels of semantic expressions of all text segments in the corresponding semantic text groups;
the training module trains an initial text classification model according to a labeling label and a training sample to obtain the text classification model, wherein the training sample comprises text segments in the semantic text groups, and the labeling label is a classification label corresponding to the semantic text group to which the training sample belongs.
In a fifth aspect, there is provided an electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program implementing the steps of the method as in the first or second aspect when executed by the processor.
In a sixth aspect, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method as in the first or second aspect.
In the embodiments of the application, a plurality of initial text segments and translated text segments having translation relations with the initial text segments are first obtained, where a translated text segment is obtained by translating an initial text segment into one of a plurality of preset languages. A plurality of semantic text groups are then constructed based on the initial text segments and the translated text segments; any semantic text group comprises a plurality of text segments with translation relations, and the text segments belonging to the same semantic text group are expressed in a plurality of different preset languages. Text segments expressed in multiple preset languages allow the model to be trained on the differences among those languages, and text segments in different languages can verify one another, avoiding the loss of language characteristics caused by the expressive limitations of a single language. Next, the classification labels respectively corresponding to the semantic text groups are obtained, where each classification label comprises the emotion labels semantically expressed by the text segments in the corresponding semantic text group. The classification label represents the emotion semantically expressed by the text segments in the corresponding semantic text group, and the emotion expressed by the semantics can associate text segments expressed in different preset languages, so that texts expressed in different languages are related to one another. Finally, an initial text classification model is trained according to labeling labels and training samples to obtain the text classification model, where the training samples comprise the text segments in the semantic text groups and each labeling label is the classification label corresponding to the semantic text group to which the training sample belongs. The model can thus learn the emotional expression of different languages, avoiding the classification limitations that a single-language training model may bring, and thereby improving the text classification accuracy of the model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
fig. 1 is a schematic flow chart of a training method of a text classification model according to an embodiment of the present application.
Fig. 2 is a second flowchart of a training method of a text classification model according to an embodiment of the present application.
Fig. 3 is a third flow chart of a training method of a text classification model according to an embodiment of the present application.
Fig. 4 is a fourth flowchart of a training method of a text classification model according to an embodiment of the present application.
Fig. 5 is a fifth flowchart of a training method of a text classification model according to an embodiment of the present application.
Fig. 6 is a schematic flow chart of a text classification method according to an embodiment of the present application.
Fig. 7 is a second flowchart of a text classification method according to an embodiment of the present application.
Fig. 8 is a third flow chart of a text classification method according to an embodiment of the present application.
Fig. 9 is a fourth flowchart of a text classification method according to an embodiment of the present application.
Fig. 10 is a fifth flowchart of a text classification method according to an embodiment of the present application.
Fig. 11 is a schematic structural diagram of a text classification device according to an embodiment of the present application.
Fig. 12 is a schematic structural view of an electronic device of the present application.
Detailed Description
The following description of the embodiments of the present application is made clearly and fully with reference to the accompanying drawings; evidently, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the present application. The reference numerals in the present application are only used to distinguish the steps in the scheme and are not used to limit the execution sequence of the steps; the specific execution sequence is subject to the description in the specification.
In the field of natural language processing, accurately classifying text remains difficult, which limits the development and application of natural language processing in various scenarios. The field involves natural language understanding (Natural Language Understanding, NLU) and natural language generation (Natural Language Generation, NLG). In this field, if the user's language cannot be recognized and classified, session understanding is difficult to achieve, intelligently answering the user's questions is harder still, and intelligent communication cannot be realized.
In order to solve the problems in the prior art, an embodiment of the present application provides a training method for a text classification model, as shown in fig. 1, including:
s11: and acquiring a plurality of initial text segments and translation text segments with translation relations with the initial text segments, wherein the translation text segments are text segments obtained by translating the initial text segments into a plurality of preset languages.
The initial text segment may be a text segment containing one or more sentences. A translated text segment may be obtained by translating the initial text segment, or a translated text segment having a translation relation with the initial text segment may be retrieved from a preset database. For example, a plurality of preset languages can be set according to actual requirements, and a translated text segment can be obtained by translating the initial text segment into a preset language. In this step, a plurality of initial text segments are acquired, where any initial text segment corresponds to at least one translated text segment, and a translation relation exists between the translated text segment and the corresponding initial text segment. Optionally, text segments with a translation relation share at least two commonalities: the same expressed semantics and the same emotional coloring.
The types and number of preset languages can be set as required; examples include Mandarin, Chongqing dialect, Cantonese, Shanghainese, and the like. In addition, the plurality of different preset languages may span different language systems, language families, language branches, and languages. Assuming the initial text segment is expressed in Chinese, it may be translated into English, Henan dialect, or another language to obtain a translated text segment. Optionally, the initial text segment is stored in association with the translated text segments having a translation relation with it, for retrieval in subsequent steps. Text segments with translation relations are expressed in different preset languages but are consistent in semantic content.
Alternatively, a language having a large difference from the language used for the initial text segment is selected as the preset language. Based on the large difference of expression modes among different languages, the text segments with translation relations expressed by the different languages can show slight differences in terms of mood, emotion, logic and the like. Therefore, the initial text segment and the translation text segment with the translation relation can express more abundant details.
In practical applications, the initial text segment may be, for example, a character's dialogue text in a dialogue scenario, or a paragraph in an article. Text segments can be obtained according to the actual application scenario: for example, if text classification is needed in a dialogue scenario, dialogue is selected as the initial text segment and then translated into the preset languages to obtain the translated text segments.
S12: and constructing a plurality of semantic text groups based on the initial text segments and the translated text segments, wherein any semantic text group comprises a plurality of text segments with translation relations, and the text segments belonging to the same semantic text group are expressed by a plurality of different preset languages.
In the step, based on the text translation relation, an association relation is established between an initial text segment and a plurality of translated text segments with translation relation to form a semantic text group. The semantic text group comprises a plurality of text segments expressed in different preset languages, and the semantic emotion expressed by the text segments is the same.
Optionally, to improve the quality of the semantic text group, a plurality of text segments with translation relations can be checked and corrected through manual verification, so that the semantic emotion of the plurality of text segments in the generated semantic text group is highly uniform, and the quality of a subsequent training model is optimized.
In the embodiment of the application, the semantic text group comprises a plurality of text segments with the same semantics and the same expression emotion.
In addition, the number of text segments contained in different semantic text groups may be different, but the number of text segments contained in any one semantic text group is not less than two.
S13: and obtaining classification labels respectively corresponding to the plurality of semantic text groups, wherein the classification labels comprise emotion labels of semantic expressions of all text segments in the corresponding semantic text groups.
For any semantic text group, firstly, respectively determining emotion labels of semantic expressions for all text segments in the semantic text group, and then summarizing the emotion labels determined by all the text segments into classification labels corresponding to the semantic text group.
For example, assume a semantic text group contains three text segments expressed in the three languages Mandarin, Chongqing dialect, and Cantonese. The emotion label corresponding to the Mandarin text segment is [lacking empathy; domineering], the emotion label corresponding to the Chongqing text segment is [lacking empathy; slightly sad], and the emotion label corresponding to the Cantonese text segment is [domineering; deflects the point of contention]. The emotion labels corresponding to the three text segments are then summarized as [lacking empathy; domineering; slightly sad; deflects the point of contention], which serves as the classification label corresponding to the semantic text group.
S14: training an initial text classification model according to a labeling label and a training sample to obtain the text classification model, wherein the training sample comprises each text segment in the plurality of semantic text groups, and the labeling label is a classification label corresponding to the semantic text group to which the training sample belongs.
A text classification model is trained based on the semantic text groups and the corresponding classification labels. Still referring to the above example, the semantic text group includes three text segments expressed in Mandarin, Chongqing dialect, and Cantonese, and the corresponding classification label is [lacking empathy; domineering; slightly sad; deflects the point of contention]. The semantic text group can then be split into three training samples: the Mandarin text segment, the Chongqing text segment, and the Cantonese text segment. The training labels corresponding to the three training samples are all the classification label corresponding to the semantic text group, namely [lacking empathy; domineering; slightly sad; deflects the point of contention].
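As an illustration of this splitting step, the following is a minimal Python sketch (the data format and label wording are assumptions for clarity, not part of the patent): one semantic text group and its classification label are expanded into per-language training samples, each inheriting the group's label.

```python
# One semantic text group: the same content in several preset languages,
# plus the classification label shared by the whole group (assumed format).
semantic_group = {
    "mandarin":  "普通话文本片段",
    "chongqing": "重庆话文本片段",
    "cantonese": "粤语文本片段",
}
group_label = ["lacking empathy", "domineering",
               "slightly sad", "deflects the point of contention"]

# Split the group: every text segment becomes one training sample, and
# every sample inherits the group's classification label as its training label.
training_samples = [{"text": seg, "label": group_label}
                    for seg in semantic_group.values()]

for sample in training_samples:
    print(sample["text"], "->", sample["label"])
```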
According to the scheme provided by the embodiments of the application, text segments expressed in multiple preset languages allow the model to be trained on the expressive differences among those languages, and text segments in different languages can verify one another, avoiding the loss of language characteristics caused by the expressive limitations of a single language. The classification label represents the emotion semantically expressed by the text segments in the corresponding semantic text group, and the emotion expressed by the semantics can associate text segments expressed in different preset languages, so that texts expressed in different languages are related to one another. Each text segment in the semantic text groups is then used as a training sample, the classification label corresponding to the semantic text group to which the sample belongs is used as its training label, and the text classification model is trained, so that the model learns the emotional expression of different languages. This avoids the classification limitations that a single-language training model may bring and effectively improves the classification accuracy of the model for text.
The application is described below with reference to an example. In practice, the following steps are performed for a role-based dialogue scenario:
1) Acquiring all session data of a role in a corpus, wherein the corpus can be a corpus containing mandarin, various dialects and foreign languages;
2) Constructing text translation relations among different languages, storing the text translation relations in a database, and recording the text translation relations as S_1 in a translation corpus;
the translation relation among different languages can be constructed by a labeling person through a manual labeling mode. The labeling requirements include: a translation relationship is identified as having a high degree of consistency between text segments in different languages in semantic details, including but not limited to emotion colors, etc.
3) Selecting three text segments with text translation relations as a group, and labeling them with portrait feature points;

The labeled-sample format is as follows:

Text | Label
Mandarin text [sep] Chongqing text [sep] Cantonese text | list of portrait feature points expressed in the three languages
Mandarin text [sep] Henan text [sep] Shanghai text | list of portrait feature points expressed in the three languages
The different languages are separated by the separator [sep].

The portrait feature point label is, for example: [lacking empathy; domineering; slightly sad; deflects the point of contention]

5) Splitting the labeled data. Since the three languages were labeled as having the same meaning, the group is split into three samples, each corresponding to the same list of portrait feature points, to be used as training samples.
Example:

Text | Label
Mandarin text segment | list of portrait feature points
Chongqing text segment | list of portrait feature points
Cantonese text segment | list of portrait feature points

The model is trained with the training sample Text and the corresponding training label Label.
According to the scheme provided by the embodiments of the application, the variability of each person's speech across different emotions and different scenarios is taken into account, the analysis is reduced to a smaller granularity, and the classification accuracy of the trained model is improved. In practical applications, the scheme can deeply mine the specificity of people at the level of language meaning, can be used to infer user groupings, and facilitates the analysis and application of dialogue scenarios. In addition, multiple users belonging to the same group or class are often consistent in language meaning; in language analysis they can be treated alike, enabling intelligent replies in dialogue scenarios.
Based on the solution provided in the foregoing embodiment, optionally, as shown in fig. 2, in step S13, obtaining classification labels corresponding to the plurality of semantic text groups respectively includes:
S21: obtaining the sub-classification labels corresponding to the text segments in the semantic text group, wherein the semantic text group contains at least one text segment whose sub-classification label differs from the sub-classification labels of the other text segments in the same group.
Wherein the semantic text group includes at least two text segments corresponding to different sub-category labels. In other words, in a semantic text group, there is at least one text segment with a sub-category label that is different from the sub-category labels of other text segments.
In this step, sub-category labels are obtained for each text segment in the semantic text group. For example, assume a semantic text group contains three text segments expressed in three languages, mandarin, chongqing, cantonese. The sub-classification labels corresponding to the Mandarin text segments include [ no concentricity; strong ], the sub-classification labels corresponding to the Chongqing text segment comprise [ no concentricity; minor sadness ], the sub-classification labels corresponding to the Guangdong text segment comprise [ strong; transfer contradiction points ].
S22: and determining the union of sub-classification labels corresponding to the text segments in the semantic text group as the classification label corresponding to the semantic text group.
Continuing the example, the sub-classification labels corresponding to the three text segments are summarized as [lacking empathy; domineering; slightly sad; deflects the point of contention], which serves as the classification label corresponding to the semantic text group.
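A minimal sketch of this union operation (variable names and label values are illustrative assumptions):

```python
# Sub-classification labels of the text segments in one semantic text group
# (hypothetical values mirroring the example above).
sub_labels = {
    "mandarin":  {"lacking empathy", "domineering"},
    "chongqing": {"lacking empathy", "slightly sad"},
    "cantonese": {"domineering", "deflects the point of contention"},
}

# S22: the group's classification label is the union of all sub-labels.
group_label = set().union(*sub_labels.values())
print(sorted(group_label))
# ['deflects the point of contention', 'domineering',
#  'lacking empathy', 'slightly sad']
```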
If a model is trained using only text segments expressed in one language, the corresponding classification labels may lose part of the classifications because of the limitations of single-language expression. In contrast, in the scheme provided by the embodiments of the application, the union of the sub-classification labels of the text segments in a semantic text group is determined as the group's classification label, so the classification label captures the details of multi-language expression. The text segments in the semantic text group are then used as training samples, with the group's classification label as the training label; the training label thus reflects the details of the various language expressions, letting the model learn richer semantic features.
Based on the solution provided in the foregoing embodiment, optionally, as shown in fig. 3, in step S14, training an initial text classification model according to the labeling label and the training sample to obtain the text classification model includes:
S31: and respectively segmenting the training samples through a segmentation model.
In this step, word segmentation may be performed on the training samples first, and then encoding may be performed on each word segment separately. Wherein word segmentation may be performed on the training samples using a pre-trained word segmentation model. Specifically, a word segmentation model matched with a preset language can be selected to improve word segmentation rationality. The word segmentation result can be a sequence containing a plurality of segmented words, and the positions of the segmented words in the sequence are consistent with the sequence of the segmented words in the training sample.
S32: and inputting the segmented training sample into a word embedding layer to obtain a text coding result corresponding to the training sample.
In this step, encoding is performed on the segmentation results to obtain text encoding results that characterize the multidimensional encoding features of the corresponding training samples. Specifically, the word embedding layer is used for mapping each word into a numerical value, so that a text coding result is obtained. That is, the text encoding result is a training sample after word segmentation expressed in a numerical form, and features of each word segment in the training sample can be represented.
S33: and classifying and predicting the text coding result based on an attention mechanism to obtain a prediction label corresponding to the training sample.
Attention is a mechanism that captures long-range dependencies by focusing on different parts of the input vector. In this step, classification prediction is performed, based on the attention mechanism, on the features expressed by the text encoding; the resulting prediction label characterizes the category the model predicts for the training sample.
S34: the text classification model is trained based on the losses between the predictive labels and the labeling labels corresponding to the same training samples.
Loss between the predicted label and the training label can indirectly express the completion degree of model training, and the model training aims at reducing the gap between the predicted label predicted by the model and the real training label, so that the result predicted by the model is closer to the real result. In this step, iterative training may be performed on the model according to the loss, and training may be stopped until the loss is stabilized within a certain range, to obtain a trained model.
The following describes the scheme with reference to examples, and the model training process is as follows:
a. inputting training samples and corresponding training labels into a language model model_alpha to be trained;
b. setting task parameters as multi-label classification;
c. performing word segmentation processing on the sequence text by using a model_alpha;
d. Coding the text after word segmentation;
e. splicing the encoding results to generate a final text encoding;
f. The text encoding result passes through an attention mechanism, a Linear layer, and a softmax layer, and the prediction label is obtained from the softmax layer;
g. continuously calculating the loss between the predicted label and the actual label, and stopping training when the loss is stabilized within a threshold range;
h. A new portrait-feature classification model is obtained and denoted model_beta.
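The pipeline in steps a-h can be illustrated with a minimal PyTorch sketch. Everything below (the tiny architecture, dimensions, and toy data) is an assumption for illustration rather than the patent's implementation; since step b sets the task to multi-label classification, the sketch uses a sigmoid-based binary cross-entropy loss, of which the softmax layer in step f is the single-label analogue.

```python
import torch
import torch.nn as nn

class TinyTextClassifier(nn.Module):
    """Minimal stand-in for the model in steps c-f: token embedding ->
    self-attention -> Linear head (all sizes are assumptions)."""
    def __init__(self, vocab_size=1000, dim=64, num_labels=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)           # step d: encode tokens
        self.attn = nn.MultiheadAttention(dim, num_heads=4,
                                          batch_first=True)  # attention mechanism
        self.head = nn.Linear(dim, num_labels)               # Linear layer

    def forward(self, token_ids):
        x = self.embed(token_ids)
        x, _ = self.attn(x, x, x)        # contextualise the token encodings
        return self.head(x.mean(dim=1))  # pool over tokens, predict label logits

model = TinyTextClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()            # multi-label objective (step b)

token_ids = torch.randint(0, 1000, (2, 16))   # two toy word-segmented samples
labels = torch.tensor([[1., 1., 0., 0.],      # toy multi-hot portrait labels
                       [0., 1., 1., 1.]])

for step in range(3):                         # step g: iterate until loss stabilises
    logits = model(token_ids)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```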
Based on the scheme provided in the foregoing embodiment, optionally, as shown in fig. 4, step S32 of inputting the word-segmented training samples into the word embedding layer to obtain the text encoding results corresponding to the training samples includes:
s41: and inputting the training samples after word segmentation into a word segmentation coding layer, a sentence coding layer and a position coding layer in the word embedding layer to obtain word feature vectors output by the word segmentation coding layer, sample difference feature vectors output by the sentence coding layer and language sequence feature vectors output by the position coding layer.
In this example, the word embedding layer may include three sub-layers, and the word-segmented training sample is a sequence of tokens in word order.

First, the word segmentation result is input to the word encoding layer, namely the token embedding layer, so that each token in the word segmentation result is processed into a vector of fixed dimension.

Then, the output of the token embedding layer is input into the sentence encoding layer, namely the segment embedding layer, whose function is to distinguish clauses: different clauses in a text segment are represented by different values, so the output of this layer can express the relations between tokens and clauses and between clauses and the text segment.

Next, the output of the segment embedding layer is input to the position encoding layer, namely the positional embedding layer, which outputs the word-order features, with word order represented by different values.
S42: and splicing the word feature vector, the sample difference feature vector and the word order feature vector to obtain text coding results respectively corresponding to the training samples.
In this step, the output results of the three layers are combined. For example, the dimensions of the three output results may be the same, in which case the elements are added element-wise to obtain the text encoding result corresponding to the training sample.
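A minimal sketch of S41-S42 (all sizes and inputs are assumptions): three embedding tables share one dimension, and their outputs are combined element-wise into the text encoding result.

```python
import torch
import torch.nn as nn

dim, vocab, max_len, max_sents = 64, 1000, 128, 8   # assumed sizes
token_emb = nn.Embedding(vocab, dim)        # token embedding layer (word features)
segment_emb = nn.Embedding(max_sents, dim)  # segment embedding layer (clauses)
position_emb = nn.Embedding(max_len, dim)   # positional embedding layer (word order)

token_ids = torch.tensor([[5, 17, 42, 99]])  # one word-segmented sample
segment_ids = torch.tensor([[0, 0, 1, 1]])   # which clause each token belongs to
position_ids = torch.arange(4).unsqueeze(0)  # word order 0..3

# S42: the three outputs share one dimension and are combined element-wise
# to form the text encoding result.
text_encoding = (token_emb(token_ids)
                 + segment_emb(segment_ids)
                 + position_emb(position_ids))
print(text_encoding.shape)  # torch.Size([1, 4, 64])
```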
Based on the solution provided in the above embodiment, optionally, before step S14, the initial text classification model may be obtained by model fine-tuning. An initialization (i.e., fine-tuning) of a multilingual pre-training model may be performed based on the semantic text groups to obtain the initial text classification model. The multilingual pre-training model can be selected according to actual requirements; for example, an mBERT (Multilingual Bidirectional Encoder Representations from Transformers) model can be chosen. In the fine-tuning process, the plurality of semantic text groups can be used as initial training samples, and the mBERT model is fine-tuned on this unsupervised corpus; the fine-tuned initial text classification model is denoted model_alpha.
Through model fine-tuning, the initial text classification model can be adapted to the application scenario of the training target: fine-tuning lets the model learn part of the features in advance, so that in the training of the subsequent step S14 the initial text classification model can learn the features of the training samples more efficiently, improving training efficiency and optimizing the training effect.
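As a sketch of such fine-tuning, the snippet below uses the Hugging Face transformers library with the public bert-base-multilingual-cased checkpoint and a masked-language-modelling objective; the library, checkpoint, and toy corpus are all assumptions, since the patent does not specify an implementation.

```python
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Public mBERT checkpoint (an assumption; the patent names no checkpoint).
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Unsupervised corpus: the raw text segments of the semantic text groups.
texts = ["普通话文本片段", "重庆话文本片段", "粤语文本片段"]
enc = tokenizer(texts, truncation=True, padding=True)
dataset = [{"input_ids": i, "attention_mask": m}
           for i, m in zip(enc["input_ids"], enc["attention_mask"])]

# Masked language modelling supplies the unsupervised fine-tuning objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="model_alpha", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()  # the fine-tuned model plays the role of model_alpha
```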
Based on the solution provided in the foregoing embodiment, optionally, as shown in fig. 5, before step S11, the method further includes:
S51: and respectively obtaining corpus of a plurality of preset languages, wherein the corpus comprises corpus expressed by the preset languages.
Generally, different languages differ in their modes of expression. In this step, corpora of a plurality of preset languages are obtained respectively, and the corpus entries in each corpus express the characteristics of the corresponding preset language's mode of expression.
S52: constructing a plurality of translation corpus groups according to the corpuses of the plurality of preset languages, wherein the translation corpus groups comprise a plurality of corpuses with translation relations, and the plurality of corpuses belonging to the same translation corpus group are respectively from corpuses of different preset languages.
In this step, based on the corpora corresponding to different preset languages, translation relations between the corpora of different languages are established. For example, for a Mandarin word, the Cantonese word with the same semantics is found according to the word's semantics in sentences, paragraphs, and articles, so that a text translation relation is established between the Mandarin word and the corresponding Cantonese word. The Mandarin word comes from the Mandarin corpus and the Cantonese word from the Cantonese corpus. In this way, word correspondences among the plurality of preset languages are established, and a translation corpus is then constructed.

S53: And constructing a translation corpus based on the plurality of translation corpus groups, wherein the translation corpus is used for expressing translation relations among the plurality of preset languages.

After the plurality of translation corpus groups are obtained, the translation corpus is further constructed. The translation corpus contains corpus entries expressed in the plurality of preset languages and records the text translation relations among corpus entries in different preset languages.

The text translation relations of the preset languages include correspondences of the same semantics among the expressions of different preset languages. They may include word-level correspondences; for example, a correspondence exists between a word expressed in Mandarin and its synonyms expressed in several preset dialects. Besides words, the text translation relations may include correspondences between phrases and colloquial expressions.
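A minimal sketch of what such a translation corpus could look like (the structure, language keys, and sample entries are illustrative assumptions):

```python
# A minimal translation corpus: each entry groups expressions with the same
# semantics and emotional colouring across preset languages (the structure
# and the sample entries are illustrative assumptions).
translation_corpus = [
    {"mandarin": "聊天", "cantonese": "倾偈", "sichuan": "摆龙门阵"},
    {"mandarin": "没关系", "cantonese": "冇所谓", "sichuan": "莫得事"},
]

def translate(text, src_lang, dst_lang, corpus=translation_corpus):
    """Look up the dst_lang expression having a translation relation with text."""
    for group in corpus:
        if group.get(src_lang) == text:
            return group.get(dst_lang)
    return None  # no translation relation recorded

print(translate("聊天", "mandarin", "sichuan"))  # 摆龙门阵
```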
The step S11 of obtaining a plurality of initial text segments and translated text segments having a translation relationship with the initial text segments includes:
s54: and acquiring a plurality of initial text segments.
In this step, any one of the obtained initial text segments may be a text segment containing one or more sentences, and different initial text segments may contain different numbers of sentences.
The obtained initial text segment may be a text segment in a desired application scene. For example, if a text classification model is required to classify an article, the initial text segment obtained may be a paragraph in the article. If a text classification model is required to classify a conversation between users, the initial text segment obtained may be the conversation content of the user in the conversation scene, or the like.
In addition, the initial text segment can be acquired based on the technical field of the actual application scene, for example, the text classification model is needed to classify the articles in the chemical technical field, and then the initial text segment can be acquired for the chemical field.
S55: and translating the plurality of initial text segments based on the pre-constructed translation corpus to obtain translation text segments with translation relations with the initial text segments.
Based on the above text translation relations, the initial text segments are translated in this step into translated text segments expressed in a plurality of preset languages. For example, a text segment expressed in Mandarin is translated into a text segment expressed in Shanghainese and a text segment expressed in Sichuanese, thereby obtaining text segments in three different languages that express the same semantics and emotion.
Based on the solution provided in the foregoing embodiment, optionally, the degree of difference between the plurality of preset languages is greater than a preset degree of difference.
Generally, there is a difference in expression between different languages, but there is a difference in size between languages. In some cases, the difference between languages used in two regions that are geographically close is small, and conversely, the difference between languages used in two regions that are geographically far apart is large. In the scheme provided by the embodiment of the application, multiple languages with the difference degree larger than the preset difference degree can be selected as preset languages, so that the semantic emotion of the text segment can be fully expressed through the multiple languages.
In order to solve the problems in the prior art, the embodiment of the present application further provides a text classification method, as shown in fig. 6, including:
s61: and acquiring the text segment to be classified.
The text segment to be classified may include a single sentence or a plurality of sentences, and the plurality of sentences are arranged into text segments based on a word order, and the text segment to be classified is a text segment expressed in any one of the preset languages. In this example, the text segment to be classified is denoted as Input1.
S62: generating a plurality of to-be-classified translation text segments expressed by a plurality of preset languages based on the to-be-classified text segments, wherein the emotion of the semantic expression of the plurality of to-be-classified translation text segments is the same as the emotion of the semantic expression of the to-be-classified text segments.
In the step, the translation can be executed on the basis of the translation corpus, and the quality of the translation text segments can be optimized in a manual auditing mode, so that a plurality of translation text segments are consistent with the text segments to be classified in terms of semantics and emotion.
For example, if the text segment to be classified is a text segment expressed in mandarin chinese, the text segment to be classified may be translated into a guangdong text segment, a Sichuan text segment, or the like, respectively, according to the translation corpus in this step.
S63: inputting the text segment to be classified and the plurality of translated text segments to be classified into a text classification model to obtain a classification prediction result of the text segment to be classified, wherein the text classification model is obtained by training an initial text classification model according to a labeling label and a training sample, the training sample comprises each text segment in a plurality of semantic text groups, the labeling label is a classification label corresponding to the semantic text group to which the training sample belongs, the plurality of semantic text groups are constructed based on a plurality of initial text segments and a plurality of translated text segments, any one semantic text group comprises a plurality of text segments with translation relations, the plurality of translated text segments are text segments obtained by translating the initial text segments into a plurality of preset languages, the plurality of text segments belonging to the same semantic text group are expressed by a plurality of different preset languages, and the classification label comprises emotion labels of semantic expression of each text segment in the corresponding semantic text group.
In this step, the text segment to be classified and the plurality of translation text segments are respectively input into the text classification model, and the result output by the model to the text segment to be classified and the result output by the model to the plurality of translation text segments are summarized as the classification prediction result of the text segment to be classified, where the classification prediction result may specifically be the prediction label described in the above example.
In this scheme, the text segment to be classified is translated, the model produces prediction outputs for the text segments before and after translation, and the outputs are then summarized as the prediction result for the text segment to be classified. Through translation, the semantic emotion expressed by the text segment to be classified can be expressed in different languages, so the model can recognize finer semantic emotion, improving the accuracy of the prediction result.
The text classification model applied in the embodiment of the application is obtained by training each text segment in a plurality of semantic text groups and classification labels corresponding to the semantic text groups to which the text segments belong, wherein the text segments expressed by a plurality of preset languages can realize training of the model based on the differences expressed by the plurality of preset languages, and the text segments in different languages can have the effect of mutual verification so as to avoid the condition that language characteristics are lost due to the expression limitation of a unique language. The classification labels represent emotions expressed by semantics of text segments in corresponding semantic text groups, and the emotion expressed based on the semantics can be associated with text segments expressed in different preset languages, so that texts expressed in different languages are associated with each other. The text classification model has the capability of expressing emotion in different languages, so that classification limitation possibly caused by a unique language training model is avoided, and the classification accuracy of the model to the text is effectively improved.
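A minimal sketch of this inference step (the merge-by-union rule and the stand-in predictor are assumptions; the patent says only that the outputs are summarized):

```python
def classify_with_translations(text, translations, predict):
    """S61-S63: run the model on the text segment to be classified and on
    each translated segment, then merge the predicted label sets.
    `predict` stands in for the trained text classification model."""
    labels = set(predict(text))
    for segment in translations:
        labels |= set(predict(segment))
    return labels

# Toy stand-in predictor, for illustration only (an assumption).
def toy_predict(segment):
    return ["domineering"] if "！" in segment else ["slightly sad"]

result = classify_with_translations("你听我说！", ["你听我讲！"], toy_predict)
print(result)  # {'domineering'}
```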
Based on the scheme provided by the embodiment, optionally, the text segment to be classified includes a target dialogue text segment between a target user and a dialogue user, where the dialogue user is a user participating in the same dialogue with the target user.
The text segment to be classified can be a time sequence-based dialogue text, and the text segment to be classified can comprise sentences corresponding to a plurality of users respectively, wherein the sentences are arranged based on time sequence.
As shown in fig. 7, in step S63, inputting the text segment to be classified and the plurality of translated text segments to be classified into the text classification model to obtain a classification prediction result of the text segment to be classified includes:
S71: inputting the target dialogue text segment and a first translated text segment into the text classification model to obtain a first classification result, wherein the first translated text segment is translated from the target dialogue text segment;
the first translation text is a translation text of a target dialogue text segment, and all dialogue texts between the target user and at least one dialogue user are contained. And carrying out prediction classification on the target dialogue text segment and the first translation text segment thereof through a text classification model to obtain a first classification result. The first classification result includes an output result of the model for the target dialog text segment and an output result of the model for the first translation text segment.
S72: and inputting the text segment of the target user in the target dialogue text segment and a second translation text segment into a text classification model to obtain a second classification result, wherein the second translation text segment is translated from the text segment of the target user in the target dialogue text segment.
In this step, the part of the target dialogue text segment corresponding to the target user is extracted, and the target user's sentences together with the corresponding second translated text segment are input into the model for classification prediction, so the resulting prediction targets only the content expressed by the target user. Similar to the first classification result, the second classification result comprises the model's output for the target user's text segment and its output for the second translated text segment.
S73: and inputting the selected text segment and the third translation text segment in the target dialogue text segment into a text classification model to obtain a third classification result, wherein the selected text segment comprises the text segment of the target user and at least part of the text segment of the dialogue user, and the third translation text segment is translated from the selected text segment in the target dialogue text segment.
In this step, the sentences corresponding to the target user are first selected from the target dialogue text segment; then sentences of other users are selected from the remaining dialogue text; the target user's sentences are combined with the selected sentences of the other users; a third translated text segment is produced by translation; and the combined text and the third translated text segment are input into the text classification model to obtain the third classification result. As with the preceding classification results, the third classification result includes the model's outputs for the selected text segment before and after translation.
S74: and determining a classification prediction result of the target dialogue text segment according to the first classification result, the second classification result and the third classification result.
In a dialogue scenario, the sentences produced by the target user are closely related to the sentences of the other users participating in the dialogue. The scheme provided by the embodiments of the application therefore inputs, together with the target user's sentences to be recognized, the other users' sentences in the dialogue as implicit text, improving the accuracy of recognizing the semantic emotion expressed by the target user in the dialogue scenario.
Next, this scheme is described taking a customer–agent dialogue scenario as an example. In this example, the target dialogue text segment Input1 and all of its translated texts are input into the trained text classification model model_beta to obtain the portrait-feature classification of the current dialogue. The agent's text in the dialogue serves as an implicit-text input to the model, improving the classification accuracy for the customer's sentences. This is realized by the following steps:
a. Input1 containing the agent's dialogue text is input into the model_beta model to obtain a portrait-feature classification; the result is denoted A;
b. Input1 without the agent's dialogue text is input into the model_beta model to obtain a portrait-feature classification; the result is denoted B;
c. Input1 retaining all of the customer's text and a randomly retained part of the agent's dialogue text is input into the model_beta model to obtain a portrait-feature classification; the result is denoted C;
d. The classification result for the customer's text is obtained from the classification results A, B, and C.
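The construction of the three model inputs in steps a-c can be sketched as follows (the turn format, the keep ratio for step c, and the stand-in predict function are assumptions; translated inputs are omitted for brevity):

```python
import random

def classify_dialogue(turns, predict, keep_ratio=0.5):
    """Steps a-c: build the three model inputs from one dialogue.
    `turns` is a list of (speaker, text) pairs; `predict` stands in for
    model_beta; keep_ratio for step c is an assumption."""
    customer = [t for s, t in turns if s == "customer"]
    agent = [t for s, t in turns if s == "agent"]

    a = set(predict(" ".join(customer + agent)))   # a: with agent text
    b = set(predict(" ".join(customer)))           # b: customer text only
    kept = (random.sample(agent, max(1, int(len(agent) * keep_ratio)))
            if agent else [])
    c = set(predict(" ".join(customer + kept)))    # c: random part of agent text
    return a, b, c
```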
Based on the solution provided in the foregoing embodiment, optionally, as shown in fig. 8, step S74 of determining a classification prediction result of the target dialogue text segment according to the first classification result, the second classification result, and the third classification result includes:
S81: determining a classification result intersection of the first classification result, the second classification result and the third classification result, and determining a classification result union of the first classification result, the second classification result and the third classification result.
Taking the classification results A, B and C obtained in the above example, the union of the three classification results is denoted A∪B∪C, and the intersection of the three classification results is denoted A∩B∩C.
S82: randomly extracting, at a preset ratio, classification result elements from the difference set between the classification result union and the classification result intersection.
The union contains the semantic emotions expressed in the target dialogue text segment by every user participating in the conversation, while the intersection contains the semantic emotions expressed by the target user. The difference set between the union and the intersection therefore contains the semantic emotions expressed by the users other than the target user, which exert a certain influence on the target user. In this step, classification result elements are extracted at the preset ratio from the semantic emotions expressed by those other users. Optionally, the preset ratio may be ratio = 0.3.
S83: determining each element in the classification result intersection, together with the extracted classification result elements, as the classification prediction result of the target dialogue text segment.
In this step, the classification result intersection A∩B∩C and the extracted classification result elements are combined into the classification prediction result, which expresses the semantic emotion mainly conveyed by the target dialogue text segment with the target user as its core.
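As a minimal sketch of steps S81-S83, assuming the three classification results are plain Python sets of label strings:

    import random

    def combine_results(a, b, c, ratio=0.3, seed=None):
        union = a | b | c                # semantics of all dialogue participants
        intersection = a & b & c         # semantics attributable to the target user
        difference = union - intersection
        rng = random.Random(seed)
        k = round(len(difference) * ratio)            # preset ratio, e.g. 0.3
        sampled = set(rng.sample(sorted(difference), k))
        return intersection | sampled    # classification prediction result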
Based on the solution provided in the foregoing embodiment, optionally, the method further includes:
determining classification portrait keywords according to the classification prediction result of the text segment to be classified, and then inputting the classification portrait keywords into a pre-trained language model to generate a classification portrait descriptive sentence.
This step enables a visual display of the obtained classification prediction result. The classification prediction result can be expressed as readable classification portrait keywords; for example, the classification portrait keywords corresponding to a classification prediction result may be "non-concentric; strong force; little sadness; transfer contradictory points", etc. Classification portrait keywords facilitate generating highly readable expression sentences.
The pre-trained language model may be, for example, an MT5 (Multilingual Text-to-Text Transfer Transformer) model, which composes sentences from the classification portrait keywords to generate a highly readable classification portrait descriptive sentence. The classification portrait descriptive sentence presents the model's prediction result in a highly readable form, which helps generate adaptive reply content for the user's sentences in a dialogue scenario.
Specifically, when the MT5 model is applied, the classification portrait keywords may be converted into word vectors; with time and the word vector as the two coordinate axes, time-based portrait feature points can be plotted, and the descriptive sentence can be generated from the portrait feature points in that coordinate system.
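A sketch of keyword-to-sentence generation with an mT5 checkpoint from the Hugging Face transformers library is shown below; the checkpoint name and prompt format are illustrative assumptions, and in practice a model fine-tuned on keyword-to-description pairs would be assumed:

    from transformers import AutoTokenizer, MT5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
    model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

    # keywords taken from the example above; the prompt prefix is an assumption
    keywords = "non-concentric; strong force; little sadness; transfer contradictory points"
    inputs = tokenizer("describe the speaker: " + keywords, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=60)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))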
Based on the solution provided in the foregoing embodiment, optionally, as shown in fig. 9, the method further includes:
s91: and acquiring the historical dialogue text of the target user and the target dialogue text to which the target dialogue text segment belongs.
The historical dialogue text comprises the target dialogue text segment, and the historical dialogue text, the target dialogue text and the target dialogue text segment are all texts output by the target user during dialogues.
In this example, the associated target dialogue text and the historical dialogue text of the target user are acquired based on the target dialogue text segment to be classified. The target dialogue text segment belongs to the target dialogue text, and both the historical dialogue text and the target dialogue text contain sentences of the target user.
Acquiring the historical dialogue text and the target dialogue text further expands the recognition range, so that the characteristics of the target user's dialogue can be expressed more comprehensively according to the target user's sentences in the whole dialogue and in the historical dialogues.
S92: replacing at least part of the historical dialogue text with text expressed in a preset language to form the translated historical dialogue text, and replacing at least part of the target dialogue text with text expressed in a preset language to form the translated target dialogue text.
In this step, partial translation replacement is performed on the historical dialogue text and the target dialogue text respectively. For example, sentences in the historical dialogue text are randomly sampled, the selected sentences are translated into text expressed in other preset languages, and the translations are filled back into the original historical dialogue text, so that the translated historical dialogue text carries the same amount of semantics as before translation and loses no meaning. Correspondingly, similar processing is performed on the target dialogue text: sentences are sampled, translated and filled back to obtain the translated target dialogue text.
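The sampling-and-replacement step might look like the following sketch, where translate is a hypothetical callable that renders a sentence in a (possibly randomly chosen) preset language:

    import random

    def partially_translate(sentences, translate, sample_ratio=0.3, seed=None):
        # Randomly pick sentences, replace each with its translation in place,
        # and keep the rest unchanged so the overall semantics are preserved.
        rng = random.Random(seed)
        k = max(1, int(len(sentences) * sample_ratio)) if sentences else 0
        chosen = set(rng.sample(range(len(sentences)), k))
        return [translate(s) if i in chosen else s
                for i, s in enumerate(sentences)]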
S93: inputting the translated historical dialogue text into the text classification model to obtain a historical classification prediction result, and inputting the translated target dialogue text into the text classification model to obtain a text classification prediction result.
In this step, the translated historical dialogue text and the translated target dialogue text are respectively input into the text classification model to obtain the corresponding classification results. The historical classification prediction result may express the characteristics of the target user's historical dialogues, in other words, the target user's habitual expression features. The text classification prediction result may express the target user's expression features within the target dialogue.
After step S74, the method further includes:
S94: determining a first similarity between the classification prediction result of the target dialogue text segment and the historical classification prediction result, and determining a second similarity between the classification prediction result of the target dialogue text segment and the text classification prediction result.
In this step, the calculated similarities express the differences among the target user's past expression features, the expression features in the target dialogue, and the expression features in the target dialogue text segment.
Specifically, the first similarity expresses whether the target user's expression features in the target dialogue text segment resemble the target user's past expression features, and can reveal whether the target user's expression style and expressed emotion change substantially in the target text segment.
Similarly, the second similarity expresses whether the features expressed by the target user in the target dialogue text segment resemble those expressed across the full-length target dialogue.
Together, the first similarity and the second similarity can express whether the target user shows an obvious shift relative to his or her usual expression style, and can to a certain extent express the emotional intensity in the target dialogue text segment.
S95: generating an importance evaluation sentence of the target dialogue text segment according to the first similarity and the second similarity.
Since the first similarity and the second similarity express how the target user's semantic emotion in the target text segment differs from other contexts, an importance evaluation sentence can be generated from the two similarities. Taking the target dialogue text as an example: if the second similarity is high, the emotion expressed by the target user's semantics in the target dialogue text segment is consistent with the emotion expressed by the whole dialogue text; if the second similarity is low, the target user's semantic emotion in the target dialogue text segment has changed noticeably, and the target user's emotion may have shifted substantially there. On this basis, an importance evaluation sentence is generated to express the importance of the target text segment within the whole target text; it can assist in recognizing the target user's emotional transitions, which helps generate suitable reply content in intelligent reply application scenarios.
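Purely as an illustration (the threshold and the sentence templates below are assumptions, not part of this application), an importance evaluation sentence could be derived from the two similarities as follows:

    def importance_sentence(first_sim, second_sim, threshold=0.5):
        # first_sim: vs. historical dialogues; second_sim: vs. the whole dialogue
        if second_sim < threshold:
            return ("The target segment departs markedly from the whole dialogue; "
                    "the target user's semantic emotion appears to shift here.")
        if first_sim < threshold:
            return ("The target segment is consistent with this dialogue but "
                    "differs from the target user's historical expression style.")
        return ("The target segment is consistent with both the whole dialogue "
                "and the historical dialogues.")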
The following describes the solution provided in the embodiments of the present application with reference to an example. In an application scenario where an agent and a client are talking, the solution performs the following steps:
First, the following three types of text are entered:
input1, the limited session fragment currently required to be detected.
Input2, full-pass dialogue.
Input3, all history sessions.
In this example, the target object to be detected is a client, but in this embodiment, the background conversation text, i.e. the agent, is input as a hidden text.
Then, some client session texts are randomly extracted from the full session and from the historical sessions respectively, and multi-language replacement is performed on them to generate different full texts and historical texts. The preset languages used for the multi-language replacement may be selected randomly.
The importance of the limited session segment to be detected is then evaluated based on the full text and the historical text, and an importance evaluation sentence of the current limited text input1 within the full text and even within all historical sessions is generated from the evaluation results.
Based on the solution provided in the foregoing embodiment, optionally, as shown in fig. 10, in step S94, determining the first similarity between the classification prediction result of the target dialogue text segment and the historical classification prediction result, and determining the second similarity between the classification prediction result of the target dialogue text segment and the text classification prediction result includes:
S101: calling a word embedding module in the text classification model to respectively determine a target portrait feature word vector corresponding to the classification prediction result of the target dialogue text segment, a first portrait feature word vector corresponding to the historical classification prediction result, and a second portrait feature word vector corresponding to the text classification prediction result.
In this example, when the classification results themselves take word form, the token_embedding module can be called directly to obtain the word vector corresponding to each classification result.
S102: determining the similarity between the target portrait feature word vector and the first portrait feature word vector as the first similarity, and determining the similarity between the target portrait feature word vector and the second portrait feature word vector as the second similarity.
For example, this may be implemented by the following steps:
a. The whole text is input into the model_β model; since the elements in the portrait feature list are in word form, the token_embedding module is called directly after input to obtain the word vectors of all portrait feature words.
b. The role of the current limited text input1 within the full text is measured by the similarity between the word vector of input1's portrait features and the word vector of the full session's portrait features; the similarity may specifically be the cosine value between the word vectors.
c. The role of the current limited text input1 within all historical session texts is measured by the similarity between the word vector of input1's portrait features and the word vector of all historical sessions' portrait features; the similarity may likewise be the cosine value between the word vectors.
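Steps b and c reduce to a cosine similarity between word vectors. A minimal NumPy sketch follows, with toy vectors standing in for token_embedding outputs (the values and dimensionality are illustrative only):

    import numpy as np

    def cosine_similarity(u, v):
        # cosine of the angle between two portrait feature word vectors
        u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    vec_segment = np.array([0.2, 0.7, 0.1])   # portrait features of input1
    vec_dialog  = np.array([0.3, 0.6, 0.2])   # portrait features of the full session
    vec_history = np.array([0.9, 0.1, 0.0])   # portrait features of all history

    first_similarity  = cosine_similarity(vec_segment, vec_history)
    second_similarity = cosine_similarity(vec_segment, vec_dialog)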
Based on the solution provided in the foregoing embodiment, optionally, in step S93, inputting the translated historical dialogue text into the text classification model to obtain the historical classification prediction result includes:
first, since the historical dialogue text includes the target dialogue text segment, the translated historical dialogue text also includes the translated target dialogue text segment. In this step, the translated historical dialogue text is divided into a front-segment historical dialogue text located before the target dialogue text segment and a rear-segment historical dialogue text located after the target dialogue text segment;
in this example, the translated historical dialogue text is divided based on the target dialogue text segment to obtain the front segment and the rear segment of the historical dialogue text respectively, where the front-segment historical dialogue text precedes the target dialogue text segment in time and the rear-segment historical dialogue text follows it in time.
Then, the front-segment historical dialogue text is input into the text classification model to obtain a front-segment historical dialogue classification result, and the rear-segment historical dialogue text is input into the text classification model to obtain a rear-segment historical dialogue classification result;
by dividing the historical dialogue text based on the target dialogue text segment and classifying the portions before and after it separately, the solution of this embodiment can effectively reflect the target user's semantic emotion both before the target dialogue text segment and after it occurs.
Finally, the historical classification prediction result is determined based on a first weight, the front-segment historical dialogue classification result, a second weight and the rear-segment historical dialogue classification result, where the first weight is the proportion of the front-segment historical dialogue text within the historical dialogue text and the second weight is the proportion of the rear-segment historical dialogue text within the historical dialogue text.
In this step, a weighted sum of the front-segment and rear-segment historical dialogue classification results is computed based on the first weight and the second weight, so that the historical classification prediction result reflects the weight of each portion of the text; this effectively improves the accuracy of the historical classification prediction result and, in turn, the accuracy of the importance evaluation of the target dialogue text segment.
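A sketch of the weighted combination, assuming each classification result is a mapping from portrait feature label to score and that the two weights are the segments' shares of the historical text (summing to 1):

    from collections import defaultdict

    def weighted_combination(front_scores, back_scores, w_front, w_back):
        # Weighted sum of the front- and rear-segment classification results;
        # w_front and w_back are assumed to be, e.g., sentence-count shares.
        combined = defaultdict(float)
        for label, score in front_scores.items():
            combined[label] += w_front * score
        for label, score in back_scores.items():
            combined[label] += w_back * score
        return dict(combined)

The same helper applies unchanged to the third and fourth weights used below for the target dialogue text.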
Based on the solution provided in the foregoing embodiment, optionally, in step S93, inputting the translated target dialogue text into the text classification model to obtain the text classification prediction result includes:
first, dividing the translated target dialogue text into a front-segment target dialogue text located before the target dialogue text segment and a rear-segment target dialogue text located after the target dialogue text segment.
Similar to the processing of the historical dialogue text above, in this example the target dialogue text is divided according to the target dialogue text segment to obtain its front segment and rear segment respectively, where the front-segment target dialogue text precedes the target dialogue text segment in time and the rear-segment target dialogue text follows it in time.
Then, the front-segment target dialogue text is input into the text classification model to obtain a front-segment target dialogue classification result, and the rear-segment target dialogue text is input into the text classification model to obtain a rear-segment target dialogue classification result.
By dividing the target dialogue text based on the target dialogue text segment and classifying the portions before and after it separately, the solution of this embodiment can effectively reflect the target user's semantic emotion both before the target dialogue text segment and after it occurs.
Finally, the text classification prediction result is determined based on the weighted result of a third weight and the front-segment target dialogue classification result and the weighted result of a fourth weight and the rear-segment target dialogue classification result, where the third weight is the proportion of the front-segment target dialogue text within the target dialogue text and the fourth weight is the proportion of the rear-segment target dialogue text within the target dialogue text.
In this step, a weighted sum of the front-segment and rear-segment target dialogue classification results is computed based on the third weight and the fourth weight, so that the text classification prediction result reflects the weight of each portion of the text; this effectively improves the accuracy of the text classification prediction result and, in turn, the accuracy of the importance evaluation of the target dialogue text segment.
Based on the solution provided in the foregoing embodiment, optionally, after step S74, the method further includes:
first, acquiring the classification prediction results of the target dialogue text segments respectively corresponding to a plurality of target users.
This example can be applied to scenarios in which a plurality of target users are classified. In this step, the classification prediction results of a plurality of target users are acquired. The target users may be different users participating in the same dialogue, or unrelated users who have never shared a dialogue.
Second, clustering the plurality of target users based on the classification prediction results respectively corresponding to the plurality of target users.
In this step, the plurality of target users are clustered based on the classification prediction results corresponding to the respective target users. In some application scenarios, the classification prediction result corresponding to a target user expresses the semantic emotion expressed by that target user, so clustering based on the classification prediction results gathers users who express similar semantic emotions into one class. The same reply processing can then be applied to multiple users within a class, which facilitates user-group classification and intelligent reply in practical application scenarios.
Based on the solution provided in the foregoing embodiment, optionally, clustering the plurality of target users based on the classification prediction results respectively corresponding to the plurality of target users includes:
first, calling a word embedding module in the text classification model to respectively determine the target portrait feature word vectors corresponding to the classification prediction results of the plurality of target users.
If a classification prediction result is expressed in word form, the corresponding target portrait feature word vector can be obtained by calling the word embedding module directly. If it is expressed in another form, it can first be converted into word form and then passed through the word embedding module to obtain the target portrait feature word vector.
Second, determining the similarity and dissimilarity among the plurality of target portrait feature word vectors.
A target portrait feature word vector can express the semantic emotion features of the corresponding target user, so calculating the similarity and dissimilarity can indirectly express how different target users differ in expression.
Finally, clustering the plurality of target users according to the similarity and dissimilarity among the plurality of target portrait feature word vectors.
Based on the similarity and dissimilarity determined in the above steps, the plurality of target users are clustered, and similar target users are gathered into one class. The clustering may be performed with a preset number, so that the number of users in each resulting cluster falls within a certain interval.
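As one possible realization (scikit-learn's agglomerative clustering is an assumption here, not prescribed by this application), users can be clustered on the dissimilarity of their portrait feature word vectors:

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def cluster_users(feature_vectors, distance_threshold=0.5):
        # feature_vectors: one non-zero portrait feature word vector per user
        X = np.asarray(feature_vectors, dtype=float)
        X = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-normalise
        dissim = np.clip(1.0 - X @ X.T, 0.0, None)         # 1 - cosine similarity
        # older scikit-learn versions name the `metric` parameter `affinity`
        model = AgglomerativeClustering(n_clusters=None, metric="precomputed",
                                        linkage="average",
                                        distance_threshold=distance_threshold)
        return model.fit_predict(dissim)                   # cluster label per user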
The following describes the present solution with reference to an example. In a client-to-agent conversation scenario, the following steps are performed:
1) The limited session texts of n clients in the same session context are input, where the session context may be, for example, a session scenario of purchasing the same item, a session scenario of purchasing the same kind of item, a pre-sale or after-sale session scenario, a complaint session scenario, etc.
Taking 3 clients as an example, the input limited session texts include:
Input1: the limited session segment of client 1 currently to be detected (the client is to be detected, but the background session text, i.e. the agent's sentences, is input as implicit text).
Input2: the limited session segment of client 2 currently to be detected (likewise with the agent's sentences input as implicit text).
Input3: the limited session segment of client 3 currently to be detected (likewise with the agent's sentences input as implicit text).
2) All available multilingual translation texts of Input1, Input2 and Input3 are acquired based on the translation corpus S_1 constructed in any of the above examples.
3) Input1, Input2, Input3 and all their translation texts are respectively input into the trained model_β to obtain the portrait feature classification of each current limited session.
In the input text, the agent text serves as implicit text assisting the classification; for the concrete handling of agent text as implicit text, reference may be made to the description of the classification results A, B and C obtained from Input1 in the example above. In this example, similar operations may be performed on Input2 and Input3 to obtain the portrait feature classifications corresponding to Input1, Input2 and Input3 respectively.
4) The commonality and dissimilarity of the portrait features of input1, input2 and input3 are evaluated.
The evaluation method comprises the following steps:
d. For input1, input2 and input3, the portrait features of the corresponding whole texts and historical session texts are input into the model_β model; since the elements in the portrait feature list are in word form, the token_embedding module is called directly after input to obtain the word vectors of all portrait feature words.
e. Portrait feature commonality of input1 and input2 = the similarity (cosine value) between the word vector of input1's portrait features and the word vector of input2's portrait features.
f. Portrait feature dissimilarity of input1 and input2 = 1 - the above similarity (cosine value).
g. The portrait feature commonality and dissimilarity of input1 and input3, and of input2 and input3, can be obtained in the same way.
The plurality of target users are then clustered according to parameter values such as the commonality and the dissimilarity, thereby realizing user classification based on language expression characteristics.
The solution provided by the embodiments of the application, within the branch of the deep learning framework that performs fine-grained grouping of language subjects based on limited language facts, proposes a continuous portrait-drawing scheme for the subject: it deeply mines the dialogue subject's current portrait, the portrait changes over the whole dialogue, and the importance and decisive position of the current portrait within the whole dialogue and all historical dialogues, which can assist dialogue logic understanding, dialogue atmosphere analysis and intelligent communication.
Within the same branch, the solution also proposes a portrait commonality analysis scheme between different dialogue subjects, deeply mining the portrait commonalities between different dialogue subjects (such as different clients) in the same dialogue scenario (such as facing the same agent). This makes it possible, while facing different dialogue subjects, to select a standardized superior strategy and dialogue goal based on their portrait commonality, reducing to a certain extent the pressure on the number of human agents and robot agents.
The solution further proposes a difference analysis scheme under the portrait commonality among different dialogue subjects, deeply mining, across different dialogues, the differences that remain beneath the portrait commonality, and guiding the agent to select a more flexible, individualized coping strategy under the same procedure.
In the above examples, three languages were selected for illustration during the training of the portrait feature point model. Three languages are considered to achieve a multi-party verification effect when portrait feature points are manually annotated, mitigating the loss of portrait feature point annotation information caused by the limited semantic expressiveness of only one or two languages, or by an annotator's insufficient grasp of the semantic details of some languages. It should be understood that two languages, or more than three, may also be selected.
In addition, the method pre-trains with multiple languages without placing many restrictions on which languages are used. In practice, the larger the differences among the languages, the better the effect; when selecting the preset languages, the selection can therefore be made according to the genetic relationships among languages. For example, the genetic distance among the preset languages may be required to exceed that within a given language cluster before they are combined for training, so that languages with certain expressive differences jointly represent more comprehensive portrait feature points.
Furthermore, since the method pre-trains with multiple languages, manual annotator review may be adopted to improve the quality of the training corpus. The annotator needs to be familiar with the preset language; for example, when the preset language is the annotator's native language, the annotator generally has a deeper and subtler grasp of its semantic details, which effectively improves the quality of the training corpus.
In order to solve the problems in the prior art, optionally, an embodiment of the present application further provides a text classification device 110, as shown in fig. 11, including:
an acquisition module 111, configured to acquire a text segment to be classified;
a generation module 112, configured to generate, based on the text segment to be classified, a plurality of translation text segments to be classified expressed in a plurality of preset languages, wherein the emotion of the semantic expression of the plurality of translation text segments to be classified is the same as the emotion of the semantic expression of the text segment to be classified;
a classification module 113, configured to input the text segment to be classified and the plurality of translation text segments to be classified into a text classification model to obtain a classification prediction result of the text segment to be classified, wherein the text classification model is obtained by training an initial text classification model according to a labeling label and a training sample, the training sample comprises each text segment in a plurality of semantic text groups, the labeling label is the classification label corresponding to the semantic text group to which the training sample belongs, the plurality of semantic text groups are constructed based on a plurality of initial text segments and a plurality of translation text segments, any one semantic text group comprises a plurality of text segments having translation relations, the plurality of translation text segments are text segments obtained by translating the initial text segments into a plurality of preset languages, the plurality of text segments belonging to the same semantic text group are expressed in a plurality of different preset languages, and the classification label comprises an emotion label of the semantic expression of each text segment in the corresponding semantic text group.
With the device provided by the embodiment of the application, the text segment to be classified is translated, the model produces prediction outputs for the text segments before and after translation, and those outputs are combined into the prediction result of the text segment to be classified. Translation allows the semantic emotion expressed by the text segment to be classified to be expressed in different languages, so the model can recognize finer-grained semantic emotion, improving the accuracy of the prediction result. The text classification model applied in the embodiment of the application is trained on each text segment in a plurality of semantic text groups together with the classification label corresponding to the semantic text group to which each segment belongs. Text segments expressed in a plurality of preset languages let the model be trained on the expressive differences among those languages, and text segments in different languages can verify one another, avoiding the loss of language features caused by the expressive limitations of a single language. The classification label represents the emotion expressed by the semantics of the text segments in the corresponding semantic text group; since this semantically expressed emotion can be associated with text segments expressed in different preset languages, texts expressed in different languages become associated with one another. The text classification model thus acquires the ability to recognize emotion expressed in different languages, avoiding the classification limitations that a single-language training model may carry and effectively improving the model's classification accuracy on text.
The above modules in the device provided by the embodiments of the present application may further implement the method steps provided in the foregoing method embodiments. Optionally, besides the foregoing modules, the device provided by the embodiments of the present application may further include other modules to implement the method steps provided in the foregoing method embodiments. The device provided by the embodiments of the present application can achieve the technical effects achieved by the foregoing method embodiments.
In order to solve the problems in the prior art, optionally, an embodiment of the present application further provides a training device for a text classification model, including:
a text acquisition module, configured to acquire a plurality of initial text segments and translation text segments having translation relations with the initial text segments, wherein the translation text segments are text segments obtained by translating the initial text segments into a plurality of preset languages;
a construction module, configured to construct a plurality of semantic text groups based on the plurality of initial text segments and the plurality of translation text segments, wherein any semantic text group comprises a plurality of text segments having translation relations, and the plurality of text segments belonging to the same semantic text group are expressed in a plurality of different preset languages;
a label acquisition module, configured to acquire classification labels respectively corresponding to the plurality of semantic text groups, wherein the classification labels comprise emotion labels of the semantic expressions of the text segments in the corresponding semantic text groups;
a training module, configured to train an initial text classification model according to a labeling label and a training sample to obtain the text classification model, wherein the training sample comprises each text segment in the plurality of semantic text groups, and the labeling label is the classification label corresponding to the semantic text group to which the training sample belongs.
The above modules in the device provided by the embodiments of the present application may further implement the method steps provided in the foregoing method embodiments. Optionally, besides the foregoing modules, the device provided by the embodiments of the present application may further include other modules to implement the method steps provided in the foregoing method embodiments. The device provided by the embodiments of the present application can achieve the technical effects achieved by the foregoing method embodiments.
With the device provided by the embodiment of the application, the plurality of text segments expressed in a plurality of preset languages let the model be trained on the expressive differences among those languages, and text segments in different languages can verify one another, avoiding the loss of language features caused by the expressive limitations of a single language. The classification label represents the emotion expressed by the semantics of the text segments in the corresponding semantic text group; since this semantically expressed emotion can be associated with text segments expressed in different preset languages, texts expressed in different languages become associated with one another. Each text segment in the plurality of semantic text groups is then taken as a training sample, the classification label corresponding to the semantic text group to which the training sample belongs is taken as the training label, and the text classification model is trained, so that the model can learn the emotion expression capabilities of different languages, avoiding the classification limitations that a single-language training model may carry and effectively improving the model's classification accuracy on text.
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 12, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface and a memory. The memory may include a volatile memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include the hardware required by other services.
The processor, the network interface and the memory may be interconnected by the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, among others. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one bi-directional arrow is shown in fig. 12, but this does not mean that there is only one bus or one type of bus.
The memory is used for storing a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include a volatile memory and a non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it, forming, at the logic level, the text classification device or the training device for the text classification model. The processor executes the program stored in the memory and is specifically configured to execute the text classification method or the training method for the text classification model according to any one of the foregoing embodiments.
The text classification method or the training method for the text classification model according to any of the above embodiments may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps and logic blocks disclosed in one or more embodiments of the present specification may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of a method disclosed in connection with one or more embodiments of the present specification may be embodied directly in a hardware decoding processor, or in a combination of hardware and software modules within a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The electronic device may further execute the text classification method or the training method of the text classification model according to any one of the embodiments, which is not described herein.
Of course, in addition to software implementations, the electronic device in this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the above processing flow is not limited to logic units, and may also be hardware or a logic device.
The embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements each process of the above text classification method or of the training method for the text classification model, and can achieve the same technical effects, which are not repeated here. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (13)

1. A method for training a text classification model, comprising:
acquiring a plurality of initial text segments and translation text segments with translation relations with the initial text segments, wherein the translation text segments are text segments obtained by translating the initial text segments into a plurality of preset languages;
constructing a plurality of semantic text groups based on the initial text segments and the translated text segments, wherein any semantic text group comprises a plurality of text segments with translation relations, and the text segments belonging to the same semantic text group are expressed by a plurality of different preset languages;
acquiring classification labels respectively corresponding to the plurality of semantic text groups, wherein the classification labels comprise emotion labels of semantic expressions of all text segments in the corresponding semantic text groups;
training an initial text classification model according to a labeling label and a training sample to obtain the text classification model, wherein the training sample comprises each text segment in the plurality of semantic text groups, and the labeling label is a classification label corresponding to the semantic text group to which the training sample belongs.
2. The method of claim 1, wherein acquiring the classification labels respectively corresponding to the plurality of semantic text groups comprises:
acquiring sub-classification labels respectively corresponding to the text segments in the semantic text group, wherein the semantic text group contains at least one text segment whose sub-classification label differs from the sub-classification labels corresponding to the other text segments in the same group;
and determining the union of sub-classification labels corresponding to the text segments in the semantic text group as the classification label corresponding to the semantic text group.
3. The method of claim 1 or 2, wherein training an initial text classification model based on labeling and training samples to obtain the text classification model comprises:
respectively performing word segmentation on the training samples through a word segmentation model;
inputting the segmented training sample into a word embedding layer to obtain a text coding result corresponding to the training sample;
classifying and predicting the text coding result based on an attention mechanism to obtain a prediction label corresponding to the training sample;
the text classification model is trained based on the losses between the predictive labels and the labeling labels corresponding to the same training samples.
4. The method of claim 3, wherein inputting the segmented training sample into the word embedding layer to obtain the text encoding result corresponding to the training sample, comprises:
inputting the training samples after word segmentation into a word segmentation coding layer, a sentence coding layer and a position coding layer in the word embedding layer, to obtain word feature vectors output by the word segmentation coding layer, sample difference feature vectors output by the sentence coding layer and word order feature vectors output by the position coding layer;
and splicing the word feature vector, the sample difference feature vector and the word order feature vector to obtain text coding results respectively corresponding to the training samples.
5. The method of claim 1 or 2, further comprising, prior to obtaining a plurality of initial text segments and translated text segments having a translation relationship with the initial text segments:
respectively acquiring corpora of a plurality of preset languages, wherein each corpus comprises corpus text expressed in the corresponding preset language;
constructing a plurality of translation corpus groups according to the corpuses of the plurality of preset languages, wherein the translation corpus groups comprise a plurality of corpuses with translation relations, and the plurality of corpuses belonging to the same translation corpus group are respectively from corpuses of different preset languages;
constructing a translation corpus based on the plurality of translation corpus groups, wherein the translation corpus is used for expressing translation relations among the plurality of preset languages;
wherein acquiring a plurality of initial text segments and translation text segments having translation relations with the initial text segments comprises:
acquiring a plurality of initial text segments;
and translating the plurality of initial text segments based on the pre-constructed translation corpus to obtain translation text segments with translation relations with the initial text segments.
6. A method of text classification, comprising:
acquiring text segments to be classified;
generating a plurality of to-be-classified translation text segments expressed by a plurality of preset languages based on the to-be-classified text segments, wherein the emotion of the semantic expression of the plurality of to-be-classified translation text segments is the same as the emotion of the semantic expression of the to-be-classified text segments;
inputting the text segment to be classified and the plurality of translated text segments to be classified into a text classification model to obtain a classification prediction result of the text segment to be classified, wherein the text classification model is obtained by training an initial text classification model according to a labeling label and a training sample, the training sample comprises each text segment in a plurality of semantic text groups, the labeling label is a classification label corresponding to the semantic text group to which the training sample belongs, the plurality of semantic text groups are constructed based on a plurality of initial text segments and a plurality of translated text segments, any one semantic text group comprises a plurality of text segments with translation relations, the plurality of translated text segments are text segments obtained by translating the initial text segments into a plurality of preset languages, the plurality of text segments belonging to the same semantic text group are expressed by a plurality of different preset languages, and the classification label comprises emotion labels of semantic expression of each text segment in the corresponding semantic text group.
7. The method of claim 6, wherein the text segment to be classified comprises a target dialog text segment between a target user and a dialog user, the dialog user being a user participating in the same dialog as the target user;
inputting the text segment to be classified and the plurality of translated text segments to be classified into a text classification model to obtain a classification prediction result of the text segment to be classified, wherein the method comprises the following steps:
inputting the target dialogue text segment and a first translation text segment into a text classification model to obtain a first classification result, wherein the first translation text segment is translated from the target dialogue text segment;
inputting a text segment of the target user in the target dialogue text segment and a second translation text segment into the text classification model to obtain a second classification result, wherein the second translation text segment is obtained by translating the text segment of the target user in the target dialogue text segment;
inputting a selected text segment of the target dialogue text segment and a third translation text segment into the text classification model to obtain a third classification result, wherein the selected text segment comprises the text segments of the target user and at least part of the text segments of the dialogue user, and the third translation text segment is translated from the selected text segment of the target dialogue text segment;
And determining a classification prediction result of the target dialogue text segment according to the first classification result, the second classification result and the third classification result.
8. The method of claim 7, wherein determining a classification prediction result for the target dialog text segment based on the first classification result, the second classification result, and the third classification result comprises:
determining a classification result intersection of the first classification result, the second classification result and the third classification result, and determining a classification result union of the first classification result, the second classification result and the third classification result;
randomly extracting classification result elements from the difference set of the classification result union set and the classification result intersection set based on a preset proportion;
and determining each element in the classification result intersection and the classification result element as a classification prediction result of the target dialogue text segment.
9. The method of claim 7 or 8, wherein the method further comprises:
acquiring historical dialogue text of the target user and the target dialogue text to which the target dialogue text segment belongs;
replacing at least part of the historical dialogue text with text expressed in a preset language to form translated historical dialogue text, and replacing at least part of the target dialogue text with text expressed in a preset language to form translated target dialogue text;
inputting the translated historical dialogue text into the text classification model to obtain a historical classification prediction result, and inputting the translated target dialogue text into the text classification model to obtain a text classification prediction result;
wherein, after determining the classification prediction result of the target dialogue text segment according to the first classification result, the second classification result and the third classification result, the method further comprises:
determining a first similarity between the classification prediction result of the target dialogue text segment and the historical classification prediction result, and a second similarity between the classification prediction result of the target dialogue text segment and the text classification prediction result; and
generating an importance evaluation sentence for the target dialogue text segment according to the first similarity and the second similarity.
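A minimal sketch of how the two similarities of claim 9 might drive the importance evaluation sentence; the thresholds and wording are assumptions, as the claim does not specify them:

```python
# Illustrative sketch only: the threshold of 0.5 and the sentence wording are
# assumed examples; claim 9 leaves both unspecified.
def importance_sentence(first_sim: float, second_sim: float,
                        threshold: float = 0.5) -> str:
    # Low similarity to both the historical and the full-text predictions
    # suggests the segment carries new, dialogue-specific emotion.
    if first_sim < threshold and second_sim < threshold:
        return "The segment expresses an emotion not seen in prior context; high importance."
    if first_sim < threshold:
        return "The segment departs from the user's historical emotion; moderate importance."
    return "The segment is consistent with prior context; low importance."
```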
10. The method of claim 9, wherein determining the first similarity between the classification prediction result of the target dialogue text segment and the historical classification prediction result, and determining the second similarity between the classification prediction result of the target dialogue text segment and the text classification prediction result, comprises:
invoking a word embedding module in the text classification model to determine, respectively, a target portrait feature word vector corresponding to the classification prediction result of the target dialogue text segment, a first portrait feature word vector corresponding to the historical classification prediction result, and a second portrait feature word vector corresponding to the text classification prediction result; and
determining the similarity between the target portrait feature word vector and the first portrait feature word vector as the first similarity, and the similarity between the target portrait feature word vector and the second portrait feature word vector as the second similarity.
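Claim 10 does not name a similarity measure, so the sketch below assumes cosine similarity over the portrait feature word vectors; `embed` is a hypothetical stand-in for the word embedding module:

```python
# Illustrative sketch only: cosine similarity is an assumption; the claim
# specifies a similarity but not which one. `embed` is hypothetical.
import math
from typing import Callable, List, Tuple

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarities(target_pred: str, historical_pred: str, text_pred: str,
                 embed: Callable[[str], List[float]]) -> Tuple[float, float]:
    target_vec = embed(target_pred)  # target portrait feature word vector
    first_sim = cosine(target_vec, embed(historical_pred))
    second_sim = cosine(target_vec, embed(text_pred))
    return first_sim, second_sim
```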
11. A text classification device, comprising:
an acquisition module, configured to acquire a text segment to be classified;
a generation module, configured to generate, based on the text segment to be classified, a plurality of translated text segments to be classified expressed in a plurality of preset languages, wherein the emotion expressed by the semantics of the translated text segments to be classified is the same as the emotion expressed by the semantics of the text segment to be classified; and
a classification module, configured to input the text segment to be classified and the plurality of translated text segments to be classified into a text classification model to obtain a classification prediction result of the text segment to be classified, wherein the text classification model is obtained by training an initial text classification model on training samples and their annotation labels; the training samples comprise each text segment in a plurality of semantic text groups, and the annotation label of a training sample is the classification label of the semantic text group to which it belongs; the plurality of semantic text groups are constructed based on a plurality of initial text segments and a plurality of translated text segments, any one semantic text group comprising a plurality of text segments having a translation relation with one another; the translated text segments are obtained by translating the initial text segments into a plurality of preset languages, so that text segments belonging to the same semantic text group are expressed in a plurality of different preset languages; and the classification label comprises an emotion label for the semantics expressed by each text segment in the corresponding semantic text group.
12. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 5 or 6 to 10.
13. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5 or 6 to 10.
CN202310356023.5A 2023-04-04 2023-04-04 Text classification method, training method and training device for model Pending CN117493548A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310356023.5A CN117493548A (en) 2023-04-04 2023-04-04 Text classification method, training method and training device for model

Publications (1)

Publication Number Publication Date
CN117493548A 2024-02-02

Family

ID=89678800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310356023.5A Pending CN117493548A (en) 2023-04-04 2023-04-04 Text classification method, training method and training device for model

Country Status (1)

Country Link
CN (1) CN117493548A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117708340A (en) * 2024-02-06 2024-03-15 阿里健康科技(杭州)有限公司 Label text determining method, model training and adjusting method, device and medium
CN117708340B (en) * 2024-02-06 2024-05-24 阿里健康科技(杭州)有限公司 Label text determining method, model training and adjusting method, device and medium

Similar Documents

Publication Title
CN110597961B (en) Text category labeling method and device, electronic equipment and storage medium
CN110807332A (en) Training method of semantic understanding model, semantic processing method, semantic processing device and storage medium
CN111062217B (en) Language information processing method and device, storage medium and electronic equipment
CN110795945A (en) Semantic understanding model training method, semantic understanding device and storage medium
CN112084334B (en) Label classification method and device for corpus, computer equipment and storage medium
US20230069935A1 (en) Dialog system answering method based on sentence paraphrase recognition
CN110807333A (en) Semantic processing method and device of semantic understanding model and storage medium
CN113254610A (en) Multi-round conversation generation method for patent consultation
CN111666758A (en) Chinese word segmentation method, training device and computer readable storage medium
CN112699686A (en) Semantic understanding method, device, equipment and medium based on task type dialog system
Amanova et al. Creating annotated dialogue resources: Cross-domain dialogue act classification
Kshirsagar et al. A review on application of deep learning in natural language processing
CN115392264A (en) RASA-based task-type intelligent multi-turn dialogue method and related equipment
Joukhadar et al. Arabic dialogue act recognition for textual chatbot systems
CN117493548A (en) Text classification method, training method and training device for model
CN111553157A (en) Entity replacement-based dialog intention identification method
CN117332789A (en) Semantic analysis method and system for dialogue scene
CN113609873A (en) Translation model training method, device and medium
CN115204143B (en) Method and system for calculating text similarity based on prompt
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN115878778A (en) Natural language understanding method facing business field
CN112183114B (en) Model training and semantic integrity recognition method and device
CN114595700A (en) Zero-pronoun and chapter information fused Hanyue neural machine translation method
CN114328902A (en) Text labeling model construction method and device
WO2024055707A1 (en) Translation method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination