CN114756678A - Unknown intention text identification method and device - Google Patents

Unknown intention text identification method and device

Info

Publication number
CN114756678A
CN114756678A
Authority
CN
China
Prior art keywords
samples
sentence
text
category
decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210307174.7A
Other languages
Chinese (zh)
Inventor
李健铨
刘小康
穆晶晶
胡加明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingfu Intelligent Technology Co ltd
Original Assignee
Dingfu Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dingfu Intelligent Technology Co ltd filed Critical Dingfu Intelligent Technology Co ltd
Priority to CN202210307174.7A priority Critical patent/CN114756678A/en
Publication of CN114756678A publication Critical patent/CN114756678A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The embodiment of the application provides a method and a device for identifying an unknown intention text. The scheme comprises the following steps: acquiring K positive samples and S negative samples corresponding to each training sample, wherein K and S are positive integers greater than or equal to 1; obtaining sentence representations of the training samples and of their corresponding positive and negative samples by using a classifier, so that sentence representations of samples of the same category gather together and sentence representations of different categories move away from each other; determining a decision center for each category according to the sentence representations, and learning a decision boundary for each category; judging whether a text to be recognized lies outside the decision boundaries of all categories; and if so, determining that the text to be recognized is an unknown intention text. In the embodiment of the application, contrastive learning and classification learning are introduced at the stage of training the classifier, so that sentence representations of samples of the same category gather together and sentence representations of different categories move away from each other; the decision boundary is therefore trained with better effect, and the classifier can accurately recognize texts with unknown intentions.

Description

Unknown intention text identification method and device
Technical Field
The application relates to the technical field of natural language processing, in particular to a method and a device for identifying unknown intention texts.
Background
Text classification is one of the basic tasks in the field of natural language processing and has a wide range of applications in real life; for example, public opinion monitoring, news classification and emotion classification based on natural language processing are all realized through the text classification task.
At present, a text classification task trains a classification model on training samples of several fixed classes, so that the classification model can identify texts of these fixed classes from the texts to be recognized. However, for texts that do not belong to these fixed classes (namely, texts with unknown intentions), the classification model cannot classify them. For example, in a news classification scenario, if the training samples carry labels of three categories, sports, economy and entertainment, a classification model trained on these training samples can only classify texts to be recognized into the three categories of sports, economy and entertainment; a text to be recognized of the life category is an unknown intention for the classification model, and the classification model cannot recognize it.
Additionally, in some scenarios there may be many text classes, and the class labels of the training samples may cover only part of them, i.e., the class labels of the training samples are incomplete. For example, in the field of travel mode identification, the class labels of the training samples may include walking, taking a bus, riding a bicycle and driving, but the travel modes may also include ride-hailing, taking a train, multi-modal transfer and the like; for the classification model, ride-hailing, taking a train, multi-modal transfer and the like are unknown intentions which it cannot identify.
Disclosure of Invention
The embodiment of the application provides a method and a device for identifying an unknown intention text, which can accurately identify the unknown intention text from the text to be identified.
In a first aspect, an embodiment of the present application provides a method for identifying an unknown intention text, including: acquiring K positive samples and S negative samples corresponding to each training sample, wherein the positive samples are randomly acquired from samples of the same category as the training sample, the negative samples are randomly acquired from samples of different categories, and K and S are both positive integers greater than or equal to 1; obtaining sentence representations of the training samples and of their corresponding positive and negative samples by using a classifier, wherein the classifier gathers sentence representations of samples of the same category through a contrastive learning loss function and pushes sentence representations of different categories away from each other through a classification learning loss function; determining a decision center for each category according to the sentence representations, and learning a decision boundary for each category; acquiring the similarity between the text to be recognized and the decision center of each category to determine the target category with the maximum similarity; judging whether the text to be recognized lies outside the decision boundary of the target category; if the text to be recognized lies outside the decision boundary of the target category, determining that the text to be recognized is an unknown intention text; and if the text to be recognized lies within the decision boundary of the target category, determining that the text to be recognized belongs to the target category.
According to the method provided by the embodiment of the application, contrastive learning and classification learning are introduced at the stage of training the classifier, so that sentence representations of samples of the same category gather together and sentence representations of different categories move away from each other; the decision boundary is therefore trained with better effect, and the classifier can more accurately identify texts with unknown intentions from the texts to be recognized.
In one implementation, the contrastive learning loss function is constructed from the distance between a training sample and any one of its positive samples, and the sum of the distances between the training sample and all of its negative samples.
In one implementation, the contrastive learning loss function is embodied as the following Loss1:

$$Loss_1 = -\frac{1}{N}\sum_{v_j\in V^{+}}\log\frac{\exp(v_i\cdot v_j/\tau)}{\exp(v_i\cdot v_j/\tau)+\sum_{v^{-}\in V^{-}}\big[\exp(v_i\cdot v^{-}/\tau)+\exp(v_j\cdot v^{-}/\tau)\big]}$$

where N is the number of positive samples, v_i denotes the normalized sentence representation of the training sample, v_j denotes the normalized sentence representation of a positive sample, v^- denotes the normalized sentence representation of a negative sample, V^+ denotes the set of all positive samples, V^- denotes the set of all negative samples, τ is a hyperparameter, exp(v_i·v_j/τ) denotes the distance between the training sample and any one of its positive samples, and Σ_{v^-∈V^-}[exp(v_i·v^-/τ)+exp(v_j·v^-/τ)] denotes the sum of the distances between the training sample and all of its negative samples.
In one implementation, the classification learning loss function is constructed from the cosine distance between the sentence representation of the training sample and the representation of the true label corresponding to its class, and the sum of the cosine distances between the sentence representation of the training sample and the representations of all other class labels.
In one implementation, the classification learning loss function is embodied as the following Loss2:

$$Loss_2 = -\log\frac{e^{\,s\,(\cos(\theta_{y_i},\,z_i)-m)}}{e^{\,s\,(\cos(\theta_{y_i},\,z_i)-m)}+\sum_{j\neq y_i}e^{\,s\,\cos(\theta_j,\,z_i)}}$$

where z_i denotes the sentence representation of the training sample, θ_{y_i} denotes the representation of the true label of the training sample, θ_j denotes the representation of a label of another class, cos(θ_{y_i}, z_i) denotes the cosine distance between the sentence representation of the training sample and the representation of the true label corresponding to its class, cos(θ_j, z_i) denotes the cosine distance between the sentence representation of the training sample and the representation of another class label, m is a preset parameter, and s is a preset multiple.
In one implementation, learning the decision boundary of each category includes: constructing a decision boundary optimization function according to the numerical relationship between the decision radius and the cosine distance between the sentence representation of the training sample and the decision center corresponding to its category, wherein the numerical relationship is either that this cosine distance is greater than the decision radius of the category, or that this cosine distance is less than or equal to the decision radius of the category; and learning the decision boundary of each category cluster according to the decision boundary optimization function.
In one implementation, the decision boundary optimization function is embodied as the following L_b:

$$L_b = \frac{1}{N}\sum_{i=1}^{N}\Big\{\delta_i\big[\big(1-\cos(c_{y_i},z_i)\big)-\Delta_{y_i}\big]+(1-\delta_i)\big[\Delta_{y_i}-\big(1-\cos(c_{y_i},z_i)\big)\big]\Big\}$$

$$\delta_i=\begin{cases}1,&1-\cos(c_{y_i},z_i)>\Delta_{y_i}\\0,&1-\cos(c_{y_i},z_i)\le\Delta_{y_i}\end{cases}$$

where N is the number of positive samples, Δ_{y_i} denotes the decision radius of the category, c_{y_i} denotes the decision center of the category, z_i denotes the sentence representation of the training sample, cos(c_{y_i}, z_i) denotes the cosine distance between training sample z_i and decision center c_{y_i}, and δ_i indicates whether the training sample is inside the decision boundary.
In one implementation, the classifier employs the following overall LOSS function LOSS:
LOSS=Loss1×a+(1-a)×Loss2
wherein a is an adjustable hyper-parameter.
In one implementation, the representation of the label is obtained by: obtaining sentence representations of all training samples of the label using the classifier; and taking the central point of the sentence representations of all training samples of the label as the sentence representation of the label.
In a second aspect, an embodiment of the present application provides an apparatus for recognizing an unknown intention text, including: a processor and a memory, the memory including program instructions which, when executed by the processor, cause the apparatus to perform the following method steps: acquiring K positive samples and S negative samples corresponding to each training sample, wherein the positive samples are randomly acquired from samples of the same category as the training sample, the negative samples are randomly acquired from samples of different categories, and K and S are both positive integers greater than or equal to 1; obtaining sentence representations of the training samples and of their corresponding positive and negative samples by using a classifier, wherein the classifier gathers sentence representations of samples of the same category through a contrastive learning loss function and pushes sentence representations of different categories away from each other through a classification learning loss function; determining a decision center for each category according to the sentence representations, and learning a decision boundary for each category; acquiring the similarity between the text to be recognized and the decision center of each category to determine the target category with the maximum similarity; judging whether the text to be recognized lies outside the decision boundary of the target category; if the text to be recognized lies outside the decision boundary of the target category, determining that the text to be recognized is an unknown intention text; and if the text to be recognized lies within the decision boundary of the target category, determining that the text to be recognized belongs to the target category.
According to the device provided by the embodiment of the application, contrastive learning and classification learning are introduced at the stage of training the classifier, so that sentence representations of samples of the same category gather together and sentence representations of different categories move away from each other; the decision boundary is therefore trained with better effect, and the classifier can more accurately identify texts with unknown intentions from the texts to be recognized.
Drawings
Fig. 1 is a schematic structural diagram of a classifier provided in an embodiment of the present application;
FIG. 2 is a flowchart of a method for recognizing unknown intention text according to an embodiment of the present application;
FIG. 3 is a flow chart for learning decision boundaries for each category provided by embodiments of the present application;
fig. 4 is a schematic structural diagram of an apparatus for recognizing an unknown intention text according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of another unknown intention text recognition apparatus provided in an embodiment of the present application.
Detailed Description
Text classification is one of the basic tasks in the field of natural language processing and has a wide range of applications in real life; for example, public opinion monitoring, news classification and emotion classification based on natural language processing are all realized through the text classification task.
At present, a text classification task trains a classification model on training samples of several fixed classes, so that the classification model can identify texts of these fixed classes from the texts to be recognized. However, for texts that do not belong to these fixed classes (i.e., unknown intentions), the classification model cannot classify them. For example, in a news classification scenario, if the training samples carry labels of three categories, sports, economy and entertainment, a classification model trained on these training samples can only classify texts to be recognized into the three categories of sports, economy and entertainment; a text to be recognized of the life category is an unknown intention for the classification model, and the classification model cannot recognize it.
Additionally, in some scenarios there may be many text classes, and the class labels of the training samples may cover only part of them, i.e., the class labels of the training samples are incomplete. For example, in the field of travel mode identification, the class labels of the training samples may include walking, taking a bus, riding a bicycle and driving, but the travel modes may also include ride-hailing, taking a train, multi-modal transfer and the like; for the classification model, ride-hailing, taking a train, multi-modal transfer and the like are unknown intentions, and the current classification model cannot identify them.
In addition, the current classification model is usually obtained by training a deep learning model, and the deep learning model can only judge the class of an input text among the classes it was trained on. For an input text of an untrained class, the deep learning model still outputs the known class with the highest probability, so the input text may be classified into a wrong class.
In order to more accurately identify texts with unknown intentions from texts to be identified, the embodiment of the application provides an identification method of texts with unknown intentions. The method may be implemented by training a classification model based on a deep learning algorithm or by other algorithms or means. The training of the classification model may include two stages as a whole, where the first stage is to train the classifier and the second stage is to train the decision boundary. A decision boundary is here understood to be a boundary of a class, which can be used to determine whether a certain sample belongs to a certain class. For example: if the sample of a certain category is positioned in the decision boundary of the certain category, the sample is indicated to belong to the category; if a sample of a certain class is outside the decision boundary of a certain class, it is indicated that the sample does not belong to this class.
The classification model can adopt a pretrained language model such as BERT, RoBERTa, GPT or UniLM as the feature extractor. The classification model may also be a deep learning model of arbitrary structure, for example a deep learning model built from RNNs, CNNs or Transformers. Fig. 1 is a schematic structural diagram of a BERT model shown in an embodiment of the present application. As shown in fig. 1, the BERT model used as the feature extractor may include an input embedding layer (Embedding), a position encoding layer (Position Encoding), and N Transformer blocks. In the stage of training the classifier, the input embedding layer performs embedding encoding on a training sample, the position encoding layer adds position encoding to the embedding of the training sample, and the N Transformer blocks extract the sentence representation of the training sample.
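Illustratively, the structure in fig. 1 can be sketched in PyTorch as follows. This is only a minimal sketch of an input embedding layer, a position encoding layer and N Transformer blocks; the vocabulary size, hidden dimension, number of blocks and the use of nn.TransformerEncoder are assumptions made for illustration, not the exact BERT implementation.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Minimal sketch: input embedding + position encoding + N Transformer blocks."""

    def __init__(self, vocab_size=21128, hidden=768, n_blocks=12, n_heads=12, max_len=512):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, hidden)      # input embedding layer
        self.position_embedding = nn.Embedding(max_len, hidden)      # position encoding layer
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=n_heads, dim_feedforward=4 * hidden, batch_first=True)
        self.blocks = nn.TransformerEncoder(encoder_layer, num_layers=n_blocks)  # N Transformer blocks

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer ids; returns (batch, seq_len, hidden)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_embedding(token_ids) + self.position_embedding(positions)
        return self.blocks(x)
```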
Fig. 2 is a flowchart of a method for identifying an unknown intention text according to an embodiment of the present application. As shown in fig. 2, the method may include the following steps S101 to S105. Step S101 and step S102 correspond to a stage of training a classifier, and step S103 corresponds to a stage of training a decision boundary.
Step S101, K positive samples and S negative samples corresponding to each training sample are obtained, the positive samples are randomly obtained from the same class samples of the training samples, the negative samples are randomly obtained from different class samples of the training samples, and both K and S are positive integers larger than or equal to 1.
In this embodiment, the training samples may be texts of known categories, such as words, phrases or sentences. There may be multiple known classes, and each class may contain one or more training samples. For any training sample, the other samples belonging to the same class may be used as its positive samples, and the samples belonging to different classes may be used as its negative samples.
In order to train the classifier, in the embodiment of the present application, for each training sample, K positive samples (for example, 2 or 3 positive samples) are randomly selected from the other samples of its own class, and S negative samples (for example, 2 or 3 negative samples) are randomly selected from the samples of different classes, so as to construct the input of the feature extractor.
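Illustratively, the sampling of positive and negative samples described above can be sketched as follows; the data layout (a mapping from each class label to its list of texts) and the function name sample_pos_neg are assumptions made for this example.

```python
import random

def sample_pos_neg(samples_by_class, text, label, k=3, s=3):
    """Return (positives, negatives) for one training sample.

    samples_by_class: dict mapping class label -> list of texts of that class.
    """
    # positives: other samples of the same class (excluding the sample itself)
    same_class = [t for t in samples_by_class[label] if t != text]
    positives = random.sample(same_class, min(k, len(same_class)))

    # negatives: samples drawn from all the other classes
    other_class = [t for lab, texts in samples_by_class.items() if lab != label for t in texts]
    negatives = random.sample(other_class, min(s, len(other_class)))
    return positives, negatives

# usage sketch with a toy corpus
corpus = {"sports": ["奥运会男子接力", "世界杯决赛"], "economy": ["股市上涨"], "entertainment": ["新电影上映"]}
pos, neg = sample_pos_neg(corpus, "奥运会男子接力", "sports", k=1, s=2)
```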
Step S102, sentence representations of the training samples and of their corresponding positive and negative samples are obtained by using the classifier; the classifier gathers sentence representations of samples of the same category through a contrastive learning loss function and pushes sentence representations of different categories away from each other through a classification learning loss function.
Compared with the traditional approach in which sentence representations are both gathered and separated only through a classification learning loss function, the embodiment of the present application additionally introduces a contrastive learning loss function. The two loss functions divide the work: the contrastive learning loss function gathers sentence representations of samples of the same category, and the classification learning loss function pushes sentence representations of different categories away from each other. Because contrastive learning focuses on learning the common characteristics of samples of the same category, the method provided by the embodiment of the present application achieves a better gathering effect for sentence representations of same-category samples, which helps improve the accuracy of the subsequently learned decision centers and decision boundaries.
In specific implementation, in order to obtain sentence representations of the training samples, the positive samples and the negative samples, the Embedding codes of the training samples, the positive samples and the negative samples may be obtained through an Embedding Layer, and then the Embedding codes are input to the feature extractor to obtain corresponding sentence representations.
Taking BERT or RoBERTa as the feature extractor for example, the sentence representation of a sample (a training sample, a positive sample or a negative sample) may be the vector corresponding to the first character or first word segment in the feature extractor's output for the sample, i.e., the vector corresponding to the [CLS] position.
For example, the word segmentation result of the training sample "Olympic Association men's relay" is "Olympic Association / men's / relay", so the sentence representation of the training sample is the vector output by the feature extractor at the position of the first segment "Olympic Association".
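Illustratively, extracting the sentence representation from the output position of the first token can be sketched as follows, assuming the HuggingFace transformers library and a Chinese BERT checkpoint; the checkpoint name and maximum length are illustrative assumptions.

```python
import torch
from transformers import BertTokenizer, BertModel

# a sketch, assuming the `transformers` library and the bert-base-chinese checkpoint
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def sentence_representation(text: str) -> torch.Tensor:
    """Return the vector at the first output position ([CLS]) as the sentence representation."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = encoder(**inputs)
    return outputs.last_hidden_state[:, 0, :]   # (1, hidden) vector at the [CLS] position

rep = sentence_representation("奥运会男子接力")
```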
In the embodiment of the present application, at the output end of the feature extractor, sentence representations of samples of the same category are gathered together through contrastive learning, and sentence representations of different categories are separated from each other through classification learning.
In one implementation, the goal of contrastive learning may be achieved through the contrastive learning loss function Loss1. In one implementation, the contrastive learning loss function may be constructed according to the distance between the training sample and any one of its positive samples, and the sum of the distances between the training sample and all of its negative samples.
Exemplarily, the contrastive learning loss function Loss1 may take the following form:

$$Loss_1 = -\frac{1}{N}\sum_{v_j\in V^{+}}\log\frac{\exp(v_i\cdot v_j/\tau)}{\exp(v_i\cdot v_j/\tau)+\sum_{v^{-}\in V^{-}}\big[\exp(v_i\cdot v^{-}/\tau)+\exp(v_j\cdot v^{-}/\tau)\big]}$$

where N is the number of positive samples, v_i denotes the normalized sentence representation of the training sample, v_j denotes the normalized sentence representation of a positive sample, v^- denotes the normalized sentence representation of a negative sample, V^+ denotes the set of all positive samples, V^- denotes the set of all negative samples, τ is a hyperparameter, exp(v_i·v_j/τ) denotes the distance between the training sample and any one of its positive samples, and Σ_{v^-∈V^-}[exp(v_i·v^-/τ)+exp(v_j·v^-/τ)] denotes the sum of the distances between the training sample and all of its negative samples.
In one implementation, the normalization of a sentence representation can be achieved using the following formula:

$$X=\frac{x}{\sqrt{\sum_{i=1}^{n}x_i^{2}}}$$

where X denotes the normalized result of the sentence representation, x denotes the sentence representation vector, n is the dimension of the sentence representation vector, and x_i denotes the value of the i-th dimension of the sentence representation vector.
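Assuming the contrastive learning loss Loss1 takes the standard form reconstructed above, a minimal PyTorch sketch could look as follows; the tensor shapes, the default value of τ and the function name are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positives, negatives, tau=0.1):
    """Sketch of Loss1 for one training sample.

    anchor:    (d,)   sentence representation of the training sample
    positives: (K, d) representations of its K positive samples
    negatives: (S, d) representations of its S negative samples
    """
    # L2-normalize every representation, as in the normalization formula above
    v_i = F.normalize(anchor, dim=-1)
    v_pos = F.normalize(positives, dim=-1)
    v_neg = F.normalize(negatives, dim=-1)

    loss = 0.0
    for v_j in v_pos:
        pos_term = torch.exp(v_i @ v_j / tau)                                   # distance to one positive
        neg_term = (torch.exp(v_neg @ v_i / tau) + torch.exp(v_neg @ v_j / tau)).sum()  # sum over negatives
        loss = loss - torch.log(pos_term / (pos_term + neg_term))
    return loss / len(v_pos)

# usage sketch
d = 768
loss1 = contrastive_loss(torch.randn(d), torch.randn(3, d), torch.randn(5, d))
```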
In one implementation, the objective of classification learning may be achieved through the classification learning loss function Loss2. In one implementation, the classification learning loss function can be constructed from the cosine distance between the sentence representation of the training sample and the representation of the true label corresponding to its class, and the sum of the cosine distances between the sentence representation of the training sample and the representations of all other class labels.
Exemplarily, the classification learning loss function Loss2 may take the following form:

$$Loss_2 = -\log\frac{e^{\,s\,(\cos(\theta_{y_i},\,z_i)-m)}}{e^{\,s\,(\cos(\theta_{y_i},\,z_i)-m)}+\sum_{j\neq y_i}e^{\,s\,\cos(\theta_j,\,z_i)}}$$

where z_i denotes the sentence representation of the training sample, θ_{y_i} denotes the representation of the true label of the training sample, θ_j denotes the representation of a label of another class, cos(θ_{y_i}, z_i) denotes the cosine distance between the sentence representation of the training sample and the representation of the true label corresponding to its class, cos(θ_j, z_i) denotes the cosine distance between the sentence representation of the training sample and the representation of another class label, m is a preset parameter, s is a preset multiple, and both m and s are modifiable parameters.
Illustratively, in the classification learning loss function Loss2, s can take a value such as 10, 15 or 20, and m can take an arbitrary value between 0.3 and 0.5, so that the cosine distance between the sentence representation of the training sample and the representation of the true label corresponding to its category is made greater than m.
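Assuming the classification learning loss Loss2 takes the large-margin cosine form reconstructed above, a minimal PyTorch sketch for a single training sample could look as follows; the default values of s and m and the function name are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def margin_cosine_loss(z, label_reps, true_label, s=15.0, m=0.35):
    """Sketch of Loss2 for a single training sample.

    z:          (d,)   sentence representation of the training sample
    label_reps: (C, d) representations of the C class labels
    true_label: int index of the sample's true class
    """
    cos = F.cosine_similarity(z.unsqueeze(0), label_reps, dim=-1)    # (C,) cosine to every label
    one_hot = F.one_hot(torch.tensor(true_label), num_classes=label_reps.size(0)).float()
    logits = s * (cos - m * one_hot)                                 # subtract the margin for the true class only
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([true_label]))

loss2 = margin_cosine_loss(torch.randn(768), torch.randn(4, 768), true_label=2)
```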
It should be added here that, in the embodiment of the present application, the representation of the category label can be implemented in three ways:
a first implementation is to initialize the representation of the class label randomly and then learn in the classifier.
The second implementation is to add a label description text to the category label, input the embedding encoding of the category label and its label description text into the feature extractor, and take the vector corresponding to the first character or first word segment of the feature extractor's output, i.e., the vector corresponding to the [CLS] position, as the representation of the category label.
For example, for the category label "sports", its label description text may be "a physical education activity and a social and cultural activity of human society", and thus the text input into the feature extractor may be "sports: a physical education activity and a social and cultural activity of human society".
A third implementation is to obtain a representation of all training samples for each class label by the feature extractor, and then take the central point of the representation of all training samples for each class label as the representation of each class label.
Illustratively, the representation of the category label may be obtained by the following formula:

$$c_k=\frac{1}{|S_k|}\sum_{z_i\in S_k}z_i$$

where c_k denotes the representation of the class label of the k-th class, z_i is the sentence representation of the i-th training sample in the class, S_k denotes the set of all training samples in the k-th class, and |S_k| denotes the number of training samples in the k-th class.
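Illustratively, this third implementation can be sketched as follows; the tensor layout is an assumption made for the example.

```python
import torch

def label_representation(class_sentence_reps: torch.Tensor) -> torch.Tensor:
    """c_k = mean of the sentence representations of all training samples in class k.

    class_sentence_reps: (|S_k|, d) sentence representations of the k-th class.
    """
    return class_sentence_reps.mean(dim=0)

c_k = label_representation(torch.randn(20, 768))   # this centroid can also serve as the decision center
```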
Based on the two training objectives of contrastive learning and classification learning introduced at the classifier training stage, the total loss function LOSS of the classifier training stage may be:
LOSS = Loss1 × a + (1 − a) × Loss2
where Loss1 is the contrastive learning loss function, Loss2 is the classification learning loss function, and a is an adjustable hyperparameter used to adjust the weights of contrastive learning and classification learning when training the classifier.
And step S103, determining a decision center of each category according to sentence representation, and learning a decision boundary of each category.
Wherein, the decision center may be a central point of all training samples in the category in the semantic space. When the third implementation manner is adopted to obtain the representation of the category label, the representation of the category label can be used as a decision center.
Fig. 3 is a flowchart for learning a decision boundary of each category according to an embodiment of the present application.
As shown in fig. 3, in one implementation, the decision boundary of each category can be obtained by:
step S301, a decision boundary optimization function is constructed according to the numerical relationship between the cosine distance between the sentence representation of the training sample and the decision center corresponding to the category of the training sample and the decision radius.
The numerical relationship comprises that the cosine distance between the sentence representation of the training sample and the decision center corresponding to the category of the sentence representation of the training sample is larger than the decision boundary of the category, or the cosine distance between the sentence representation of the training sample and the decision center corresponding to the category of the sentence representation of the training sample is smaller than or equal to the decision boundary of the category.
Step S302, learning the decision boundary of each category cluster according to a decision boundary optimization function.
Different from the traditional practice of measuring similarity with the Euclidean distance in decision boundary learning, the embodiment of the present application uses the cosine distance to measure the similarity between the training sample and the decision center. The consideration here is that the Euclidean distance focuses more on the absolute distance between samples, while the cosine distance focuses more on the difference between two samples in a certain direction (for example, an intention); therefore, using the cosine distance to measure the similarity between the training sample and the decision center can better reflect whether the training sample is similar or identical to the decision center in intention.
Exemplarily, the decision boundary optimization function L_b may take the following form:

$$L_b = \frac{1}{N}\sum_{i=1}^{N}\Big\{\delta_i\big[\big(1-\cos(c_{y_i},z_i)\big)-\Delta_{y_i}\big]+(1-\delta_i)\big[\Delta_{y_i}-\big(1-\cos(c_{y_i},z_i)\big)\big]\Big\}$$

$$\delta_i=\begin{cases}1,&1-\cos(c_{y_i},z_i)>\Delta_{y_i}\\0,&1-\cos(c_{y_i},z_i)\le\Delta_{y_i}\end{cases}$$

where N is the number of positive samples, Δ_{y_i} denotes the decision radius of the category, c_{y_i} denotes the decision center of the category, z_i denotes the sentence representation of the training sample, cos(c_{y_i}, z_i) denotes the cosine distance between training sample z_i and decision center c_{y_i}, and δ_i indicates whether the training sample is inside the decision boundary. The optimization objective is to make L_b as small as possible.
Wherein: the larger the cosine distance cos(c_{y_i}, z_i), the greater the similarity between the training sample and the decision center, and the closer the training sample is to the decision center; the smaller the cosine distance, the smaller the similarity, and the farther the training sample is from the decision center. Therefore, the above formula compares 1 − cos(c_{y_i}, z_i) with the decision radius Δ_{y_i}: the larger 1 − cos(c_{y_i}, z_i), the farther the training sample is from the decision center; the smaller 1 − cos(c_{y_i}, z_i), the closer the training sample is to the decision center.
According to the optimization function, the main idea of decision boundary learning is as follows: if a training sample of a certain category is inside the decision boundary of the category, the decision boundary is narrowed toward the training sample; if a training sample of a certain category is outside the decision boundary of the category, the decision boundary is enlarged to contain the training sample. In this way, the decision boundary of each category can be adaptively adjusted according to the positions of the training samples of the category, so that as many training samples of the category as possible fall inside its decision boundary while samples of other categories are kept outside it as far as possible, making the learned decision boundary more accurate. For example, when 1 − cos(c_{y_i}, z_i) of training sample z_i and decision center c_{y_i} is greater than Δ_{y_i}, δ_i = 1, and the optimization objective actually becomes 1 − cos(c_{y_i}, z_i) − Δ_{y_i}; then, in order to make L_b smaller, the boundary Δ_{y_i} may be increased.
In addition, if the similarity between the training samples and the decision center is measured by the Euclidean distance, the decision boundary optimization function L_b may take the form:

$$L_b = \frac{1}{N}\sum_{i=1}^{N}\Big\{\delta_i\big[\|z_i-c_{y_i}\|-\Delta_{y_i}\big]+(1-\delta_i)\big[\Delta_{y_i}-\|z_i-c_{y_i}\|\big]\Big\}$$

$$\delta_i=\begin{cases}1,&\|z_i-c_{y_i}\|>\Delta_{y_i}\\0,&\|z_i-c_{y_i}\|\le\Delta_{y_i}\end{cases}$$

where N is the number of positive samples, Δ_{y_i} denotes the decision radius of the category, c_{y_i} denotes the decision center of the category, z_i denotes the sentence representation of the training sample, ‖z_i − c_{y_i}‖ denotes the Euclidean distance between training sample z_i and decision center c_{y_i}, and δ_i indicates whether the training sample is inside the decision boundary. The optimization objective is to make L_b as small as possible.
Wherein: the larger the Euclidean distance is, the smaller the similarity between the training sample and the decision center is, and the farther the distance between the training sample and the decision center is; the smaller the Euclidean distance is, the greater the similarity between the training sample and the decision center is, and the closer the distance between the training sample and the decision center is.
And step S104, acquiring the similarity of the text to be recognized and the decision centers of all categories to determine the target category corresponding to the maximum similarity.
In step S104, after the text to be recognized is input to the classifier, the classifier may calculate the similarity between the text to be recognized and the decision center of each category, respectively, so as to determine the target category with the largest similarity.
Wherein:
if the cosine distance is used for representing the similarity, the larger the cosine distance between the text to be recognized and the decision center is, the larger the similarity between the text to be recognized and the decision center is, and conversely, the smaller the cosine distance between the text to be recognized and the decision center is, the smaller the similarity between the text to be recognized and the decision center is. Therefore, the class corresponding to the maximum value of the cosine distance is the target class.
If the similarity is expressed by the Euclidean distance, the larger the Euclidean distance between the text to be recognized and the decision center is, the smaller the similarity between the text to be recognized and the decision center is, and conversely, the smaller the Euclidean distance between the text to be recognized and the decision center is, the larger the similarity between the text to be recognized and the decision center is. Therefore, the class corresponding to the minimum value of the euclidean distance is the target class.
Step S105, judging whether the text to be recognized is located outside the decision boundary of the target category.
Wherein:
if the similarity is expressed in terms of cosine distance, the distance between the text to be recognized and the decision center of the target category may be expressed as: 1-cosine distance. If the 1-cosine distance is larger than the decision radius of the target category, the text to be recognized is positioned outside the decision boundary of the target category; if the 1-cosine distance is less than the decision radius of the target class, it is indicated that the text to be recognized is located within the decision boundary of the target class.
If the similarity is expressed by the Euclidean distance, if the Euclidean distance is greater than the decision radius of the target category, the text to be recognized is positioned outside the decision boundary of the target category; and if the Euclidean distance is smaller than the decision radius of the target category, the text to be recognized is positioned in the decision boundary of the target category.
In addition, for the case that the 1-cosine distance is equal to the decision radius of the target category and the case that the euclidean distance is equal to the decision radius of the target category, the text to be recognized may be considered to be located outside the decision boundary of the target category, or may be considered to be located within the decision boundary of the target category.
And S106, if the text to be recognized is located outside the decision boundary of the target category, determining that the text to be recognized is an unknown intention text.
And S107, if the text to be recognized is located within the decision boundary of the target category, determining that the text to be recognized belongs to the target category.
The above steps S104-S106 can be implemented in a test phase or a production phase of unknown intention text recognition.
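Illustratively, putting steps S104 to S107 together, the inference logic of the cosine-similarity variant can be sketched as follows; the stored decision centers and radii are assumed to come from the training stage, and the function name and the UNKNOWN sentinel are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

UNKNOWN = -1  # label returned for unknown-intention text

def classify(z, centers, radii):
    """Sketch of steps S104-S107 with cosine similarity.

    z:       (d,)   sentence representation of the text to be recognized
    centers: (C, d) decision centers learned for the C known classes
    radii:   (C,)   decision radii learned for the C known classes
    """
    cos = F.cosine_similarity(z.unsqueeze(0), centers, dim=-1)   # similarity to every decision center
    target = int(torch.argmax(cos))                              # target class = maximum similarity
    if 1.0 - cos[target] > radii[target]:                        # outside the decision boundary
        return UNKNOWN                                           # unknown-intention text
    return target                                                # known class

pred = classify(torch.randn(768), torch.randn(4, 768), torch.full((4,), 0.3))
```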
According to the method provided by the embodiment of the application, contrastive learning and classification learning are introduced at the stage of training the classifier, so that sentence representations of samples of the same category gather together and sentence representations of different categories move away from each other; the decision boundary is therefore trained with better effect, and the classifier can more accurately identify texts with unknown intentions from the texts to be recognized.
The above embodiments describe various aspects of the method for recognizing unknown intention texts provided by the present application. It is to be understood that each device or module, in order to implement the above-described functions, includes a corresponding hardware structure and/or software module for performing each function. Those of skill in the art will readily appreciate that the various hardware and method steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Fig. 4 is a schematic structural diagram of an apparatus for recognizing an unknown intention text according to an embodiment of the present application. As shown in fig. 4, the apparatus includes hardware modules for implementing the method for recognizing an unknown intention text provided by the embodiment of the present application, and includes: a processor 210 and a memory 220, the memory 220 comprising program instructions 230, which when executed by the processor 210, cause the apparatus for recognizing an unknown intended text to perform the following method steps:
Acquiring K positive samples and S negative samples corresponding to each training sample, wherein the positive samples are randomly acquired from the same class samples of the training samples, the negative samples are randomly acquired from different class samples of the training samples, and both K and S are positive integers greater than or equal to 1;
obtaining sentence representations of the training samples and of their corresponding positive and negative samples by using a classifier, wherein the classifier gathers sentence representations of samples of the same category through a contrastive learning loss function and pushes sentence representations of different categories away from each other through a classification learning loss function;
determining a decision center of each category according to sentence representation, and learning a decision boundary of each category;
acquiring the similarity of the text to be recognized and the decision centers of all categories to determine a target category corresponding to the maximum similarity;
judging whether the text to be recognized is positioned outside the decision boundary of the target category;
if the text to be recognized is located outside the decision boundary of the target category, determining that the text to be recognized is an unknown intention text;
and if the text to be recognized is located within the decision boundary of the target category, determining that the text to be recognized belongs to the target category.
Fig. 5 is a schematic structural diagram of another unknown intention text recognition apparatus provided in an embodiment of the present application. As shown in fig. 5, the apparatus includes software modules for implementing the method for recognizing an unknown intention text provided by the embodiment of the present application, including:
A sample obtaining module 310, configured to obtain K positive samples and S negative samples corresponding to each training sample, where the positive samples are randomly obtained from samples of the same category of the training samples, the negative samples are randomly obtained from samples of different categories of the training samples, and K and S are both positive integers greater than or equal to 1;
the first training module 320 is configured to obtain sentence representations of the training samples and of their corresponding positive and negative samples by using a classifier, wherein the classifier gathers sentence representations of samples of the same category through a contrastive learning loss function and pushes sentence representations of different categories away from each other through a classification learning loss function;
a second training module 330, configured to determine a decision center of each category according to the sentence representation, and learn a decision boundary of each category;
the prediction module 340 is configured to obtain similarity between the text to be recognized and the decision centers of the categories, so as to determine a target category corresponding to the maximum similarity;
the prediction module 340 is further configured to determine whether the text to be recognized is located outside the decision boundary of the target category;
the prediction module 340 is further configured to determine that the text to be recognized is an unknown intention text if the text to be recognized is located outside the decision boundary of the target category;
The prediction module 340 is further configured to determine that the text to be recognized belongs to the target category if the text to be recognized is located within the decision boundary of the target category.
According to the device provided by the embodiment of the application, contrastive learning and classification learning are introduced at the stage of training the classifier, so that sentence representations of samples of the same category gather together and sentence representations of different categories move away from each other; the decision boundary is therefore trained with better effect, and the classifier can more accurately identify texts with unknown intentions from the texts to be recognized.
It is easily understood that, on the basis of the several embodiments provided in the present application, a person skilled in the art may combine, split, recombine, etc. the embodiments of the present application to obtain other embodiments, which do not depart from the scope of the present application.
The above embodiments, objects, technical solutions and advantages of the embodiments of the present application are described in further detail, it should be understood that the above embodiments are only specific embodiments of the present application, and are not intended to limit the scope of the embodiments of the present application, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the embodiments of the present application should be included in the scope of the embodiments of the present application.

Claims (10)

1. A method for recognizing unknown intention text, comprising:
obtaining K positive samples and S negative samples corresponding to each training sample, wherein the positive samples are randomly obtained from the same class samples of the training samples, the negative samples are randomly obtained from different class samples of the training samples, and both K and S are positive integers greater than or equal to 1;
obtaining sentence representations of the training samples and of the positive samples and negative samples corresponding to the training samples by using a classifier, wherein the classifier gathers sentence representations of samples of the same category through a contrastive learning loss function, and pushes sentence representations of different categories away from each other through a classification learning loss function;
determining a decision center of each category according to the sentence representation, and learning a decision boundary of each category;
acquiring the similarity of the text to be recognized and the decision centers of all categories to determine a target category corresponding to the maximum similarity;
judging whether the text to be recognized is positioned outside the decision boundary of the target category;
if the text to be recognized is located outside the decision boundary of the target category, determining that the text to be recognized is an unknown intention text;
And if the text to be recognized is located within the decision boundary of the target category, determining that the text to be recognized belongs to the target category.
2. The method of claim 1, wherein the contrastive learning loss function is constructed from a distance between the training sample and any one of its positive samples, and a sum of distances between the training sample and all of its negative samples.
3. The method of claim 2, wherein the contrastive learning loss function is specified by the following Loss1:

$$Loss_1 = -\frac{1}{N}\sum_{v_j\in V^{+}}\log\frac{\exp(v_i\cdot v_j/\tau)}{\exp(v_i\cdot v_j/\tau)+\sum_{v^{-}\in V^{-}}\big[\exp(v_i\cdot v^{-}/\tau)+\exp(v_j\cdot v^{-}/\tau)\big]}$$

wherein N is the number of positive samples, v_i denotes the normalized sentence representation of the training sample, v_j denotes the normalized sentence representation of a positive sample, v^- denotes the normalized sentence representation of a negative sample, V^+ denotes the set of all positive samples, V^- denotes the set of all negative samples, τ is a hyperparameter, exp(v_i·v_j/τ) denotes the distance between the training sample and any one of its positive samples, and Σ_{v^-∈V^-}[exp(v_i·v^-/τ)+exp(v_j·v^-/τ)] denotes the sum of the distances between the training sample and all of its negative samples.
4. The method of claim 3, wherein the classification learning loss function is constructed from a cosine distance between the sentence representations of the training samples and the representations of the true tags corresponding to their classes, and a sum of cosine distances between the sentence representations of the training samples and the representations of all other class tags.
5. The method of claim 4, wherein the classification learning loss function is specified by the following Loss2:

$$Loss_2 = -\log\frac{e^{\,s\,(\cos(\theta_{y_i},\,z_i)-m)}}{e^{\,s\,(\cos(\theta_{y_i},\,z_i)-m)}+\sum_{j\neq y_i}e^{\,s\,\cos(\theta_j,\,z_i)}}$$

wherein z_i denotes the sentence representation of the training sample, θ_{y_i} denotes the representation of the true label of the training sample, θ_j denotes the representation of a label of another class, cos(θ_{y_i}, z_i) denotes the cosine distance between the sentence representation of the training sample and the representation of the true label corresponding to its class, cos(θ_j, z_i) denotes the cosine distance between the sentence representation of the training sample and the representation of another class label, m is a preset parameter, and s is a preset multiple.
6. The method of claim 1, wherein learning the decision boundary for each category comprises:
constructing a decision boundary optimization function according to a numerical relationship between the cosine distance between the sentence representation of the training sample and the decision center corresponding to the category of the sentence representation of the training sample and the decision radius, wherein the numerical relationship comprises that the cosine distance between the sentence representation of the training sample and the decision center corresponding to the category of the sentence representation of the training sample is greater than the decision boundary of the category, or the cosine distance between the sentence representation of the training sample and the decision center corresponding to the category of the sentence representation of the training sample is less than or equal to the decision boundary of the category;
And learning the decision boundary of each category cluster according to the decision boundary optimization function.
7. The method according to claim 6, wherein the decision boundary optimization function is specified by the following L_b:

$$L_b = \frac{1}{N}\sum_{i=1}^{N}\Big\{\delta_i\big[\big(1-\cos(c_{y_i},z_i)\big)-\Delta_{y_i}\big]+(1-\delta_i)\big[\Delta_{y_i}-\big(1-\cos(c_{y_i},z_i)\big)\big]\Big\}$$

$$\delta_i=\begin{cases}1,&1-\cos(c_{y_i},z_i)>\Delta_{y_i}\\0,&1-\cos(c_{y_i},z_i)\le\Delta_{y_i}\end{cases}$$

wherein N is the number of positive samples, Δ_{y_i} denotes the decision radius of the category, c_{y_i} denotes the decision center of the category, z_i denotes the sentence representation of the training sample, cos(c_{y_i}, z_i) denotes the cosine distance between training sample z_i and decision center c_{y_i}, and δ_i indicates whether the training sample is inside the decision boundary.
8. The method of claim 5, wherein the classifier employs the following overall LOSS function LOSS:
LOSS=Loss1×a+(1-a)×Loss2
wherein a is an adjustable hyper-parameter.
9. The method of claim 4, wherein the representation of the tag is obtained by:
obtaining sentence representations of all training samples of the labels using the classifier;
and taking the central point of sentence representations of all training samples of the label as the sentence representation of the label.
10. An apparatus for recognizing unknown intention text, comprising: a processor and a memory, said memory including program instructions therein which, when executed by said processor, cause said apparatus for recognizing an unknown intended text to perform the method steps of:
Acquiring K positive samples and S negative samples corresponding to each training sample, wherein the positive samples are randomly acquired from samples of the same category of the training samples, the negative samples are randomly acquired from samples of different categories of the training samples, and both K and S are positive integers greater than or equal to 1;
obtaining sentence representations of the training samples and of the corresponding positive samples and negative samples by using a classifier, wherein the classifier gathers sentence representations of samples of the same category through a contrastive learning loss function, and pushes sentence representations of different categories away from each other through a classification learning loss function;
determining a decision center of each category according to the sentence representation, and learning a decision boundary of each category;
acquiring the similarity of the text to be recognized and the decision centers of all categories to determine a target category corresponding to the maximum similarity;
judging whether the text to be recognized is positioned outside the decision boundary of the target category;
if the text to be recognized is located outside the decision boundary of the target category, determining that the text to be recognized is an unknown intention text;
and if the text to be recognized is located within the decision boundary of the target category, determining that the text to be recognized belongs to the target category.
CN202210307174.7A 2022-03-25 2022-03-25 Unknown intention text identification method and device Pending CN114756678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210307174.7A CN114756678A (en) 2022-03-25 2022-03-25 Unknown intention text identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210307174.7A CN114756678A (en) 2022-03-25 2022-03-25 Unknown intention text identification method and device

Publications (1)

Publication Number Publication Date
CN114756678A true CN114756678A (en) 2022-07-15

Family

ID=82326401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210307174.7A Pending CN114756678A (en) 2022-03-25 2022-03-25 Unknown intention text identification method and device

Country Status (1)

Country Link
CN (1) CN114756678A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116702048A (en) * 2023-08-09 2023-09-05 恒生电子股份有限公司 Newly added intention recognition method, training method and device of distributed external monitoring model and electronic equipment
CN116702048B (en) * 2023-08-09 2023-11-10 恒生电子股份有限公司 Newly added intention recognition method, model training method, device and electronic equipment
CN116796290A (en) * 2023-08-23 2023-09-22 江西尚通科技发展有限公司 Dialog intention recognition method, system, computer and storage medium
CN116796290B (en) * 2023-08-23 2024-03-29 江西尚通科技发展有限公司 Dialog intention recognition method, system, computer and storage medium

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
WO2023024412A1 (en) Visual question answering method and apparatus based on deep learning model, and medium and device
CN114756678A (en) Unknown intention text identification method and device
CN113673254B (en) Knowledge distillation position detection method based on similarity maintenance
CN111597328B (en) New event theme extraction method
CN110555084A (en) remote supervision relation classification method based on PCNN and multi-layer attention
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN113705238B (en) Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN114676255A (en) Text processing method, device, equipment, storage medium and computer program product
CN108536781B (en) Social network emotion focus mining method and system
CN111984780A (en) Multi-intention recognition model training method, multi-intention recognition method and related device
CN113849653B (en) Text classification method and device
CN111191033A (en) Open set classification method based on classification utility
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN115905187B (en) Intelligent proposition system oriented to cloud computing engineering technician authentication
CN116050419B (en) Unsupervised identification method and system oriented to scientific literature knowledge entity
CN116167353A (en) Text semantic similarity measurement method based on twin long-term memory network
CN116662924A (en) Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism
CN115934948A (en) Knowledge enhancement-based drug entity relationship combined extraction method and system
CN116227486A (en) Emotion analysis method based on retrieval and contrast learning
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN115238693A (en) Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory
CN114722798A (en) Ironic recognition model based on convolutional neural network and attention system
CN114462418A (en) Event detection method, system, intelligent terminal and computer readable storage medium
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination