CN113139063A - Intention recognition method, device, equipment and storage medium - Google Patents

Intention recognition method, device, equipment and storage medium

Info

Publication number
CN113139063A
CN113139063A (application CN202110682988.4A)
Authority
CN
China
Prior art keywords
model
data
sample data
domain
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110682988.4A
Other languages
Chinese (zh)
Other versions
CN113139063B (en)
Inventor
丁嘉罗
董世超
乔建秀
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110682988.4A
Publication of CN113139063A
Application granted
Publication of CN113139063B
Legal status: Active

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/35 Information retrieval of unstructured textual data: clustering; classification
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F40/216 Natural language analysis: parsing using statistical methods
    • G06F40/30 Handling natural language data: semantic analysis


Abstract

The invention relates to the field of artificial intelligence and provides an intention recognition method, apparatus, device, and storage medium. A Transformer model is pre-trained with first sample data from a source domain to obtain a feature extraction model; the feature extraction model is externally connected to a first preset classifier to obtain a first initial model; the first initial model undergoes supervised learning with second sample data associated with the target task to obtain a first model; data from a target domain is obtained as third sample data; the first model is externally connected to a second preset classifier to obtain a second initial model; the second initial model undergoes adversarial training with the third sample data to obtain an intention recognition model; and data to be processed is input to the intention recognition model, whose output is taken as the target intention. By combining the idea of transfer learning with a multi-level classification framework, the invention achieves accurate intention recognition. The invention also relates to blockchain technology: the intention recognition model can be stored on blockchain nodes.

Description

Intention recognition method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an intention recognition method, apparatus, device, and storage medium.
Background
Dialog systems and intelligent assistants are currently typical applications of artificial intelligence; serving as efficient and friendly interactive interfaces, they are widely used in software for various service scenarios. A core component of a dialog system is the intent understanding module, which receives a natural-language instruction or utterance and judges the intent category of its sender.
Typically, intent recognition is modeled as a supervised text classification task: a large amount of dialog text is labeled, and a machine learning model is trained in a supervised manner until it can judge the category of an input text. To ensure model quality, this training paradigm usually rests on three assumptions:
1. a large amount of training data exists or can be labeled;
2. the training data and the prediction data are of the same distribution;
3. the training model can extract the corresponding domain knowledge.
However, in a real industrial environment, these three assumptions do not always hold. Labeling and acquiring data is often costly, and under the deep learning paradigm in particular, training a model usually requires a large amount of training data. In addition, a large amount of domain-specific data exists in industry, but its distribution differs from that of the target task data, so it cannot be used directly for training and prediction. Finally, a single supervised learning task usually captures limited information and cannot obtain additional knowledge from broader domain corpora and related tasks.
Disclosure of Invention
Embodiments of the invention provide an intention recognition method, apparatus, device, and storage medium that combine the idea of transfer learning with a multi-level classification framework to achieve accurate intention recognition.
In a first aspect, an embodiment of the present invention provides an intention identification method, which includes:
in response to an intention identification instruction, determining a source domain according to the intention identification instruction, and constructing first sample data;
pre-training a Transformer model by using the first sample data to obtain a feature extraction model;
determining a target task, and acquiring data associated with the target task as second sample data;
externally connecting the feature extraction model with a first preset classifier to obtain a first initial model, and performing supervised learning on the first initial model by using the second sample data to obtain a first model;
determining a target domain, and acquiring data of the target domain as third sample data;
externally connecting the first model with a second preset classifier to obtain a second initial model;
constructing a target loss function;
performing adversarial training on the second initial model with the third sample data, based on the target loss function, to obtain an intention recognition model;
and acquiring data to be processed, inputting the data to be processed into the intention recognition model, and acquiring the output of the intention recognition model as a target intention.
According to the preferred embodiment of the present invention, the pre-training the Transformer model by using the first sample data to obtain the feature extraction model includes:
acquiring general corpus data from the first sample data;
randomly selecting words from the general corpus data, and replacing the selected words with masks;
randomly disturbing the sentence relation of the general corpus data;
performing masking prediction training on the Transformer model according to the mask, and performing next sentence prediction training on the Transformer model by using the disordered sentences to obtain an intermediate model;
and acquiring the data of the source domain from the first sample data, and retraining the intermediate model by using the data of the source domain to obtain the feature extraction model.
According to a preferred embodiment of the present invention, the performing supervised learning on the first initial model by using the second sample data to obtain a first model includes:
inputting the second sample data into the feature extraction model for feature extraction to obtain an embedded vector representation of the second sample data;
acquiring a label of each data in the second sample data;
taking the embedded vector representation of the second sample data as a training sample, training the first preset classifier according to the label of each data until the loss of the first preset classifier reaches convergence, and stopping training;
determining a current first initial model as the first model.
According to a preferred embodiment of the present invention, said constructing the objective loss function comprises:
constructing a class classification loss function for the first preset classifier;
constructing a domain classification loss function;
configuring a first weight for the class classification loss function and a second weight for the domain classification loss function;
and calculating a weighted sum according to the first weight, the second weight, the class classification loss function and the domain classification loss function to obtain the target loss function.
According to a preferred embodiment of the present invention, the formula of the class classification loss function is as follows:

L_class = -Σ_i log P(y_i | f(x_i))

where L_class denotes the class classification loss function, P(y_i | f(x_i)) denotes the probability that the first preset classifier outputs the label y_i, f(x) denotes the embedded vector representation output by the feature extraction model, x_i denotes the i-th input, y_i denotes the label corresponding to x_i, and i is a positive integer.
According to a preferred embodiment of the present invention, the formula of the domain classification loss function is as follows:

L_domain = -Σ_i log P(d_i | f(x_i))

where L_domain denotes the domain classification loss function, d_i denotes the domain label of the i-th input, and P(d_i | f(x_i)) denotes the probability that the second preset classifier outputs the label d_i.
According to a preferred embodiment of the present invention, performing the adversarial training on the second initial model with the third sample data based on the target loss function to obtain the intention recognition model includes:
randomly extracting data with a preset proportion from the third sample data, and recording the label of the extracted data as the label of the source domain to obtain fourth sample data;
and performing gradient-reversal training on the second initial model with the fourth sample data until the value of the target loss function no longer decreases, then stopping training to obtain the intention recognition model.
In a second aspect, an embodiment of the present invention provides an intention identifying apparatus, which includes:
the construction unit is used for responding to the intention identification instruction, determining a source domain according to the intention identification instruction and constructing first sample data;
the training unit is used for pre-training the Transformer model by utilizing the first sample data to obtain a feature extraction model;
the acquisition unit is used for determining a target task and acquiring data associated with the target task as second sample data;
the learning unit is used for externally connecting the feature extraction model with a first preset classifier to obtain a first initial model, and performing supervised learning on the first initial model by using the second sample data to obtain a first model;
the acquiring unit is further configured to determine a target domain and acquire data of the target domain as third sample data;
the external unit is used for externally connecting the first model with a second preset classifier to obtain a second initial model;
the construction unit is also used for constructing a target loss function;
the training unit is further configured to perform adversarial training on the second initial model with the third sample data, based on the target loss function, to obtain an intention recognition model;
the acquisition unit is further configured to acquire data to be processed, input the data to be processed to the intention recognition model, and acquire an output of the intention recognition model as a target intention.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the intention identifying method according to the first aspect when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the intention identifying method according to the first aspect.
Embodiments of the invention provide an intention recognition method, apparatus, computer device, and storage medium. A Transformer model is pre-trained with first sample data from a source domain to obtain a feature extraction model; the feature extraction model is externally connected to a first preset classifier to obtain a first initial model; the first initial model undergoes supervised learning with second sample data associated with the target task to obtain a first model; data from a target domain is obtained as third sample data; the first model is externally connected to a second preset classifier to obtain a second initial model; the second initial model undergoes adversarial training with the third sample data, based on a target loss function, to obtain an intention recognition model; and data to be processed is input to the intention recognition model, whose output is taken as the target intention. By combining the idea of transfer learning with a multi-level classification framework, the invention achieves accurate intention recognition.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below illustrate only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an intention identification method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of an intent recognition apparatus provided by an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Fig. 1 is a schematic flow chart of an intention identification method according to an embodiment of the invention.
S10, responding to the intention identification instruction, determining a source domain according to the intention identification instruction, and constructing first sample data.
In this embodiment, the intention identification instruction may be configured to be triggered periodically or by a relevant worker, and the present invention is not limited thereto.
In the present embodiment, the source domain refers to a domain with sufficient data to meet the training requirements, such as the insurance domain.
In at least one embodiment of the invention, the determining a source domain from the intent recognition instruction and constructing the first sample data comprises:
analyzing the intention identification instruction to obtain information carried by the intention identification instruction;
acquiring a label corresponding to the field;
constructing a regular expression according to the acquired label;
traversing the information carried by the intention identification instruction by using the regular expression, and determining the traversed information as a domain name;
determining the source domain according to the domain name;
acquiring data of the source domain;
calling a general corpus and acquiring data of the general corpus;
integrating the acquired data as the first sample data.
For example: the label can be NAME, and the constructed regular expression is NAME ().
Through the above embodiment, the source domain can be accurately determined based on the label and the regular expression, and the training samples are then constructed by combining the source-domain data with the general corpus to ensure the effect of subsequent model training.
And S11, pre-training the Transformer model by using the first sample data to obtain a feature extraction model.
In this embodiment, the pre-training the Transformer model by using the first sample data to obtain a feature extraction model includes:
acquiring general corpus data from the first sample data;
randomly selecting words from the general corpus data, and replacing the selected words with masks;
randomly disturbing the sentence relation of the general corpus data;
performing masking prediction training on the Transformer model according to the mask, and performing next sentence prediction training on the Transformer model by using the disordered sentences to obtain an intermediate model;
and acquiring the data of the source domain from the first sample data, and retraining the intermediate model by using the data of the source domain to obtain the feature extraction model.
For example, data from an arbitrary Chinese encyclopedia database is used as the general corpus data. Consider a sentence in the general corpus data: "Why can company A be said to be your first choice for insurance? Because choosing company A means choosing a top company in the insurance industry as a development platform." The word "company A" is randomly selected from the sentence and replaced with "[MASK]", yielding the replaced sentence: "Why can [MASK] be said to be your first choice for insurance? Because choosing [MASK] means choosing a top company in the insurance industry as a development platform." The replaced sentence is taken as input to the Transformer model; if the model predicts wrongly and the loss increases, gradient descent updates the model parameters. In this label-free unsupervised learning process, the training goal is to make the model capture as much context information as possible in order to correctly predict the masked words.
Meanwhile, taking the sentence above as an example, its sentence order is randomly shuffled, changing the original into: "Because choosing company A means choosing a top company in the insurance industry as a development platform. Why can company A be said to be your first choice for insurance?" That is, the order of 50% of sentence pairs is randomly shuffled. Before shuffling, the two sentences stand in a preceding/following relationship; after shuffling they do not, and training aims to make the model recognize this relationship as reliably as possible.
Through masked-word prediction training and next-sentence prediction training, the model predicts the randomly masked words and the sentence relationships from limited context information, thereby learning context-based text semantics and text representation capability.
Furthermore, the intermediate model is retrained through the data of the source domain (such as an insurance domain), so that the model has the semantic information and language expression capability of a specific domain.
In the above embodiment, pre-training on a large amount of easily obtained general corpus and re-training on a small amount of hard-to-obtain source-domain corpus strengthen the model's feature representation capability in the specific field. This hierarchical, progressive training mode makes full use of the difference in corpus sizes and meets the requirements of the vertical domain.
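The two corruption steps above can be sketched in plain Python. The 15% mask ratio and the `[MASK]` token follow common BERT-style practice and are assumptions, not figures stated in the patent:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, ratio=0.15, rng=None):
    """Randomly replace a fraction of tokens with [MASK]; return the
    masked sequence and the positions/words the model must recover."""
    rng = rng or random.Random(0)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < ratio:
            targets[i] = tok          # remember the gold word
            masked.append(MASK)
        else:
            masked.append(tok)
    return masked, targets

def make_nsp_example(sent_a, sent_b, rng=None):
    """Keep or swap a consecutive sentence pair with equal probability;
    the label records whether the original order was preserved."""
    rng = rng or random.Random(0)
    if rng.random() < 0.5:
        return (sent_a, sent_b), 1    # original order kept
    return (sent_b, sent_a), 0        # order shuffled
```

A real pipeline would feed the masked sequences and labeled pairs to the Transformer and backpropagate the prediction losses; this sketch only shows how the training examples are corrupted.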
S12, determining a target task, and acquiring data associated with the target task as second sample data.
In this embodiment, the target task may include intention recognition, and the target task may be determined according to the intention recognition instruction or according to a configuration of a user, and the present invention is not limited thereto.
In at least one embodiment of the present invention, the acquiring data associated with the target task as second sample data includes:
acquiring historical data of the target task;
identifying tagged data from the historical data as the second sample data; and/or
And acquiring data from the historical data, and performing label processing on the acquired data to obtain the second sample data.
It should be noted that, because supervised training is performed subsequently, a large amount of labeled data is required as support.
And S13, externally connecting the feature extraction model with a first preset classifier to obtain a first initial model, and performing supervised learning on the first initial model by using the second sample data to obtain a first model.
In this embodiment, the first preset classifier may be any conventional classification model, such as a fully connected layer combined with a Softmax activation function.
In this embodiment, the structure of the first initial model is that the trained feature extraction model is externally connected with a first preset classifier.
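A minimal sketch of such a classifier head, written in plain Python without a deep learning framework; the shapes and names are illustrative, not the patent's own:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classifier_head(embedding, weights, bias):
    """Fully connected layer followed by softmax, as described for the
    first preset classifier. weights is [num_classes][embedding_dim]."""
    logits = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)
```

The embedding would come from the feature extraction model; the returned probabilities are the P(y | f(x)) values used in the class classification loss.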
Specifically, the performing supervised learning on the first initial model by using the second sample data to obtain a first model includes:
inputting the second sample data into the feature extraction model for feature extraction to obtain an embedded vector representation of the second sample data;
acquiring a label of each data in the second sample data;
taking the embedded vector representation of the second sample data as a training sample, training the first preset classifier according to the label of each data until the loss of the first preset classifier reaches convergence, and stopping training;
determining a current first initial model as the first model.
It can be understood that, since the first initial model is an integral model formed by the feature extraction model externally connecting the first preset classifier, the parameters of the feature extraction model and the first preset classifier are updated simultaneously when supervised learning is performed.
Through the embodiment, the model can be finely adjusted by combining with an actual target task.
Of course, depending on the specific service scenario, a certain proportion of the feature extraction model's parameters can be frozen and left un-updated, so that only a small fraction of the parameters is updated together with the first preset classifier to fine-tune the feature extraction model. For example, when the feature extraction model is a 12-layer (base) model, the first 10 Transformer layers counting up from the bottom are frozen; the exact number of frozen layers is a hyper-parameter that needs to be tuned according to the actual effect.
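A sketch of such layer freezing, assuming BERT-style parameter names of the form `encoder.layer.<k>`, which the patent does not itself specify:

```python
import re

def trainable_mask(param_names, n_frozen=10):
    """Mark parameters of the bottom n_frozen encoder layers as frozen
    (False); everything else, including the classifier head, stays
    trainable (True)."""
    mask = {}
    for name in param_names:
        m = re.match(r"encoder\.layer\.(\d+)\.", name)
        mask[name] = not (m and int(m.group(1)) < n_frozen)
    return mask
```

In a framework such as PyTorch the same idea is usually expressed by setting `requires_grad = False` on the frozen parameters; here the mask simply records which parameters an optimizer should update.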
S14, determining a target domain and acquiring data of the target domain as third sample data.
In the present embodiment, the target domain refers to a domain that relatively lacks data support, such as the banking domain. The target domain may be determined according to the intention recognition instruction or according to a user's configuration, and the present invention is not limited thereto.
It should be noted that, due to the lack of sufficient data support, the data in the target domain may be tagged, untagged, or partially tagged.
And S15, externally connecting the first model with a second preset classifier to obtain a second initial model.
In this embodiment, the second preset classifier may also be any conventional classification model, such as a fully connected layer combined with a Softmax activation function, and the present invention is not limited thereto.
The structure of the second initial model is that the first model obtained by training is externally connected with the second preset classifier, namely the second initial model is composed of the feature extraction model, the first preset classifier and the second preset classifier.
S16, constructing an objective loss function.
In keeping with the above embodiments, the constructing the target loss function includes:
constructing a class classification loss function for the first preset classifier;
constructing a domain classification loss function;
configuring a first weight for the class classification loss function and a second weight for the domain classification loss function;
and calculating a weighted sum according to the first weight, the second weight, the class classification loss function and the domain classification loss function to obtain the target loss function.
The first weight and the second weight may be configured by user-defined, and may be adjusted dynamically according to an actual application scenario, which is not limited in the present invention.
Through the configuration of the loss function, the class classification loss function and the domain classification loss function are integrated together to be used as the overall loss, so that the trained model can meet the requirements of class classification and domain classification at the same time.
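As a minimal sketch, the weighted combination reads as follows; the weight values shown are placeholders, since the patent leaves both weights user-configurable:

```python
def target_loss(class_loss, domain_loss, w_class=1.0, w_domain=0.5):
    """Weighted sum of the class classification loss and the domain
    classification loss: the first weight scales the class loss, the
    second scales the domain loss."""
    return w_class * class_loss + w_domain * domain_loss
```

Raising `w_domain` pushes training toward domain-indistinguishable features, while raising `w_class` prioritizes class accuracy; the balance is tuned per application scenario.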
Specifically, the formula of the class classification loss function is as follows:

L_class = -Σ_i log P(y_i | f(x_i))

where L_class denotes the class classification loss function, P(y_i | f(x_i)) denotes the probability that the first preset classifier outputs the label y_i, f(x) denotes the embedded vector representation output by the feature extraction model, x_i denotes the i-th input, y_i denotes the label corresponding to x_i, and i is a positive integer.
Through the above embodiment, the class classification loss function is first constructed for the first preset classifier, so as to realize correct classification of the classes.
Further, the domain classification loss function is formulated as follows:

L_domain = -Σ_i log P(d_i | f(x_i))

where L_domain denotes the domain classification loss function, d_i denotes the domain label of the i-th input, and P(d_i | f(x_i)) denotes the probability that the second preset classifier outputs the label d_i. Here d identifies a particular domain; for example, d = 1 can be configured to denote the target domain and d = 0 the source domain.
Through the above embodiment, a domain classification loss function is constructed for the second preset classifier to achieve correct domain classification.
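On a common reading of the definitions above, both losses are negative log-likelihoods of the gold labels. A small pure-Python sketch, assuming the classifiers already output a probability for each gold label:

```python
import math

def class_loss(gold_probs):
    """Negative log-likelihood of the gold class labels, where
    gold_probs[i] is P(y_i | f(x_i)) from the class classifier."""
    return -sum(math.log(p) for p in gold_probs)

def domain_loss(gold_domain_probs):
    """Same form for the domain classifier: gold_domain_probs[i] is
    the probability it assigns to the true domain label d_i."""
    return -sum(math.log(p) for p in gold_domain_probs)
```

A classifier that is certain and correct (probability 1.0 on every gold label) incurs zero loss, and the loss grows as the assigned probabilities shrink.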
And S17, performing adversarial training on the second initial model with the third sample data, based on the target loss function, to obtain an intention recognition model.
In this embodiment, performing the adversarial training on the second initial model with the third sample data based on the target loss function to obtain the intention recognition model includes:
randomly extracting a preset proportion of data from the third sample data, and recording the label of the extracted data as the source-domain label to obtain fourth sample data;
and performing simulated gradient-reversal training on the second initial model by using the fourth sample data, stopping the training when the value of the target loss function no longer decreases, so as to obtain the intention recognition model.
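The relabeling step above can be sketched in a few lines. This is only an illustration under assumed data shapes (a list of (features, domain-label) pairs); the function name and the fixed seed are ours, not the patent's:

```python
import random

def make_fourth_sample(third_sample, proportion, source_label=0, seed=42):
    """Randomly pick a preset proportion of target-domain samples and
    record their domain label as the source-domain label."""
    rng = random.Random(seed)
    n_flip = int(len(third_sample) * proportion)
    flip_idx = set(rng.sample(range(len(third_sample)), n_flip))
    fourth = []
    for i, (features, domain_label) in enumerate(third_sample):
        # flipped samples are recorded with the source-domain label
        fourth.append((features, source_label if i in flip_idx else domain_label))
    return fourth

# A batch of target-domain data: every sample initially carries domain label 1.
batch = [([0.1 * i], 1) for i in range(10)]
fourth_sample = make_fourth_sample(batch, proportion=0.3)
```

During adversarial training these flipped labels make the domain classifier's task harder, which is what drives the feature representations of the two domains together.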
Specifically, the intention recognition model includes a feature extraction model, a class classifier (i.e., the first preset classifier), and a domain classifier (i.e., the second preset classifier).
Wherein the function of the feature extraction model is:
(1) extracting features required by classification of a subsequent classifier;
(2) mapping the source-domain data and the target-domain data to the same space.
The function of the category classifier is to classify the extracted source-domain data features.
The function of the domain classifier is to judge whether the extracted feature information comes from the source domain or the target domain.
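The three roles above compose as follows; this is a minimal sketch with plain callables standing in for the real networks (all names are illustrative, not the patent's code):

```python
class SecondInitialModel:
    """Feature extraction model whose shared output feeds both the
    class classifier and the domain classifier."""
    def __init__(self, extract, class_classifier, domain_classifier):
        self.extract = extract
        self.class_classifier = class_classifier
        self.domain_classifier = domain_classifier

    def forward(self, x):
        features = self.extract(x)                 # shared feature space
        return (self.class_classifier(features),   # which class
                self.domain_classifier(features))  # which domain

# Toy stand-ins: doubling as "feature extraction", +1 / -1 as classifiers.
model = SecondInitialModel(lambda x: 2 * x, lambda f: f + 1, lambda f: f - 1)
class_out, domain_out = model.forward(3)
```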
The whole training process follows the paradigm of adversarial learning. During training, a portion of the target-domain labels within each batch is dynamically changed into source-domain labels, forming an adversarial relationship that reverses the gradient. This increases the training difficulty of the domain classifier, so that it cannot correctly judge which domain the feature information comes from. Once the loss function of the domain classifier reaches its minimum, the target-domain data and the source-domain data are mixed together; at this point the feature extractor maps both to the same feature space, solving the problem of inconsistent data distributions.
Meanwhile, the feature extraction model is constrained so that the extracted feature information remains usable for the subsequent classification task, addressing the small data volume of the target domain and its lack of classification labels. To this end, the class classification loss and the domain classification loss are added together in a certain proportion to obtain an overall loss function. Based on this overall loss function, the parameters of all three parts of the framework are updated simultaneously by gradient descent. When the overall loss function reaches its minimum, the target-domain data and the source-domain data can be mapped to the same feature space and correctly classified.
In practice, target-domain data may be scarce, training corpora may be hard to obtain, or the target domain may have a large amount of unlabeled data whose manual labeling is expensive; traditional supervised learning, and deep learning in particular, requires a large amount of labeled data to fit a model during training. With the knowledge- and domain-transfer method described here, existing models and the capabilities they have learned can be reused for similar tasks, or for tasks in the same domain, in a few-shot or zero-shot manner. This reduces data and labor costs, provides a high-quality intention recognition service from limited data, and allows new intentions to be added flexibly, giving easy extensibility. At the same time, a bidirectional encoder representation model containing text-related information from multiple domains is obtained, which can be reused for other text-related tasks.
S18, acquiring data to be processed, inputting the data to be processed into the intention recognition model, and acquiring the output of the intention recognition model as a target intention.
In this embodiment, the data to be processed may be uploaded by a relevant worker, which is not limited in the present invention.
In the embodiment, the intention recognition model obtained through training can be combined with the idea of transfer learning and a multi-level classification framework to realize accurate recognition of the intention.
It should be noted that, in order to further ensure the security of the data and avoid malicious tampering of the data, the intention identification model may be stored on the blockchain node.
According to the technical scheme, the method pre-trains a Transformer model with first sample data of a source domain to obtain a feature extraction model; connects the feature extraction model to a first preset classifier to obtain a first initial model; performs supervised learning on the first initial model with second sample data associated with a target task to obtain a first model; acquires data of a target domain as third sample data; connects the first model to a second preset classifier to obtain a second initial model; performs adversarial training on the second initial model based on a target loss function and the third sample data to obtain an intention recognition model; and inputs data to be processed into the intention recognition model, taking its output as the target intention. The invention thus combines the idea of transfer learning with a multi-level classification framework to recognize intentions accurately.
Embodiments of the present invention further provide an intention identification apparatus, which is configured to execute any one of the embodiments of the aforementioned intention identification method. Specifically, referring to fig. 2, fig. 2 is a schematic block diagram of an intention identifying apparatus according to an embodiment of the present invention.
As shown in fig. 2, the intention recognition apparatus 100 includes: the device comprises a construction unit 101, a training unit 102, an acquisition unit 103, a learning unit 104 and an external unit 105.
In response to the intention identifying instruction, the constructing unit 101 determines a source domain from the intention identifying instruction, and constructs first sample data.
In this embodiment, the intention identification instruction may be configured to be triggered periodically or by a relevant worker, and the present invention is not limited thereto.
In the present embodiment, the source domain refers to a domain having sufficient data satisfying training requirements as a support, such as an insurance domain.
In at least one embodiment of the present invention, the constructing unit 101 determines the source domain according to the intention identifying instruction, and constructing the first sample data includes:
analyzing the intention identification instruction to obtain information carried by the intention identification instruction;
acquiring a label corresponding to the field;
constructing a regular expression according to the acquired label;
traversing the information carried by the intention identification instruction by using the regular expression, and determining the traversed information as a domain name;
determining the source domain according to the domain name;
acquiring data of the source domain;
calling a general corpus and acquiring data of the general corpus;
integrating the acquired data as the first sample data.
For example: the label can be NAME, and the constructed regular expression is NAME ().
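The patent does not spell out the complete pattern (the "NAME ()" above appears truncated), so the following is only a hypothetical sketch of label-driven extraction; the separator in the pattern and the helper names are our assumptions:

```python
import re

def build_pattern(label):
    # Hypothetical pattern: the label followed by a captured word;
    # the patent's actual regular expression is not given in full.
    return re.compile(re.escape(label) + r"\s*[:=]\s*(\w+)")

def find_domain_name(pattern, instruction_info):
    """Traverse the information carried by the instruction and return
    the first match as the domain name, per the steps above."""
    match = pattern.search(instruction_info)
    return match.group(1) if match else None

pattern = build_pattern("NAME")
domain_name = find_domain_name(pattern, "task=intent NAME: insurance lang=zh")
```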
Through this embodiment, the source domain can be accurately determined based on the label and the regular expression, and the training samples are then constructed by combining the source-domain data with the general corpus so as to ensure the effect of subsequent model training.
The training unit 102 pre-trains the Transformer model by using the first sample data to obtain a feature extraction model.
In this embodiment, the pre-training the transform model by the training unit 102 using the first sample data to obtain the feature extraction model includes:
acquiring general corpus data from the first sample data;
randomly selecting words from the general corpus data, and replacing the selected words with masks;
randomly disturbing the sentence relation of the general corpus data;
performing masking prediction training on the Transformer model according to the mask, and performing next sentence prediction training on the Transformer model by using the disordered sentences to obtain an intermediate model;
and acquiring the data of the source domain from the first sample data, and retraining the intermediate model by using the data of the source domain to obtain the feature extraction model.
For example, data from any Chinese encyclopedia database is used as the general corpus data, such as the sentence: "Why can Company A be said to be the first choice for your insurance career? Because choosing Company A also means choosing a top company of the insurance industry as a development platform." The word "Company A" is randomly selected from the sentence and replaced with "mask", resulting in the replaced sentence: "Why can 'mask' be said to be the first choice for your insurance career? Because choosing 'mask' also means choosing a top company of the insurance industry as a development platform." The replaced sentence is taken as the input of the Transformer model; if the model predicts incorrectly and the loss increases, gradient descent is performed to update the model parameters. In this unlabeled, unsupervised learning process, the goal of training is for the model to capture as much context information as possible so as to correctly predict the masked words.
Meanwhile, taking the above sentence as an example, the sentence order is randomly shuffled, changing the original into: "Because choosing Company A also means choosing a top company of the insurance industry as a development platform. Why can Company A be said to be the first choice for your insurance career?"; that is, the ordering of 50% of sentence pairs is randomly shuffled. It can be understood that before shuffling the two sentences stand in a consecutive (preceding and following) relationship, while after shuffling they do not, and training aims to make the model recognize this consecutive-sentence relationship as far as possible.
By masking prediction training and next sentence prediction training, the randomly masked word and sentence relation is predicted according to limited context information, so that the model learns text semantic information and text representation capability based on context.
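The two data-preparation steps behind these objectives can be illustrated as follows; tokenization and the Transformer itself are omitted, and the helper names are ours, not the patent's:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Randomly replace a fraction of the tokens with the mask token,
    keeping the originals as prediction targets (masked-word training)."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok          # the word the model must recover
        else:
            masked.append(tok)
    return masked, targets

def make_sentence_pair(sent_a, sent_b, shuffled):
    """Build a next-sentence-prediction example: label 1 when the pair
    keeps its original order, 0 when the order has been shuffled."""
    return ((sent_b, sent_a), 0) if shuffled else ((sent_a, sent_b), 1)
```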
Furthermore, the intermediate model is retrained through the data of the source domain (such as an insurance domain), so that the model has the semantic information and language expression capability of a specific domain.
In the above embodiment, pre-training on a large amount of easily obtained general corpus data, followed by re-training on a small amount of hard-to-obtain source-domain corpus data, enhances the model's feature representation capability in the specific field. This hierarchical, progressive training mode makes full use of the difference in corpus quantity and meets the requirements of the vertical field.
The acquisition unit 103 determines a target task and acquires data associated with the target task as second sample data.
In this embodiment, the target task may include intention recognition, and the target task may be determined according to the intention recognition instruction or according to a configuration of a user, and the present invention is not limited thereto.
In at least one embodiment of the present invention, the acquiring unit 103 acquires data associated with the target task as second sample data includes:
acquiring historical data of the target task;
identifying tagged data from the historical data as the second sample data; and/or
acquiring data from the historical data, and labeling the acquired data to obtain the second sample data.
It should be noted that, because supervised training is performed subsequently, a large amount of labeled data is required to be used as data support.
The learning unit 104 connects the feature extraction model with a first preset classifier to obtain a first initial model, and performs supervised learning on the first initial model by using the second sample data to obtain a first model.
In this embodiment, the first preset classifier may be any conventional classification model, such as a fully connected layer combined with a Softmax activation function.
In this embodiment, the structure of the first initial model is that the trained feature extraction model is externally connected with a first preset classifier.
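The fully connected layer followed by Softmax, mentioned above as one conventional choice, can be written out in plain Python (toy dimensions; a real model would use a tensor library):

```python
import math

def softmax(logits):
    mx = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - mx) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fc_softmax(embedding, weights, biases):
    """One fully connected layer (weights @ embedding + biases)
    followed by Softmax over the class logits."""
    logits = [sum(w * e for w, e in zip(row, embedding)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

# Two classes, two-dimensional embedded vector.
probs = fc_softmax([1.0, 0.0], weights=[[2.0, 0.0], [0.0, 2.0]], biases=[0.0, 0.0])
```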
Specifically, the learning unit 104 performs supervised learning on the first initial model by using the second sample data, and obtaining a first model includes:
inputting the second sample data into the feature extraction model for feature extraction to obtain an embedded vector representation of the second sample data;
acquiring a label of each data in the second sample data;
taking the embedded vector representation of the second sample data as a training sample, training the first preset classifier according to the label of each data until the loss of the first preset classifier reaches convergence, and stopping training;
determining a current first initial model as the first model.
It can be understood that, since the first initial model is an integral model formed by the feature extraction model externally connecting the first preset classifier, the parameters of the feature extraction model and the first preset classifier are updated simultaneously when supervised learning is performed.
Through the embodiment, the model can be finely adjusted by combining with an actual target task.
Of course, depending on the specific service scenario, a certain proportion of the parameters of the feature extraction model may be frozen and left un-updated. In this way, only a small fraction of the parameters are updated together with the first preset classifier, achieving fine-tuning of the feature extraction model. For example, when the feature extraction model is a 12-layer (base) model, the bottom 10 Transformer layers are frozen; the exact number of frozen layers is a hyperparameter that must be tuned according to actual results.
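Freezing the bottom layers can be represented as a simple trainability map; a sketch of the 12-layer example just described (the layer-name scheme is illustrative):

```python
def freeze_bottom_layers(layer_names, n_freeze):
    """Return {layer name: trainable?}, freezing the bottom n_freeze
    layers; n_freeze is the hyperparameter discussed above."""
    return {name: i >= n_freeze for i, name in enumerate(layer_names)}

layers = [f"encoder.layer.{i}" for i in range(12)]  # 12-layer (base) model
trainable = freeze_bottom_layers(layers, n_freeze=10)
```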
The acquiring unit 103 determines a target domain and acquires data of the target domain as third sample data.
In the present embodiment, the target domain refers to a domain relatively lacking data support, such as a bank domain. The target domain may be determined according to the intention identifying instruction, or may be determined according to a configuration of a user, which is not limited in the present invention.
It should be noted that, due to the lack of sufficient data support, the data in the target domain may be tagged, untagged, or partially tagged.
The external connection unit 105 connects the first model with a second preset classifier to obtain a second initial model.
In this embodiment, the second preset classifier may also be any conventional classification model, such as a fully connected layer combined with a Softmax activation function, and the present invention is not limited thereto.
The structure of the second initial model is that the first model obtained by training is externally connected with the second preset classifier, namely the second initial model is composed of the feature extraction model, the first preset classifier and the second preset classifier.
The construction unit 101 constructs an objective loss function.
In line with the above embodiment, the constructing unit 101 constructs the target loss function, including:
constructing a class classification loss function for the first preset classifier;
constructing a domain classification loss function;
configuring a first weight for the class classification loss function and a second weight for the domain classification loss function;
and calculating a weighted sum according to the first weight, the second weight, the class classification loss function and the domain classification loss function to obtain the target loss function.
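The weighted combination described in these steps amounts to the following sketch. The negative-log-likelihood forms are standard cross-entropy, assumed here to match the patent's loss descriptions, and the weight values are placeholders:

```python
import math

def class_classification_loss(true_label_probs):
    """Cross-entropy over P(y_i | f(x_i)) for each sample's true class
    (a standard form, assumed here)."""
    return -sum(math.log(p) for p in true_label_probs)

def domain_classification_loss(true_domain_probs):
    """Cross-entropy over P(d_i | f(x_i)) for each sample's true domain."""
    return -sum(math.log(p) for p in true_domain_probs)

def target_loss(class_loss, domain_loss, w1=0.7, w2=0.3):
    """Weighted sum of the two losses; w1 and w2 are the configurable
    first and second weights (values here are placeholders)."""
    return w1 * class_loss + w2 * domain_loss

total = target_loss(class_classification_loss([1.0, 1.0]),
                    domain_classification_loss([1.0, 1.0]))
```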
The first weight and the second weight may be custom-configured and dynamically adjusted according to the actual application scenario, which is not limited in the present invention.
Through the configuration of the loss function, the class classification loss function and the domain classification loss function are integrated together to be used as the overall loss, so that the trained model can meet the requirements of class classification and domain classification at the same time.
Specifically, the formula of the class classification loss function is as follows:

$$L_y = -\sum_{i}\log P\big(y_i \mid f(x_i)\big)$$

wherein $L_y$ represents the class classification loss function, $P(y_i \mid f(x_i))$ represents the probability that the first preset classifier outputs the label corresponding to $y_i$, $f(x_i)$ represents the embedded vector representation output by the feature extraction model, $x_i$ represents the $i$-th input, $y_i$ represents the label corresponding to $x_i$, and $i$ is a positive integer.
Through the above embodiment, a class classification loss function is first constructed for the first preset classifier so that classes can be classified correctly.
Further, the domain classification loss function is formulated as follows:

$$L_d = -\sum_{i}\log P\big(d_i \mid f(x_i)\big)$$

wherein $L_d$ represents the domain classification loss function, $d_i$ represents the domain label of the $i$-th input, and $P(d_i \mid f(x_i))$ represents the probability that the second preset classifier outputs the label $d_i$.

Here the domain label represents a particular domain; for example, it may be configured so that $d = 1$ indicates the target domain and $d = 0$ indicates the source domain.
Through this embodiment, a domain classification loss function is constructed for the second preset classifier so that domains can be classified correctly.
The training unit 102 performs adversarial training on the second initial model by using the third sample data based on the target loss function to obtain an intention recognition model.
In this embodiment, the training unit 102 performing adversarial training on the second initial model by using the third sample data based on the target loss function to obtain the intention recognition model includes:
randomly extracting a preset proportion of data from the third sample data, and recording the label of the extracted data as the source-domain label to obtain fourth sample data;
and performing simulated gradient-reversal training on the second initial model by using the fourth sample data, stopping the training when the value of the target loss function no longer decreases, so as to obtain the intention recognition model.
Specifically, the intention recognition model includes a feature extraction model, a class classifier (i.e., the first preset classifier), and a domain classifier (i.e., the second preset classifier).
Wherein the function of the feature extraction model is:
(1) extracting features required by classification of a subsequent classifier;
(2) mapping the source-domain data and the target-domain data to the same space.
The function of the category classifier is to classify the extracted source-domain data features.
The function of the domain classifier is to judge whether the extracted feature information comes from the source domain or the target domain.
The whole training process follows the paradigm of adversarial learning. During training, a portion of the target-domain labels within each batch is dynamically changed into source-domain labels, forming an adversarial relationship that reverses the gradient. This increases the training difficulty of the domain classifier, so that it cannot correctly judge which domain the feature information comes from. Once the loss function of the domain classifier reaches its minimum, the target-domain data and the source-domain data are mixed together; at this point the feature extractor maps both to the same feature space, solving the problem of inconsistent data distributions.
Meanwhile, the feature extraction model is constrained so that the extracted feature information remains usable for the subsequent classification task, addressing the small data volume of the target domain and its lack of classification labels. To this end, the class classification loss and the domain classification loss are added together in a certain proportion to obtain an overall loss function. Based on this overall loss function, the parameters of all three parts of the framework are updated simultaneously by gradient descent. When the overall loss function reaches its minimum, the target-domain data and the source-domain data can be mapped to the same feature space and correctly classified.
In practice, target-domain data may be scarce, training corpora may be hard to obtain, or the target domain may have a large amount of unlabeled data whose manual labeling is expensive; traditional supervised learning, and deep learning in particular, requires a large amount of labeled data to fit a model during training. With the knowledge- and domain-transfer method described here, existing models and the capabilities they have learned can be reused for similar tasks, or for tasks in the same domain, in a few-shot or zero-shot manner. This reduces data and labor costs, provides a high-quality intention recognition service from limited data, and allows new intentions to be added flexibly, giving easy extensibility. At the same time, a bidirectional encoder representation model containing text-related information from multiple domains is obtained, which can be reused for other text-related tasks.
The acquisition unit 103 acquires data to be processed, inputs the data to be processed to the intention recognition model, and acquires an output of the intention recognition model as a target intention.
In this embodiment, the data to be processed may be uploaded by a relevant worker, which is not limited in the present invention.
In the embodiment, the intention recognition model obtained through training can be combined with the idea of transfer learning and a multi-level classification framework to realize accurate recognition of the intention.
It should be noted that, in order to further ensure the security of the data and avoid malicious tampering of the data, the intention identification model may be stored on the blockchain node.
According to the technical scheme, the apparatus pre-trains a Transformer model with first sample data of a source domain to obtain a feature extraction model; connects the feature extraction model to a first preset classifier to obtain a first initial model; performs supervised learning on the first initial model with second sample data associated with a target task to obtain a first model; acquires data of a target domain as third sample data; connects the first model to a second preset classifier to obtain a second initial model; performs adversarial training on the second initial model based on a target loss function and the third sample data to obtain an intention recognition model; and inputs data to be processed into the intention recognition model, taking its output as the target intention. The invention thus combines the idea of transfer learning with a multi-level classification framework to recognize intentions accurately.
The above-mentioned intention recognition means may be implemented in the form of a computer program which can be run on a computer device as shown in fig. 3.
Referring to fig. 3, fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, and the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 3, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.
The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform the intent recognition method.
The processor 502 is used to provide computing and control capabilities that support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of the computer program 5032 in the storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be caused to execute the intention identification method.
The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 3 is a block diagram of only a portion of the configuration associated with aspects of the present invention and is not intended to limit the computing device 500 to which aspects of the present invention may be applied, and that a particular computing device 500 may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the intention identifying method disclosed in the embodiment of the invention.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 3 does not constitute a limitation on the specific construction of the computer device, and in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 3, and are not described herein again.
It should be understood that, in the embodiment of the present invention, the Processor 502 may be a Central Processing Unit (CPU), and the Processor 502 may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a nonvolatile computer-readable storage medium or a volatile computer-readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program realizes the intention identifying method disclosed by the embodiment of the invention when being executed by a processor.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatuses, devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two; the components and steps of the examples have been described above in general functional terms in order to illustrate clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only a logical division, and there may be other divisions when the actual implementation is performed, or units having the same function may be grouped into one unit, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An intent recognition method, comprising:
in response to an intention identification instruction, determining a source domain according to the intention identification instruction, and constructing first sample data;
pre-training a Transformer model by using the first sample data to obtain a feature extraction model;
determining a target task, and acquiring data associated with the target task as second sample data;
externally connecting the feature extraction model with a first preset classifier to obtain a first initial model, and performing supervised learning on the first initial model by using the second sample data to obtain a first model;
determining a target domain, and acquiring data of the target domain as third sample data;
externally connecting the first model with a second preset classifier to obtain a second initial model;
constructing a target loss function;
performing adversarial training on the second initial model by using the third sample data based on the target loss function to obtain an intention recognition model;
and acquiring data to be processed, inputting the data to be processed into the intention recognition model, and acquiring the output of the intention recognition model as a target intention.
2. The method of claim 1, wherein the pre-training the Transformer model with the first sample data to obtain a feature extraction model comprises:
acquiring general corpus data from the first sample data;
randomly selecting words from the general corpus data, and replacing the selected words with masks;
randomly shuffling the sentence order of the general corpus data;
performing masked prediction training on the Transformer model according to the masks, and performing next-sentence prediction training on the Transformer model by using the shuffled sentences to obtain an intermediate model;
and acquiring the data of the source domain from the first sample data, and retraining the intermediate model by using the data of the source domain to obtain the feature extraction model.
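The two corruptions recited in claim 2, replacing randomly selected words with masks and perturbing sentence order, can be sketched as follows. This is an illustrative reading, not the patent's code; the `[MASK]` symbol, the mask probability, and the 50/50 next-sentence split are assumptions borrowed from common masked-language-model practice.

```python
# Illustrative sketch of claim 2's pre-training data preparation:
# random word masking plus sentence-order perturbation.
import random

MASK = "[MASK]"

def mask_words(tokens, mask_prob=0.15, rng=None):
    """Replace a random subset of tokens with a mask symbol.
    Returns the corrupted tokens and the (position, original word)
    pairs the model must predict."""
    rng = rng or random.Random(0)
    corrupted, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets.append((i, tok))
        else:
            corrupted.append(tok)
    return corrupted, targets

def shuffle_sentence_pairs(sentences, rng=None):
    """Build (sent_a, sent_b, is_next) examples: roughly half keep the
    true next sentence, the rest substitute a random one."""
    rng = rng or random.Random(0)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], True))
        else:
            pairs.append((sentences[i], rng.choice(sentences), False))
    return pairs
```

The masked positions drive the masked-prediction objective, and the `is_next` flags drive the next-sentence-prediction objective described in the claim.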
3. The method according to claim 1, wherein the supervised learning of the first initial model using the second sample data to obtain the first model comprises:
inputting the second sample data into the feature extraction model for feature extraction to obtain an embedded vector representation of the second sample data;
acquiring a label of each data in the second sample data;
taking the embedded vector representation of the second sample data as a training sample, training the first preset classifier according to the label of each data until the loss of the first preset classifier reaches convergence, and stopping training;
determining a current first initial model as the first model.
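A minimal numeric sketch of claim 3's supervised step, under the assumption that the feature extraction model stays frozen and only the first preset classifier is updated until its loss converges. The toy embedding, the logistic head, and all names here are hypothetical stand-ins, not the patent's implementation.

```python
# Sketch of claim 3: frozen feature extractor + trainable classifier head.
import math

def extract_features(x):
    # Stand-in for the frozen feature extraction model: a fixed 2-d
    # embedding of a scalar input. No parameters here are updated.
    return [x, x * x]

def train_classifier(samples, labels, lr=0.5, max_epochs=500, tol=1e-6):
    """Logistic-regression head trained on frozen features.
    Stops once the loss no longer decreases by more than `tol`."""
    w, b = [0.0, 0.0], 0.0
    prev_loss = float("inf")
    for _ in range(max_epochs):
        loss, grad_w, grad_b = 0.0, [0.0, 0.0], 0.0
        for x, y in zip(samples, labels):
            f = extract_features(x)          # frozen: no gradient flows here
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            for j in range(2):
                grad_w[j] += (p - y) * f[j]
            grad_b += p - y
        for j in range(2):
            w[j] -= lr * grad_w[j] / len(samples)
        b -= lr * grad_b / len(samples)
        if prev_loss - loss < tol:           # loss converged: stop training
            break
        prev_loss = loss
    return w, b

def predict(x, w, b):
    f = extract_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
```

The "current first initial model" of the claim corresponds here to the frozen extractor plus the converged head `(w, b)`.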
4. The intention recognition method of claim 1, wherein said constructing a target loss function comprises:
constructing a class classification loss function for the first preset classifier;
constructing a domain classification loss function;
configuring a first weight for the class classification loss function and a second weight for the domain classification loss function;
and calculating a weighted sum according to the first weight, the second weight, the class classification loss function and the domain classification loss function to obtain the target loss function.
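Claim 4's target loss is a plain weighted sum of the two component losses; a one-line sketch follows, where the default weight values are illustrative assumptions, not from the patent:

```python
# Sketch of claim 4: the target loss as a weighted sum of the category
# classification loss and the domain classification loss.
def target_loss(class_loss, domain_loss, w_class=1.0, w_domain=0.1):
    # w_class is the first weight, w_domain the second weight;
    # the 0.1 default is an illustrative choice.
    return w_class * class_loss + w_domain * domain_loss
```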
5. The intention recognition method of claim 4, wherein the category classification loss function is formulated as follows:

L_y = -log P(y_i | E(x_i))

wherein L_y represents the category classification loss function, P(y_i | E(x_i)) represents the probability, output by the first preset classifier, of the label corresponding to y_i, E(x_i) represents the embedded vector representation output by the feature extraction model, x_i represents the i-th input, y_i represents the label corresponding to x_i, and i is a positive integer.
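Read as a negative log-likelihood, the category classification loss of claim 5 can be sketched as follows, where summing the per-input terms -log P(y_i | E(x_i)) over a batch is our assumption and the function name is hypothetical:

```python
# Sketch of claim 5's category classification loss: negative
# log-probability of each input's true label, summed over a batch.
import math

def class_classification_loss(probs, labels):
    """`probs[i]` is the first classifier's distribution over labels for
    the i-th embedded input E(x_i); `labels[i]` is its true label y_i."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))
```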
6. The intention recognition method of claim 5, wherein the domain classification loss function is formulated as follows:

L_d = -log P(d)

wherein L_d represents the domain classification loss function, d represents the domain label, and P(d) represents the probability that the second preset classifier outputs the label d.
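The domain classification loss of claim 6 admits the same negative-log-probability sketch, applied to the second (domain) classifier's output; the batch sum, binary-label convention, and function name are assumptions:

```python
# Sketch of claim 6's domain classification loss: negative log-probability
# assigned by the second preset classifier to the true domain label.
import math

def domain_classification_loss(domain_probs, domain_labels):
    """`domain_probs[i]` is the second classifier's distribution over
    domains (e.g. source vs. target) for the i-th input; `domain_labels[i]`
    is the true domain label d."""
    return -sum(math.log(p[d]) for p, d in zip(domain_probs, domain_labels))
```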
7. The method of claim 1, wherein performing adversarial training on the second initial model by using the third sample data based on the target loss function to obtain an intention recognition model comprises:
randomly extracting data with a preset proportion from the third sample data, and recording the label of the extracted data as the label of the source domain to obtain fourth sample data;
and performing gradient-reversal training on the second initial model by using the fourth sample data, and stopping training when the value of the target loss function no longer decreases, to obtain the intention recognition model.
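The reversal-gradient training of claim 7 is consistent with the gradient reversal layer used in domain-adversarial training; identifying it with that mechanism is our reading, not the patent's wording. Forward is the identity; backward multiplies the incoming gradient by -lambda, so minimizing the domain loss through this layer drives the feature extractor to make source and target domains indistinguishable:

```python
# Sketch of a gradient reversal layer for claim 7's adversarial step.
class GradientReversal:
    def __init__(self, lam=1.0):
        self.lam = lam          # lambda: gradient scaling factor (assumed)

    def forward(self, x):
        return x                # identity in the forward direction

    def backward(self, grad_output):
        return -self.lam * grad_output   # flip (and scale) the gradient
```

In a full training loop this layer would sit between the feature extractor and the second preset classifier, so that the classifier learns to tell domains apart while the extractor learns to confuse it.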
8. An intention recognition apparatus, comprising:
the construction unit is used for responding to the intention identification instruction, determining a source domain according to the intention identification instruction and constructing first sample data;
the training unit is used for pre-training the Transformer model by utilizing the first sample data to obtain a feature extraction model;
the acquisition unit is used for determining a target task and acquiring data associated with the target task as second sample data;
the learning unit is used for externally connecting the feature extraction model with a first preset classifier to obtain a first initial model, and performing supervised learning on the first initial model by using the second sample data to obtain a first model;
the acquiring unit is further configured to determine a target domain and acquire data of the target domain as third sample data;
the external unit is used for externally connecting the first model with a second preset classifier to obtain a second initial model;
the construction unit is also used for constructing a target loss function;
the training unit is further configured to perform adversarial training on the second initial model by using the third sample data based on the target loss function to obtain an intention recognition model;
the acquisition unit is further configured to acquire data to be processed, input the data to be processed to the intention recognition model, and acquire an output of the intention recognition model as a target intention.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the intent recognition method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the intention identification method according to any one of claims 1 to 7.
CN202110682988.4A 2021-06-21 2021-06-21 Intention recognition method, device, equipment and storage medium Active CN113139063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110682988.4A CN113139063B (en) 2021-06-21 2021-06-21 Intention recognition method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113139063A true CN113139063A (en) 2021-07-20
CN113139063B CN113139063B (en) 2021-09-14

Family

ID=76815848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110682988.4A Active CN113139063B (en) 2021-06-21 2021-06-21 Intention recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113139063B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599922A (en) * 2016-12-16 2017-04-26 中国科学院计算技术研究所 Transfer learning method and transfer learning system for large-scale data calibration
CN111581978A (en) * 2020-04-28 2020-08-25 深圳市一号互联科技有限公司 Intention identification method through domain migration and antagonistic learning
CN112183581A (en) * 2020-09-07 2021-01-05 华南理工大学 Semi-supervised mechanical fault diagnosis method based on self-adaptive migration neural network
US10916242B1 (en) * 2019-08-07 2021-02-09 Nanjing Silicon Intelligence Technology Co., Ltd. Intent recognition method based on deep learning network
CN112800239A (en) * 2021-01-22 2021-05-14 中信银行股份有限公司 Intention recognition model training method, intention recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO Pengfei et al.: "Research Progress on Intent Recognition for Transfer Learning", Journal of Frontiers of Computer Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569581A (en) * 2021-08-26 2021-10-29 中国联合网络通信集团有限公司 Intention recognition method, device, equipment and storage medium
CN113569581B (en) * 2021-08-26 2023-10-17 中国联合网络通信集团有限公司 Intention recognition method, device, equipment and storage medium
CN113658178A (en) * 2021-10-14 2021-11-16 北京字节跳动网络技术有限公司 Tissue image identification method and device, readable medium and electronic equipment
WO2023231676A1 (en) * 2022-05-30 2023-12-07 京东方科技集团股份有限公司 Instruction recognition method and device, training method, and computer readable storage medium
CN115905548A (en) * 2023-03-03 2023-04-04 美云智数科技有限公司 Water army identification method and device, electronic equipment and storage medium
CN115905548B (en) * 2023-03-03 2024-05-10 美云智数科技有限公司 Water army recognition method, device, electronic equipment and storage medium
CN117077003A (en) * 2023-08-16 2023-11-17 中国船舶集团有限公司第七〇九研究所 Distributed target intention recognition method and system
CN117077003B (en) * 2023-08-16 2024-04-23 中国船舶集团有限公司第七〇九研究所 Distributed target intention recognition method and system

Also Published As

Publication number Publication date
CN113139063B (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN113139063B (en) Intention recognition method, device, equipment and storage medium
CN113822494B (en) Risk prediction method, device, equipment and storage medium
US9547821B1 (en) Deep learning for algorithm portfolios
CN104714931B (en) For selecting the method and system to represent tabular information
CN110196908A (en) Data classification method, device, computer installation and storage medium
WO2021073390A1 (en) Data screening method and apparatus, device and computer-readable storage medium
US20220100963A1 (en) Event extraction from documents with co-reference
CN112101031B (en) Entity identification method, terminal equipment and storage medium
CN111489105B (en) Enterprise risk identification method, device and equipment
US20220100772A1 (en) Context-sensitive linking of entities to private databases
CN110347840A (en) Complain prediction technique, system, equipment and the storage medium of text categories
CN112163099A (en) Text recognition method and device based on knowledge graph, storage medium and server
CN114416995A (en) Information recommendation method, device and equipment
CN115905528A (en) Event multi-label classification method and device with time sequence characteristics and electronic equipment
Jagdish et al. Identification of end-user economical relationship graph using lightweight blockchain-based BERT model
CN114428860A (en) Pre-hospital emergency case text recognition method and device, terminal and storage medium
US20220100967A1 (en) Lifecycle management for customized natural language processing
WO2022072237A1 (en) Lifecycle management for customized natural language processing
CN116029394B (en) Self-adaptive text emotion recognition model training method, electronic equipment and storage medium
CN116797195A (en) Work order processing method, apparatus, computer device, and computer readable storage medium
CN113590846B (en) Legal knowledge map construction method and related equipment
CN111506776B (en) Data labeling method and related device
CN114707483A (en) Zero sample event extraction system and method based on contrast learning and data enhancement
CN114330238A (en) Text processing method, text processing device, electronic equipment and storage medium
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant