CN115687934A - Intention recognition method and device, computer equipment and storage medium - Google Patents

Intention recognition method and device, computer equipment and storage medium

Info

Publication number
CN115687934A
CN115687934A (application CN202211718565.4A)
Authority
CN
China
Prior art keywords
data
model
network model
sample
intention recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211718565.4A
Other languages
Chinese (zh)
Inventor
刘伟华
左勇
马金民
林超超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Athena Eyes Co Ltd
Original Assignee
Athena Eyes Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Athena Eyes Co Ltd filed Critical Athena Eyes Co Ltd
Priority to CN202211718565.4A
Publication of CN115687934A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an intention recognition method and device, computer equipment and a storage medium. The intention recognition method comprises the following steps: obtaining sample data; performing feature extraction on the sample data with an initial bert network model to obtain feature data; calculating a loss value from the feature data with a composite loss function, the composite loss function being constructed from a ternary loss function and a classification cross entropy; performing back-propagation training on the initial bert network model based on the loss value until the loss value is smaller than a preset threshold, obtaining a trained bert network model; taking the trained bert network model as the target intention recognition model; and performing intention recognition with the target intention recognition model. Because the loss is computed from both a metric term and a classification term on the data, the target intention recognition model trained in this way is more tolerant of unevenly distributed sample data, which improves the accuracy of intention recognition.

Description

Intention recognition method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of natural language processing, and in particular to an intention recognition method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of artificial intelligence technology, intelligent means are increasingly used for task processing. Some intelligent question-answering applications require user intention recognition, and in a dialog system the user intention recognition result is a key input for the system's machine-action decisions; an intention is usually named as "verb + noun", for example "query related case symptoms". Among currently existing intent recognition techniques, academia and industry commonly treat user intent recognition as a classification problem, i.e., classifying user utterances into predefined intent categories; the techniques mainly include rule-based intent recognition, intent recognition based on statistical machine learning, and intent recognition based on deep learning.
In implementing the invention, the inventors found that the prior art has at least the following technical problems:
rule-based user intent recognition techniques, in which mapping rules are generated from statistics over historical data, have these defects: 1. dialog algorithms based on statistical machine learning suffer from computational complexity and domain dependence; 2. supervised learning over an existing data set places relatively high demands on computing resources, and when the amount of data is small it easily overfits; moreover, such methods cannot accurately understand the deep semantic information of a user's text.
Intention recognition methods based on deep learning, which predict intent through neural-network learning, have the following defect: they place high demands on data quality, the data distribution strongly influences the classification result, and training takes a long time when no pre-trained model or language model is available.
Therefore, conventional methods suffer from low intention recognition accuracy when the amount of sample data is small or the sample data is unevenly distributed.
Disclosure of Invention
The embodiments of the invention provide an intention recognition method and device, computer equipment, and a storage medium, so as to improve the accuracy of intention recognition.
In order to solve the above technical problem, an embodiment of the present application provides an intention recognition method, including:
acquiring sample data;
performing feature extraction on the sample data by adopting an initial bert network model to obtain feature data;
calculating a loss value based on the characteristic data by adopting a composite loss function, wherein the composite loss function is constructed based on a ternary loss function and a classification cross entropy;
carrying out back propagation training on the initial bert network model based on the loss value until the loss value is smaller than a preset threshold value to obtain a trained bert network model, and taking the trained bert network model as a target intention recognition model;
and performing intention recognition by adopting the target intention recognition model.
Optionally, the ternary loss function is:
L_tripletloss = max(d(a, m) - d(a, n) + margin, 0)
where a is the anchor (reference) sample, m is a positive sample, n is a negative sample, d(·) is a distance function, and margin is a parameter greater than 0 used to pull the anchor sample a closer to the positive sample m and push it away from the negative sample n.
Optionally, the composite loss function is:
H(p, q) = -Σ_{i=1}^{s} p(x_i) log q(x_i)
LOSS = H(p, q) + w * L_tripletloss
where H(p, q) is the classification cross-entropy function, x_i is the i-th sample, s is the number of samples, and w is a hyperparameter.
Optionally, the initial bert network model is based on a Transformer architecture, and a multi-head attention mechanism is adopted.
Optionally, the performing feature extraction on the sample data by using an initial bert network model to obtain feature data includes:
calculating the encoded information head_j of the j-th attention module by the following formula:
head_j = Attention(Q·W_j^Q, K·W_j^K, V·W_j^V)
where Attention is the attention calculation function, and W_j^Q, W_j^K and W_j^V are the weight matrices applied to the input vectors Q, K and V in the j-th self-attention module;
and splicing and fusing the pieces of encoded information to obtain feature data.
Optionally, the splicing and fusing each piece of the encoded information to obtain feature data includes:
sequentially splicing the pieces of encoded information to obtain a spliced matrix;
and fusing the spliced matrix by adopting a preset fusion mode to obtain the feature data.
Optionally, the performing intent recognition by using the target intent recognition model comprises:
obtaining corpus data to be identified;
and inputting the corpus data to be recognized into a target intention recognition model, and performing feature extraction and classification by using the target intention recognition model to obtain an intention recognition result.
In order to solve the above technical problem, an embodiment of the present application further provides an intention identifying apparatus, including:
the sample acquisition module is used for acquiring sample data;
the characteristic extraction module is used for extracting the characteristics of the sample data by adopting an initial bert network model to obtain characteristic data;
a loss value calculation module for calculating a loss value based on the feature data by using a composite loss function, wherein the composite loss function is constructed based on a ternary loss function and a classification cross entropy;
the model training module is used for carrying out back propagation training on the initial bert network model based on the loss value until the loss value is smaller than a preset threshold value, obtaining a trained bert network model, and taking the trained bert network model as a target intention recognition model;
and the intention recognition module is used for recognizing the intention by adopting the target intention recognition model.
Optionally, the feature extraction module includes:
an information calculating unit for calculating the encoded information head_j of the j-th attention module by the following formula:
head_j = Attention(Q·W_j^Q, K·W_j^K, V·W_j^V)
where Attention is the attention calculation function, and W_j^Q, W_j^K and W_j^V are the weight matrices applied to the input vectors Q, K and V in the j-th self-attention module;
and an information fusion unit for splicing and fusing the pieces of encoded information to obtain feature data.
Optionally, the information fusion unit includes:
the matrix splicing subunit is used for sequentially splicing each piece of the coding information to obtain a spliced matrix;
and the matrix fusion subunit is used for fusing the splicing matrix by adopting a preset fusion mode to obtain the characteristic data.
Optionally, the intent recognition module comprises:
the system comprises a to-be-identified data acquisition unit, a to-be-identified data acquisition unit and a to-be-identified data acquisition unit, wherein the to-be-identified data acquisition unit is used for acquiring corpus data to be identified;
and the intention identification unit is used for inputting the corpus data to be identified into a target intention identification model, and performing feature extraction and classification by using the target intention identification model to obtain an intention identification result.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above intention identifying method when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which stores a computer program, and the computer program realizes the steps of the above intention identifying method when executed by a processor.
According to the intention recognition method and device, computer equipment and storage medium above, sample data is obtained; an initial bert network model performs feature extraction on the sample data to obtain feature data; a composite loss function, constructed from a ternary loss function and a classification cross entropy, is used to calculate a loss value from the feature data; back-propagation training is performed on the initial bert network model based on the loss value until the loss value is smaller than a preset threshold, giving a trained bert network model, which is taken as the target intention recognition model; and intention recognition is performed with the target intention recognition model. Because the loss is computed from both a metric term and a classification term on the data, the target intention recognition model trained in this way is more tolerant of unevenly distributed sample data, improving the accuracy of intention recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of an intent recognition method of the present application;
FIG. 3 is a schematic diagram of classification cross-entropy function fitting in an embodiment of the intent recognition method of the present application;
FIG. 4 is a schematic block diagram of one embodiment of an intent recognition device according to the present application;
FIG. 5 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
Referring to fig. 1, as shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
The intention recognition method provided by the embodiments of the present application is executed by the server; accordingly, the intention recognition apparatus is arranged in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation needs, and the terminal devices 101, 102 and 103 in this embodiment may specifically correspond to an application system in actual production.
Referring to fig. 2, fig. 2 shows an intention identification method according to an embodiment of the present invention, which is described by taking the method applied to the server in fig. 1 as an example, and is detailed as follows:
s201: and acquiring sample data.
The sample data is collected corpus data; it may specifically be data with an unbalanced distribution or data with an even distribution.
S202: and performing feature extraction on the sample data by adopting an initial bert network model to obtain feature data.
Specifically, in this embodiment, the initial bert network model is based on a Transformer architecture, and a multi-head attention mechanism is adopted.
The bert model is a language representation model based on the Transformer architecture, and its main constituent is the self-attention mechanism. The Attention function represents the calculation process of self-attention, shown in the following formula:
Attention(Q, K, V) = softmax(QK^T / √d_k) · V
Its input consists of the encoding vectors Q, K and V, d_k denotes the dimension of the input vectors, and QK^T expresses the direct relations among all the word vectors. Concretely, the text is first segmented and the segmentation result is numerically encoded; it is then fed into a word embedding layer to obtain word embedding vectors; the word embedding vectors are mapped to Q, K and V, with Q = K = V; finally, the softmax function calculates, for each input word, its weights with respect to all the words. The overall formula computes a weighted sum of all word vectors of the sentence, i.e., the representation of each word in the sentence contains the word's context and carries global context information.
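For illustration only, the self-attention formula above can be sketched in a few lines of PyTorch. This is a minimal sketch under assumed tensor shapes; the names are illustrative and not code from the patent:

```python
import math
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    # Q, K, V: (..., seq_len, d_k); for self-attention Q = K = V
    d_k = Q.size(-1)
    # QK^T / sqrt(d_k): relevance of every word vector to every other
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    # softmax gives each word its weights over all words in the sentence
    weights = F.softmax(scores, dim=-1)
    # weighted sum of value vectors: each output carries global context
    return weights @ V
```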
In a specific optional implementation, in step S202, performing feature extraction on the sample data with the initial bert network model to obtain feature data includes:
calculating the encoded information head_j of the j-th attention module by the following formula:
head_j = Attention(Q·W_j^Q, K·W_j^K, V·W_j^V)
where Attention is the attention calculation function, and W_j^Q, W_j^K and W_j^V are the weight matrices applied to the input vectors Q, K and V in the j-th self-attention module;
and splicing and fusing the pieces of encoded information to obtain feature data.
The multi-head attention mechanism is composed of single attention mechanisms:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) · W
In this embodiment, bert adopts the multi-head attention mechanism: repeated calculation with more parameters increases the number of context-encoding subspaces, and in the implementation the heads head_1, ..., head_h are directly spliced (concatenated), which effectively prevents overfitting. Here W is a linear mapping matrix whose parameters are generally determined by the hidden-layer dimension and the input dimension of the model.
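Likewise, a minimal sketch of the multi-head splicing described above, reusing the attention() sketch from earlier; the hidden dimension of 768 and head count of 12 are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    # Sketch of MultiHead(Q,K,V) = Concat(head_1..head_h) * W
    def __init__(self, d_model=768, num_heads=12):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.d_k = num_heads, d_model // num_heads
        # W_j^Q, W_j^K, W_j^V for all heads, fused into single projections
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)  # the linear mapping matrix W

    def forward(self, x):
        b, t, _ = x.shape
        # split the projections into h heads: (b, h, t, d_k)
        q = self.q_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        heads = attention(q, k, v)  # attention() as sketched above
        # splice (concatenate) the heads, then fuse with the output projection
        spliced = heads.transpose(1, 2).reshape(b, t, self.h * self.d_k)
        return self.out_proj(spliced)
```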
It should be noted that when a conventional bert model is trained on a downstream task, it is difficult to obtain a good result if the data quality is poor and the number of samples is small, because the bert model was pre-trained on corpora of high data quality. In this embodiment, samples of specific categories are aggregated by a metric learning method, which greatly reduces the poor generalization and overfitting caused by noisy and duplicated samples.
Further, splicing and fusing the pieces of encoded information to obtain feature data includes:
sequentially splicing the pieces of encoded information to obtain a spliced matrix;
and fusing the spliced matrix by adopting a preset fusion mode to obtain the feature data.
The preset fusion modes include, but are not limited to, splicing fusion, weighted fusion, and the like.
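The patent does not specify how a weighted fusion would be parameterized; purely as a hypothetical illustration, a learned weighted sum of the heads could replace plain splicing:

```python
import torch
import torch.nn as nn

class WeightedHeadFusion(nn.Module):
    # Hypothetical weighted-fusion alternative to plain concatenation:
    # each head gets a learned scalar weight before the heads are summed.
    def __init__(self, num_heads=12):
        super().__init__()
        self.head_weights = nn.Parameter(torch.ones(num_heads))

    def forward(self, heads):
        # heads: (batch, num_heads, seq_len, d_k)
        w = torch.softmax(self.head_weights, dim=0).view(1, -1, 1, 1)
        return (w * heads).sum(dim=1)  # (batch, seq_len, d_k)
```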
S203: and calculating a loss value based on the characteristic data by adopting a composite loss function, wherein the composite loss function is constructed based on a ternary loss function and a classified cross entropy.
Optionally, the ternary loss function is:
L_tripletloss = max(d(a, m) - d(a, n) + margin, 0)
where a is the anchor (reference) sample, m is a positive sample, n is a negative sample, d(·) is a distance function, and margin is a parameter greater than 0 used to pull the anchor sample a closer to the positive sample m and push it away from the negative sample n.
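A minimal PyTorch sketch of this ternary (triplet) loss, using Euclidean distance for d(·); the margin value of 0.5 is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    # anchor, positive, negative: (batch, dim) embedding vectors
    # d(a, m) and d(a, n): Euclidean distances between embeddings
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    # max(d(a,m) - d(a,n) + margin, 0): penalize triplets where the anchor
    # is not closer to the positive than to the negative by the margin
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()
```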
Optionally, the composite loss function is:
H(p, q) = -Σ_{i=1}^{s} p(x_i) log q(x_i)
LOSS = H(p, q) + w * L_tripletloss
where H(p, q) is the classification cross-entropy function, x_i is the i-th sample, s is the number of samples, and w is a hyperparameter.
Here a positive sample is a sample of the same class as the anchor sample, and a negative sample is a sample of a different class from the anchor sample.
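Combining the two terms, a sketch of the composite loss LOSS = H(p, q) + w * L_tripletloss, reusing the triplet_loss() sketch above; the logits and labels arguments are illustrative:

```python
import torch
import torch.nn.functional as F

def composite_loss(logits, labels, anchor, positive, negative, w, margin=0.5):
    # H(p, q): classification cross entropy over the samples in the batch
    ce = F.cross_entropy(logits, labels)
    # L_tripletloss, as defined in the triplet_loss() sketch above
    tl = triplet_loss(anchor, positive, negative, margin)
    # LOSS = H(p, q) + w * L_tripletloss, with w a decaying hyperparameter
    return ce + w * tl
```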
It should be noted that the approach of this embodiment also reduces the cost of hyperparameter tuning. Referring to fig. 3, fig. 3 (a) shows the fitting behavior when, as in the prior art, the loss is calculated with only a classification cross-entropy function: the precision is low where the loss is low and high where the loss is high, which causes the model to overfit and lowers its accuracy in subsequent recognition. Fig. 3 (b) is a fitting diagram of the scheme provided in this embodiment. At the initial stage of training the weight w is large, so model learning is biased toward metric learning, which changes the distribution of the data representation: the representation space of same-class samples is narrowed and the distances between samples of different classes are enlarged. Over many iterations the weight w gradually decreases and the center of gravity of the model shifts to the classifier's cross-entropy objective; at that point the optimal model can be selected by the classification cross-entropy loss value or by a precision metric, which improves the accuracy of the final model.
It should be understood that metric learning is adopted in this embodiment mainly because training-set samples generally suffer from low text quality, poor sample diversity, and uneven distribution; metric learning effectively reduces the distance between samples of the same class and enlarges the distance between samples of different classes. In existing approaches the parameter w is set to a fixed value. On the one hand, the effectiveness of the hyperparameter w is then unknown: it cannot reliably improve the accuracy of the classifier, with high probability it brings negative effects, and training must be attempted many times, which greatly reduces tuning efficiency. On the other hand, a fixed w freezes the division of the optimization objective between the metric loss (the ternary loss in this embodiment) and the classification cross-entropy loss: if the weight assigned to metric learning is too large, it hinders optimization of the overall loss, and if it is too small, metric learning has no effect at all. This embodiment therefore sets w to decay linearly, as illustrated in fig. 3 (c), which effectively balances the metric loss and the classifier's cross-entropy loss: the early phase of training emphasizes metric learning, and the later phase emphasizes the classifier objective.
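The linearly decaying w could, for example, be produced by a schedule such as the following; the start and end values are assumptions, since the patent does not give them:

```python
def w_schedule(epoch, total_epochs, w_start=1.0, w_end=0.0):
    # Linearly decay w from w_start to w_end over training, so early
    # epochs emphasize metric learning and later epochs emphasize the
    # classification cross entropy.
    frac = epoch / max(total_epochs - 1, 1)
    return w_start + (w_end - w_start) * frac
```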
In this embodiment, the triplets (positive sample, negative sample, anchor sample) are generated by random sampling. Dialogue text data is usually converted from voice data and is influenced by various factors during conversion; for example, dialect expressions and intermittent sentence delivery by the user make the converted text relatively messy and of poor quality. The method of this embodiment effectively reduces the influence of these problems on the intention recognition result.
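A hypothetical sketch of generating the triplets by random sampling from labeled corpus data, as this embodiment describes (assumes at least two intent classes in the data):

```python
import random
from collections import defaultdict

def sample_triplets(texts, labels, num_triplets):
    # group sample indices by intent label
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    # anchors need a same-class partner, so require >= 2 samples per class
    candidates = [y for y, idx in by_label.items() if len(idx) >= 2]
    triplets = []
    for _ in range(num_triplets):
        pos_label = random.choice(candidates)
        neg_label = random.choice([y for y in by_label if y != pos_label])
        a, m = random.sample(by_label[pos_label], 2)  # anchor + positive
        n = random.choice(by_label[neg_label])        # negative
        triplets.append((texts[a], texts[m], texts[n]))
    return triplets
```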
S204: and performing back propagation training on the initial bert network model based on the loss value until the loss value is smaller than a preset threshold value to obtain a trained bert network model, and taking the trained bert network model as a target intention recognition model.
The preset threshold may be set according to actual needs and is not limited here; for example, it may be set to 0.05.
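Putting the pieces together, a hedged sketch of the back-propagation loop with the threshold stopping criterion of S204, reusing the w_schedule() and composite_loss() sketches above; the model, classifier head, data loader, learning rate, and epoch count are all assumptions:

```python
import torch

def train(model, classifier, loader, epochs=10, threshold=0.05, lr=2e-5):
    # model: the initial bert network (maps token ids to feature vectors);
    # classifier: the intent classification head on top of the features
    params = list(model.parameters()) + list(classifier.parameters())
    opt = torch.optim.AdamW(params, lr=lr)
    for epoch in range(epochs):
        w = w_schedule(epoch, epochs)  # linearly decaying metric-loss weight
        for batch in loader:
            feats_a = model(batch["anchor"])    # feature extraction
            feats_m = model(batch["positive"])
            feats_n = model(batch["negative"])
            logits = classifier(feats_a)
            loss = composite_loss(logits, batch["label"],
                                  feats_a, feats_m, feats_n, w)
            opt.zero_grad()
            loss.backward()  # back-propagation training on the loss value
            opt.step()
            if loss.item() < threshold:  # stop once loss < preset threshold
                return model, classifier
    return model, classifier
```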
S205: and adopting a target intention recognition model for intention recognition.
In a specific optional embodiment, in step S205, performing intent recognition by using the target intent recognition model includes:
obtaining corpus data to be identified;
and inputting the corpus data to be recognized into a target intention recognition model, and performing feature extraction and classification by using the target intention recognition model to obtain an intention recognition result.
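For S205, a minimal inference sketch with the trained target intention recognition model; the tokenizer and label mapping are illustrative assumptions:

```python
import torch

@torch.no_grad()
def recognize_intent(model, classifier, tokenizer, text, id2label):
    # encode the corpus data to be recognized (tokenizer is assumed to
    # return tensors in whatever format the model expects)
    inputs = tokenizer(text)
    feats = model(inputs)        # feature extraction with the target model
    logits = classifier(feats)   # classification
    return id2label[int(logits.argmax(dim=-1))]
```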
In this embodiment, sample data is obtained; an initial bert network model performs feature extraction on the sample data to obtain feature data; a composite loss function, constructed from a ternary loss function and a classification cross entropy, is used to calculate a loss value from the feature data; back-propagation training is performed on the initial bert network model based on the loss value until the loss value is smaller than a preset threshold, giving a trained bert network model, which is taken as the target intention recognition model; and intention recognition is performed with the target intention recognition model. Because the loss is computed from both a metric term and a classification term on the data, the target intention recognition model trained in this way is more tolerant of unevenly distributed sample data, improving the accuracy of intention recognition.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 4 shows a schematic block diagram of an intention recognition apparatus in one-to-one correspondence with the intention recognition methods of the above-described embodiments. As shown in fig. 4, the intention identifying apparatus includes a sample acquiring module 31, a feature extracting module 32, a loss value calculating module 33, a model training module 34, and an intention identifying module 35. The detailed description of each functional module is as follows:
a sample obtaining module 31, configured to obtain sample data;
the feature extraction module 32 is configured to perform feature extraction on the sample data by using an initial bert network model to obtain feature data;
a loss value calculation module 33, configured to calculate a loss value based on the feature data by using a composite loss function, where the composite loss function is constructed based on a ternary loss function and a classified cross entropy;
the model training module 34 is used for performing back propagation training on the initial bert network model based on the loss value until the loss value is smaller than a preset threshold value, obtaining a trained bert network model, and taking the trained bert network model as a target intention recognition model;
and an intention recognition module 35, configured to perform intention recognition by using the target intention recognition model.
Optionally, the feature extraction module 32 comprises:
an information calculating unit for calculating the encoded information head_j of the j-th attention module by the following formula:
head_j = Attention(Q·W_j^Q, K·W_j^K, V·W_j^V)
where Attention is the attention calculation function, and W_j^Q, W_j^K and W_j^V are the weight matrices applied to the input vectors Q, K and V in the j-th self-attention module;
and an information fusion unit for splicing and fusing the pieces of encoded information to obtain the feature data.
Optionally, the information fusion unit includes:
the matrix splicing subunit is used for sequentially splicing each piece of coding information to obtain a spliced matrix;
and the matrix fusion subunit is used for fusing the splicing matrix by adopting a preset fusion mode to obtain the characteristic data.
Optionally, the intention identifying module 35 comprises:
the system comprises a to-be-identified data acquisition unit, a to-be-identified data acquisition unit and a to-be-identified data acquisition unit, wherein the to-be-identified data acquisition unit is used for acquiring corpus data to be identified;
and the intention identification unit is used for inputting the corpus data to be identified into the target intention identification model, and performing feature extraction and classification by using the target intention identification model to obtain an intention identification result.
For the specific definition of the intention identifying means, reference may be made to the above definition of the intention identifying method, which is not described herein again. The various modules in the above-described intent recognition apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In order to solve the technical problem, the embodiment of the application further provides computer equipment. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43, which are communicatively connected to each other via a system bus. It is noted that only a computer device 4 with the components memory 41, processor 42 and network interface 43 is shown, but it should be understood that not all of the illustrated components need be implemented, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., an SD card), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 4. Of course, the memory 41 may also include both an internal storage unit and an external storage device of the computer device 4. In this embodiment, the memory 41 is generally used to store the operating system installed on the computer device 4 and various types of application software, such as program code for controlling electronic files. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may in some embodiments be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the program code stored in the memory 41 or to process data, for example to run the program code for controlling electronic files.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The present application further provides another embodiment, namely a computer-readable storage medium storing an interface display program, where the interface display program is executable by at least one processor so as to cause the at least one processor to perform the steps of the intention recognition method described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It should be understood that the above embodiments are merely illustrative of the invention and are not restrictive; the appended drawings show preferred embodiments and do not limit the scope of the invention. This application may be embodied in many different forms, and these embodiments are provided so that this disclosure will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions recorded in the foregoing embodiments or substitute equivalents for some of their features. Any equivalent structure made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, falls within the protection scope of the present application.

Claims (10)

1. An intent recognition method, comprising:
acquiring sample data;
performing feature extraction on the sample data by adopting an initial bert network model to obtain feature data;
calculating a loss value based on the feature data by adopting a composite loss function, wherein the composite loss function is constructed based on a ternary loss function and a classification cross entropy, and the composite loss function is: LOSS = H(p, q) + w * L_tripletloss, where H(p, q) is the classification cross-entropy function, L_tripletloss is the ternary loss function, p and q are the probability distributions of the true and predicted classifications respectively, and w is a hyperparameter that decays linearly during training;
performing back propagation training on the initial bert network model based on the loss value until the loss value is smaller than a preset threshold value to obtain a trained bert network model, and taking the trained bert network model as a target intention recognition model;
and performing intention recognition by adopting the target intention recognition model.
2. The intent recognition method of claim 1, wherein the ternary loss function is:
L_tripletloss = max(d(a, m) - d(a, n) + margin, 0)
wherein a is the anchor (reference) sample, m is a positive sample, n is a negative sample, d(·) is a distance function, and margin is a parameter greater than 0 used to pull the anchor sample a closer to the positive sample m and push it away from the negative sample n.
3. The intent recognition method of claim 2, wherein the classification cross-entropy function is:
Figure 798332DEST_PATH_IMAGE001
wherein the content of the first and second substances,Hp,q) In order to classify the cross-entropy function,x i is a firstiThe number of the samples is one,sfor the number of samples, p and q are the probability distributions for classification true and false, respectively.
4. The intent recognition method of any of claims 1-3, wherein the initial bert network model is based on the Transformer architecture and adopts a multi-head attention mechanism.
5. The intent recognition method of claim 4, wherein performing feature extraction on the sample data using an initial bert network model to obtain feature data comprises:
calculating the encoded information head_j of the j-th attention module by the following formula:
head_j = Attention(Q·W_j^Q, K·W_j^K, V·W_j^V)
wherein Attention is the attention calculation function, and W_j^Q, W_j^K and W_j^V are the weight matrices applied to the input vectors Q, K and V in the j-th self-attention module;
and splicing and fusing the pieces of encoded information to obtain feature data.
6. The method for identifying an intention according to claim 5, wherein the splicing and fusing each of the encoded information to obtain feature data comprises:
splicing each piece of coded information in sequence to obtain a spliced matrix;
and fusing the splicing matrixes by adopting a preset fusion mode to obtain the characteristic data.
7. The intent recognition method of claim 1 wherein said employing the object intent recognition model for intent recognition comprises:
obtaining corpus data to be identified;
and inputting the corpus data to be recognized into a target intention recognition model, and performing feature extraction and classification by using the target intention recognition model to obtain an intention recognition result.
8. An intention recognition apparatus characterized by comprising:
the sample acquisition module is used for acquiring sample data;
the characteristic extraction module is used for extracting the characteristics of the sample data by adopting an initial bert network model to obtain characteristic data;
a loss value calculation module, configured to calculate a loss value based on the feature data by using a composite loss function, wherein the composite loss function is constructed based on a ternary loss function and a classification cross entropy, and the composite loss function is: LOSS = H(p, q) + w * L_tripletloss, where H(p, q) is the classification cross-entropy function, L_tripletloss is the ternary loss function, p and q are the probability distributions of the true and predicted classifications respectively, and w is a hyperparameter that decays linearly during training;
the model training module is used for carrying out back propagation training on the initial bert network model based on the loss value until the loss value is smaller than a preset threshold value, obtaining a trained bert network model, and taking the trained bert network model as a target intention recognition model;
and the intention recognition module is used for performing intention recognition by adopting the target intention recognition model.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the intent recognition method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the intention recognition method of any one of claims 1 to 7.
CN202211718565.4A 2022-12-30 2022-12-30 Intention recognition method and device, computer equipment and storage medium Pending CN115687934A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211718565.4A CN115687934A (en) 2022-12-30 2022-12-30 Intention recognition method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211718565.4A CN115687934A (en) 2022-12-30 2022-12-30 Intention recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115687934A (en) 2023-02-03

Family

ID=85057447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211718565.4A Pending CN115687934A (en) 2022-12-30 2022-12-30 Intention recognition method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115687934A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235665A (en) * 2023-09-18 2023-12-15 北京大学 Self-adaptive privacy data synthesis method, device, computer equipment and storage medium
CN117725961A (en) * 2024-02-18 2024-03-19 智慧眼科技股份有限公司 Medical intention recognition model training method, medical intention recognition method and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131970A (en) * 2020-09-07 2020-12-25 浙江师范大学 Identity recognition method based on multi-channel space-time network and joint optimization loss
WO2022048173A1 (en) * 2020-09-04 2022-03-10 平安科技(深圳)有限公司 Artificial intelligence-based customer intent identification method and apparatus, device, and medium
CN114547264A (en) * 2022-02-18 2022-05-27 南京大学 News diagram data identification method based on Mahalanobis distance and comparison learning
CN114610851A (en) * 2022-03-30 2022-06-10 苏州科达科技股份有限公司 Method for training intention recognition model, intention recognition method, apparatus and medium
WO2022227211A1 (en) * 2021-04-30 2022-11-03 平安科技(深圳)有限公司 Bert-based multi-intention recognition method for discourse, and device and readable storage medium



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20230203)