WO2022141864A1 - Dialogue intent recognition model training method, apparatus, computer device and medium - Google Patents


Info

Publication number
WO2022141864A1
WO2022141864A1 PCT/CN2021/083953 CN2021083953W WO2022141864A1 WO 2022141864 A1 WO2022141864 A1 WO 2022141864A1 CN 2021083953 W CN2021083953 W CN 2021083953W WO 2022141864 A1 WO2022141864 A1 WO 2022141864A1
Authority
WO
WIPO (PCT)
Prior art keywords: sample data, dialogue, preset, recognition model, enhanced
Prior art date
Application number
PCT/CN2021/083953
Other languages
English (en)
French (fr)
Inventor
王健宗
宋青原
吴天博
程宁
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022141864A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Definitions

  • the present application relates to the technical field of semantic parsing, and in particular, to a method, apparatus, computer equipment and medium for training a dialogue intent recognition model.
  • the present application belongs to the field of natural language processing technology and can be applied to intelligent multi-round dialogue systems, text similarity judgment systems, and other systems.
  • the intelligent multi-round dialogue system needs to identify the customer's intention according to the content of the dialogue, and the intention is used for subsequent process control and dialogue generation. Therefore, intention recognition is a key technology in intelligent multi-round dialogue.
  • the inventor realizes that, in the prior art, intent recognition often uses an intent recognition model for intent extraction, and training the intent recognition model requires labeled data; the existing labeled data needs to be screened from historical dialogue information and then labeled manually. However, the amount of labeled data obtained in this way is often insufficient, so the intent recognition model cannot be fully trained, and its accuracy is therefore low.
  • Embodiments of the present application provide a method, apparatus, computer device, and medium for training a dialogue intent recognition model, so as to solve the problem that the accuracy of the intent recognition model is low due to an insufficient amount of labeled data.
  • a method for training a dialogue intent recognition model comprising:
  • the dialogue sample data set includes at least one first dialogue sample data without a dialogue intention label
  • the enhanced sample data set includes at least one enhanced sample data
  • the first dialogue sample data and the enhanced sample data are input into the initial intent recognition model including the first initial parameter, and enhanced intent recognition is performed on the first dialogue sample data and the enhanced sample data to obtain a first sample distribution corresponding to the first dialogue sample data and a second sample distribution corresponding to the enhanced sample data;
  • the initial intent recognition model described above is recorded as a dialog intent recognition model.
  • a dialogue intention recognition model training device comprising:
  • a dialogue sample data set acquisition module configured to acquire a preset dialogue sample data set;
  • the dialogue sample data set includes at least one first dialogue sample data without a dialogue intention label;
  • an enhanced sample data determination module configured to input the first dialogue sample data into a retrieval model constructed based on ES retrieval, and to determine an enhanced sample data set corresponding to the first dialogue sample data; the enhanced sample data set includes at least one enhanced sample data;
  • the enhanced intent recognition module is configured to input the first dialogue sample data and the enhanced sample data into an initial intent recognition model including a first initial parameter, and perform an analysis on the first dialogue sample data and the enhanced sample data.
  • Enhanced intent recognition to obtain a first sample distribution corresponding to the first dialogue sample data, and a second sample distribution corresponding to the enhanced sample data;
  • a total loss value determination module configured to determine a distribution loss value according to the first sample distribution and the second sample distribution, and determine a total loss value of the initial intention recognition model according to each of the distribution loss values
  • a first parameter updating module configured to update and iterate the first initial parameter of the initial intent recognition model when the total loss value does not reach a preset convergence condition; when the total loss value reaches the preset convergence condition, the converged initial intent recognition model is recorded as a dialogue intent recognition model.
  • a computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
  • the dialogue sample data set includes at least one first dialogue sample data without a dialogue intention label
  • the enhanced sample data set includes at least one enhanced sample data
  • the first dialogue sample data and the enhanced sample data are input into the initial intent recognition model including the first initial parameter, and enhanced intent recognition is performed on the first dialogue sample data and the enhanced sample data to obtain a first sample distribution corresponding to the first dialogue sample data and a second sample distribution corresponding to the enhanced sample data;
  • the initial intent recognition model described above is recorded as a dialog intent recognition model.
  • One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • the dialogue sample data set includes at least one first dialogue sample data without a dialogue intention label
  • the enhanced sample data set includes at least one enhanced sample data
  • the first dialogue sample data and the enhanced sample data are input into the initial intent recognition model including the first initial parameter, and enhanced intent recognition is performed on the first dialogue sample data and the enhanced sample data to obtain a first sample distribution corresponding to the first dialogue sample data and a second sample distribution corresponding to the enhanced sample data;
  • the initial intent recognition model described above is recorded as a dialog intent recognition model.
  • the present application improves the utilization rate of the first dialogue sample data without dialogue intent labels and, at the same time, avoids the extra noise caused by data enhancement methods such as synonym replacement and back translation in the prior art, improving both the efficiency of model training and the accuracy of the model's intent recognition.
  • FIG. 1 is a schematic diagram of an application environment of a training method for a dialogue intent recognition model according to an embodiment of the present application
  • FIG. 2 is a flowchart of a training method for a dialogue intent recognition model in an embodiment of the present application
  • FIG. 3 is a flowchart of step S20 in the training method for a dialogue intent recognition model according to an embodiment of the present application
  • FIG. 4 is a flowchart of step S205 in the training method for a dialogue intent recognition model according to an embodiment of the present application
  • FIG. 5 is another flowchart of a training method for a dialogue intent recognition model in an embodiment of the present application
  • FIG. 6 is a schematic block diagram of an apparatus for training a dialog intention recognition model according to an embodiment of the present application
  • FIG. 7 is another schematic block diagram of a dialogue intent recognition model training device in an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of an enhanced sample data determination module in a dialog intention recognition model training device according to an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an enhanced sample data determination unit in an apparatus for training a dialogue intention recognition model according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
  • the dialog intent recognition model training method provided by the embodiment of the present application can be applied in the application environment shown in FIG. 1 .
  • the dialogue intent recognition model training method is applied in a dialogue intent recognition model training system.
  • the dialogue intent recognition model training system includes a client and a server as shown in FIG. 1; the client and the server communicate through a network, in order to solve the problem that the accuracy of the intent recognition model is low due to an insufficient amount of labeled data.
  • the client, also known as the user end, refers to a program that corresponds to the server and provides local services to the user. Clients can be installed on, but are not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a method for training a dialogue intent recognition model is provided, and the method is applied to the server in FIG. 1 as an example for description, including the following steps:
  • S10 Obtain a preset dialogue sample data set; the dialogue sample data set includes at least one first dialogue sample data without a dialogue intention label;
  • the first dialogue sample data is data that does not carry a dialogue intent label, i.e., that has not been manually annotated in advance. Supervised learning generally requires a large amount of manually annotated data for model training, but the demand for such data is very large: manual labeling is time-consuming and cannot produce labeled data at that scale. Therefore, one of the problems to be solved in this application is how to train the model accurately and quickly in the absence of labeled data. Further, the first dialogue sample data may be selected according to different scenarios; in this embodiment, the first dialogue sample data may be obtained by extracting dialogue information from an intelligent multi-round dialogue system.
  • S20 Input the first dialogue sample data into a retrieval model constructed based on ES retrieval, and determine an enhanced sample data set corresponding to the first dialogue sample data; the enhanced sample data set includes at least one enhanced sample data;
  • the retrieval model constructed based on ES refers to a model built with an Elasticsearch (ES) retrieval tool, used to retrieve sentences with high similarity to the first dialogue sample data.
  • the ES retrieval tool carries an ES library that stores multiple sample data. Understandably, since there are few sample data with dialogue intent labels, in order to improve the model's intent recognition accuracy and generalization ability, this application proposes to use a retrieval model constructed based on ES retrieval to determine the enhanced sample data set corresponding to the first dialogue sample data, and then to select the enhanced sample data corresponding to the first dialogue sample data from that set, so that the first dialogue sample data and each enhanced sample data can be input into the initial intent recognition model in step S30 for training. This solves the problem that there are few sample data with dialogue intent labels, which leads to low model training accuracy.
  • step S20 includes:
  • S201 Input the first dialogue sample data into the retrieval model, and perform vector encoding processing on the first dialogue sample data to obtain a dialogue encoding vector corresponding to the first dialogue sample data;
  • specifically, the first dialogue sample data in the preset dialogue sample data set is input into the retrieval model constructed based on ES retrieval, and the neural network encoder in that retrieval model encodes the first dialogue sample data to obtain a dialogue encoding vector corresponding to the first dialogue sample data.
  • the neural network encoder can be constructed based on an LSTM (Long Short-Term Memory) or BiLSTM (Bi-directional Long Short-Term Memory) neural network.
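The patent names LSTM/BiLSTM encoders but fixes no architecture. Purely as an illustration of the encoder's interface (a sentence in, a fixed-length dialogue encoding vector out), the stand-in below hashes character trigrams instead of running a recurrent network; the function name and the hashing scheme are assumptions, not from the patent:

```python
from typing import List

def encode_sentence(sentence: str, dim: int = 32) -> List[float]:
    """Toy stand-in for the neural network encoder in the retrieval model.

    A real implementation would run an LSTM/BiLSTM over token embeddings;
    here character trigrams are hashed into a fixed-length vector so the
    surrounding retrieval pipeline can be demonstrated end to end.
    """
    vec = [0.0] * dim
    text = f"#{sentence}#"  # boundary markers
    for i in range(len(text) - 2):
        trigram = text[i:i + 3]
        vec[hash(trigram) % dim] += 1.0
    # L2-normalise so vectors of different sentence lengths are comparable
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]
```

Any encoder with this shape (sentence to fixed-length vector) can back the retrieval database described next.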
  • S202 Obtain all retrieval dialogue vectors from the retrieval database of the retrieval model, and determine the vector edit distance between the dialogue coding vector and each retrieval dialogue vector; one retrieval dialogue vector is associated with a retrieval sample sentence;
  • the retrieval database of the retrieval model stores retrieval dialogue vectors associated with multiple retrieval sample sentences. The retrieval dialogue vectors in the retrieval database can be obtained in advance by vector-encoding the retrieval sample sentences with an encoder (such as a neural network encoder constructed based on an LSTM or BiLSTM neural network), and the retrieval sample sentences can be obtained by crawling the dialogue information of users in the intelligent multi-round dialogue system.
  • specifically, after the first dialogue sample data is input into the retrieval model and vector encoding is performed on it to obtain the corresponding dialogue encoding vector, all retrieval dialogue vectors are automatically obtained from the retrieval database of the retrieval model, and the vector edit distance between the dialogue encoding vector and each retrieval dialogue vector is determined. The smaller the vector edit distance, the greater the similarity between the dialogue encoding vector and the retrieval dialogue vector.
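The patent does not define the vector edit distance precisely; one natural reading is the classic Levenshtein distance applied to the (possibly discretised) components of the two vectors, where a smaller distance means greater similarity. A sketch under that assumption:

```python
def edit_distance(a, b) -> int:
    """Classic Levenshtein distance between two sequences.

    Used here as one plausible reading of the patent's 'vector edit
    distance' between the dialogue encoding vector and a retrieval
    dialogue vector: the minimum number of insertions, deletions, and
    substitutions turning sequence a into sequence b.
    """
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distances for the empty prefix of a
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution
        prev = cur
    return prev[n]
```

In practice the real-valued encoding vectors would first be quantised into comparable symbols before computing this distance; that quantisation step is also an assumption here.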
  • S203 Compare each of the vector edit distances with a preset distance threshold, and record the retrieval sentence associated with the vector edit distance less than or equal to the preset distance threshold as sample data to be selected;
  • S204 Construct the enhanced sample data set according to all the sample data to be selected.
  • understandably, the preset distance threshold can be selected according to the specific application scenario; if the scenario demands higher intent recognition accuracy, the preset distance threshold can be set to a small value such as 0.05 or 0.1.
  • specifically, each vector edit distance is compared with the preset distance threshold, and the retrieval sentences associated with vector edit distances less than or equal to the preset distance threshold are recorded as sample data to be selected.
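Steps S203 and S204 can be sketched as a small helper; the function name and data layout are illustrative only, not from the patent:

```python
def build_candidate_set(sentences, distances, threshold):
    """S203/S204 sketch: keep each retrieval sample sentence whose vector
    edit distance to the dialogue encoding vector is less than or equal
    to the preset distance threshold, as sample data to be selected."""
    return [(dist, sent)
            for dist, sent in zip(distances, sentences)
            if dist <= threshold]
```

Keeping the distance alongside each sentence makes the later ascending-order selection in step S205 straightforward.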
  • after step S204, the method further includes:
  • S205 Obtain a preset expansion multiple value, select a preset-value number of sample data to be selected from the enhanced sample data set according to the preset expansion multiple value, and record the selected sample data to be selected as the enhanced sample data.
  • for example, the preset expansion multiple value is 10; the preset value then refers to the number of enhanced sample data that need to be added, beyond the original first dialogue sample data, to satisfy the preset expansion multiple value, i.e., when the preset expansion multiple value is 10, the preset value of the enhanced sample data is 9.
  • step S205 includes:
  • S2051 Insert the to-be-selected sample data into the to-be-selected sequence according to the vector edit distance in ascending order;
  • specifically, after the enhanced sample data set is constructed from all the sample data to be selected, each sample data to be selected is inserted into the sequence to be selected in ascending order of its corresponding vector edit distance.
  • the first in the sequence to be selected is the sample data to be selected with the smallest vector edit distance, and the last is the sample data to be selected with the largest vector edit distance.
  • S2052 Record the difference between the preset expansion multiple value and 1 as the preset value
  • S2053 Select the first preset-value number of sample data to be selected from the front of the sequence to be selected, and record the selected sample data to be selected as the enhanced sample data.
  • understandably, enhancement means that each first dialogue sample data needs to be expanded, with each first dialogue sample data regarded as an object to be expanded, thereby improving the generalization ability of the initial intent recognition model.
  • for example, when the preset expansion multiple value is 10, then excluding the first dialogue sample data itself, 9 sample data to be selected should also be added; the preset value is therefore the preset expansion multiple value minus 1.
  • in step S2051, the sample data to be selected are inserted into the sequence to be selected in ascending order of vector edit distance, so the sample data nearer the front of the sequence have retrieval dialogue vectors with smaller vector edit distances to the dialogue encoding vector; selection therefore starts from the smallest, and a preset-value number of sample data to be selected are chosen as the enhanced sample data.
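Steps S2051 to S2053 amount to sorting the candidates by distance and keeping the first (expansion multiple minus 1) of them; a sketch with hypothetical names:

```python
def select_enhanced_samples(candidates, expansion_multiple):
    """S2051-S2053 sketch: order candidates by vector edit distance
    ascending (S2051), take the preset value = expansion multiple - 1
    (S2052), and keep that many candidates from the front (S2053).

    `candidates` is a list of (vector_edit_distance, sentence) pairs.
    """
    preset_value = expansion_multiple - 1
    ordered = sorted(candidates, key=lambda pair: pair[0])  # smallest first
    return [sentence for _, sentence in ordered[:preset_value]]
```

If fewer candidates survive the distance threshold than the preset value, the slice simply returns all of them; whether the patent pads the set in that case is not stated.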
  • for example, the first dialogue sample data is "I'm busy, I'm driving",
  • and the corresponding enhanced sample data can be "I'm driving, talk later" or "I'm driving and it is inconvenient to communicate", etc.
  • S30 Input the first dialogue sample data and the enhanced sample data into an initial intent recognition model including a first initial parameter, perform enhanced intent recognition on the first dialogue sample data and the enhanced sample data, and obtain a first sample distribution corresponding to the first dialogue sample data, and a second sample distribution corresponding to the enhanced sample data;
  • understandably, enhanced intent recognition refers to recognition through natural language understanding technology: lexical analysis, syntactic analysis, and semantic analysis are performed on the input first dialogue sample data or enhanced sample data, semantic features are extracted from the results, and finally the first sample distribution corresponding to the first dialogue sample data and the second sample distribution corresponding to the enhanced sample data are derived from the extracted semantic features.
  • understandably, the initial intent recognition model may be any of various text classification models, such as a GMM (Gaussian Mixture Model) text topic model. After the first dialogue sample data and the enhanced sample data are input into the initial intent recognition model including the first initial parameter, enhanced intent recognition is performed on the first dialogue sample data and the enhanced sample data to obtain the first sample distribution corresponding to the first dialogue sample data and the second sample distribution corresponding to the enhanced sample data.
  • S40 Determine a distribution loss value according to the first sample distribution and the second sample distribution, and determine a total loss value of the initial intent recognition model according to each of the distribution loss values;
  • specifically, the distribution loss value is determined according to the first sample distribution and the second sample distribution. Understandably, there is at least one enhanced sample data corresponding to each first dialogue sample data, and the total loss value of the initial intent recognition model is then determined according to the distribution loss values between the first sample distribution and each of the second sample distributions.
  • further, the total loss value of the initial intent recognition model can be determined by the KL divergence: KL(p || q) = Σ_i p(x_i) log(p(x_i) / q(x)), where KL(p || q) is the total loss value; p(x_i) is the second sample distribution of the i-th enhanced sample data corresponding to the x-th first dialogue sample data in the preset dialogue sample data set; and q(x) is the first sample distribution corresponding to the x-th first dialogue sample data in the preset dialogue sample data set.
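As a concrete reading of the KL-divergence loss above (a sketch only; the patent gives no reference code, and summing the per-enhanced-sample divergences is one plausible interpretation):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

def total_loss(first_distribution, second_distributions):
    """Distribution loss per enhanced sample: KL between its second sample
    distribution p and the first sample distribution q of the original
    first dialogue sample data; the total loss sums these values."""
    return sum(kl_divergence(p, first_distribution)
               for p in second_distributions)
```

A total loss of zero means every enhanced sample already receives the same predicted intent distribution as the original sentence, which is exactly the consistency this training step drives toward.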
  • understandably, the convergence condition can be the condition that the total loss value is less than a set threshold, that is, training stops when the total loss value is less than the set threshold; the convergence condition can also be the condition that the total loss value is very small and no longer decreases after 10,000 computations, that is, training stops when the total loss value is small and no longer decreases after 10,000 computations, and the converged initial intent recognition model is recorded as the dialogue intent recognition model.
  • understandably, after the total loss value of the initial intent recognition model is determined from the first sample distribution corresponding to the first dialogue sample data and the second sample distributions of each enhanced sample data corresponding to it, when the total loss value does not reach the preset convergence condition, the first initial parameter of the initial intent recognition model is adjusted according to the total loss value, and the first dialogue sample data and the corresponding enhanced sample data are re-input into the initial intent recognition model with the adjusted first initial parameter; when the total loss value corresponding to the first dialogue sample data and the enhanced sample data reaches the preset convergence condition, another first dialogue sample data in the preset dialogue sample data set is selected for training.
  • in this way, the output results of the initial intent recognition model continuously approach the accurate results, so that the recognition accuracy becomes higher and higher.
  • the initial intention recognition model after convergence is recorded as the dialogue intention recognition model.
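The convergence logic described above (stop when the total loss value falls below a set threshold, or stops decreasing) can be sketched generically; `compute_loss` and `update_params` stand in for the model-specific forward pass and parameter adjustment and are hypothetical names:

```python
def train_until_convergence(compute_loss, update_params, params,
                            threshold=1e-3, max_steps=10_000):
    """Generic convergence loop: keep updating the parameter until the
    loss reaches the preset convergence condition (below a set threshold,
    or no further decrease within max_steps computations)."""
    loss = compute_loss(params)
    for _ in range(max_steps):
        if loss < threshold:
            break  # converged: below the set threshold
        new_params = update_params(params, loss)
        new_loss = compute_loss(new_params)
        if new_loss >= loss:
            break  # converged: loss no longer decreases
        params, loss = new_params, new_loss
    return params, loss
```

The same skeleton applies to the supervised stage described later in the document, with the prediction loss value taking the place of the total loss value.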
  • the enhanced sample data corresponding to the first dialogue sample data is determined by a retrieval model constructed based on ES retrieval, and the utilization rate of the first dialogue sample data without the dialogue intent label is improved by this data enhancement method, It also solves the problem in the prior art that the accuracy of the intent recognition model is low due to the insufficient amount of labeled data. At the same time, additional noise caused by data enhancement methods such as synonym replacement and back translation in the prior art is avoided, and the efficiency of model training and the accuracy of model intent recognition are improved.
  • the dialogue intention recognition model may be stored in the blockchain.
  • Blockchain is a storage structure of encrypted and chained transactions formed by blocks.
  • the header of each block can include not only the hash values of all transactions in that block, but also the hash values of all transactions in the previous block, so that tampering with and counterfeiting of the transactions in a block can be prevented on the basis of the hash values. A newly generated transaction is filled into a block, and after consensus among the nodes in the blockchain network, the block is appended to the end of the blockchain to form chained growth.
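The tamper-evidence property described above comes from each block header storing the previous block's hash; a minimal sketch (helper names are illustrative, not from the patent):

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash of a block's contents (header fields plus transactions)."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, transactions: list) -> list:
    """Append a new block whose header stores the previous block's hash,
    so tampering with any earlier block breaks every later link."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})
    return chain

def verify_chain(chain: list) -> bool:
    """Recompute each stored prev_hash; False if any block was altered."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))
```

Changing a single stored transaction changes its block's hash, which no longer matches the `prev_hash` recorded in the next block, so the alteration is detected.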
  • in one embodiment, the dialogue sample data set further includes at least one second dialogue sample data with a dialogue intent label; as shown in FIG. 5, before step S30, that is, before the first dialogue sample data and the enhanced sample data are input into the initial intent recognition model including the first initial parameter, the method includes:
  • S60 Input the second dialogue sample data into a preset recognition model including a second initial parameter, and perform annotation intent recognition on the second dialogue sample data through the preset recognition model, to obtain the annotation prediction labels corresponding to the second dialogue sample data and the label prediction probability associated with each annotation prediction label;
  • understandably, an annotation prediction label represents an intention corresponding to the second dialogue sample data; that is, when intent recognition is performed on a second dialogue sample data, at least one intent related to that dialogue sample data may be identified, and each identified intent is given a corresponding annotation prediction label. Further, although at least one intent can be identified for a second dialogue sample data, each intent is identified with a different probability, so each annotation prediction label is associated with a label prediction probability; that is, the probability that the second dialogue sample data belongs to the intent range of the annotation prediction label is the label prediction probability.
  • understandably, the dialogue intent label of the second dialogue sample data is obtained by manually labeling the second dialogue sample data in advance.
  • S70 Determine the predicted loss value of the preset recognition model according to each of the labeled prediction labels, the label prediction probability corresponding to each of the labeled predicted labels, and the dialog intent label;
  • specifically, after annotation intent recognition is performed on the second dialogue sample data through the preset recognition model to obtain the annotation prediction labels and the label prediction probability corresponding to each annotation prediction label, the prediction loss value of the preset recognition model is determined according to each annotation prediction label, the label prediction probability corresponding to each annotation prediction label, and the dialogue intent label.
  • step S70 includes:
  • specifically, after each annotation prediction label corresponding to the second dialogue sample data is obtained, each annotation prediction label needs to be compared with the dialogue intent label, and the annotation prediction labels that are of the same category as the dialogue intent label are determined.
  • the label prediction result corresponding to an annotation prediction label of the same category as the dialogue intent label is 1; the label prediction result of an annotation prediction label of a different category from the dialogue intent label is 0.
  • a prediction loss value of the preset recognition model is determined through a cross-entropy loss function.
  • specifically, the prediction loss value can be determined by the following cross-entropy loss function: L1 = -(1/N) Σ_i Σ_c y_ic log(p_ic), where L1 refers to the prediction loss value; N refers to the number of all second dialogue sample data; M is the number of annotation prediction labels corresponding to the i-th second dialogue sample data (c ranges from 1 to M); y_ic is the label prediction result of the c-th annotation prediction label of the i-th second dialogue sample data (1 when its category matches the dialogue intent label, 0 otherwise); and p_ic is the label prediction probability corresponding to the c-th annotation prediction label of the i-th second dialogue sample data.
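The cross-entropy loss just defined can be written directly (a sketch; `y[i][c]` is the 0/1 label prediction result and `p[i][c]` the label prediction probability, with names chosen here for illustration):

```python
import math

def prediction_loss(y, p, eps=1e-12):
    """Cross-entropy loss L1 = -(1/N) * sum_i sum_c y_ic * log(p_ic).

    y[i][c] is 1 when the c-th annotation prediction label of the i-th
    second dialogue sample data matches the manually annotated dialogue
    intent label, else 0; p[i][c] is the corresponding label prediction
    probability; eps guards against log(0).
    """
    n = len(y)
    return -sum(
        y_ic * math.log(p_ic + eps)
        for y_i, p_i in zip(y, p)
        for y_ic, p_ic in zip(y_i, p_i)
    ) / n
```

The loss approaches zero as the model assigns probability 1 to the correct label, and grows as probability mass shifts to wrong labels.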
  • understandably, the convergence condition can be the condition that the prediction loss value is less than a set threshold, that is, training stops when the prediction loss value is less than the set threshold; the convergence condition can also be the condition that the prediction loss value is very small and no longer decreases after 10,000 computations, that is, training stops when the prediction loss value is small and no longer decreases after 10,000 computations, and the converged preset recognition model is recorded as the initial recognition model.
  • specifically, after the prediction loss value of the preset recognition model is determined according to each annotation prediction label, the label prediction probability corresponding to each annotation prediction label, and the dialogue intent label, when the prediction loss value does not reach the preset convergence condition, the second initial parameter of the preset recognition model is adjusted according to the prediction loss value, and the second dialogue sample data is re-input into the preset recognition model with the adjusted second initial parameter; when the prediction loss value corresponding to that second dialogue sample data reaches the preset convergence condition, another second dialogue sample data in the preset dialogue sample data set is selected, and steps S60 to S70 are executed to obtain the prediction loss value corresponding to that second dialogue sample data; when that prediction loss value does not reach the preset convergence condition, the second initial parameter of the preset recognition model is adjusted again according to the prediction loss value, until the prediction loss value corresponding to that second dialogue sample data reaches the preset convergence condition.
  • in this way, after the preset recognition model is trained with all the second dialogue sample data in the preset dialogue sample data set, the output results of the preset recognition model continuously approach the accurate results, so that the recognition accuracy becomes higher and higher, until the prediction loss values corresponding to all the second dialogue sample data reach the preset convergence condition, at which point the converged preset recognition model is recorded as the initial recognition model.
  • in this embodiment, the initial recognition model only needs to be trained with a small amount of second dialogue sample data with dialogue intent labels; on that basis, the initial recognition model is further trained in the above embodiment with the first dialogue sample data without dialogue intent labels and the corresponding enhanced sample data, which greatly reduces the workload of manual labeling.
  • an apparatus for training a dialogue intention recognition model is provided, and the apparatus for training a dialogue intention recognition model is in one-to-one correspondence with the method for training a dialogue intention recognition model in the above embodiment.
  • the dialogue intent recognition model training device includes a dialogue sample data set acquisition module 10 , an enhanced sample data determination module 20 , an enhanced intent recognition module 30 , a total loss value determination module 40 and a first parameter update module 50 .
  • the detailed description of each functional module is as follows:
  • a dialogue sample data set acquisition module 10 configured to acquire a preset dialogue sample data set; the dialogue sample data set includes at least one first dialogue sample data without a dialogue intention label;
  • the enhanced sample data determination module 20 is configured to input the first dialogue sample data into a retrieval model constructed based on ES retrieval, and determine an enhanced sample data set corresponding to the first dialogue sample data; the enhanced sample data set includes at least one piece of enhanced sample data;
  • the enhanced intent recognition module 30 is configured to input the first dialogue sample data and the enhanced sample data into an initial intent recognition model containing first initial parameters, and perform enhanced intent recognition on the first dialogue sample data and the enhanced sample data to obtain a first sample distribution corresponding to the first dialogue sample data and a second sample distribution corresponding to the enhanced sample data;
  • a total loss value determination module 40 configured to determine a distribution loss value according to the first sample distribution and the second sample distribution, and determine a total loss value of the initial intention recognition model according to each of the distribution loss values;
  • the first parameter updating module 50 is configured to update and iterate the first initial parameter of the initial intent recognition model when the total loss value does not reach a preset convergence condition, and to record the converged initial intent recognition model as the dialogue intent recognition model once the total loss value reaches the preset convergence condition.
  • the dialogue intention recognition model training device further includes:
  • an annotation intent recognition module 60 configured to input the second dialogue sample data into a preset recognition model containing second initial parameters, and perform labeled intent recognition on the second dialogue sample data through the preset recognition model to obtain each labeled prediction tag corresponding to the second dialogue sample data; each labeled prediction tag is associated with a label prediction probability;
  • a predicted loss value determination module 70 configured to determine the predicted loss value of the preset recognition model according to each labeled prediction tag, the label prediction probability corresponding to each labeled prediction tag, and the dialogue intent label;
  • the second parameter updating module 80 is configured to update and iterate the second initial parameter of the preset recognition model when the predicted loss value does not reach a preset convergence condition, and to record the converged preset recognition model as the initial intent recognition model once the predicted loss value reaches the preset convergence condition.
  • the predicted loss value determination module 70 includes:
  • a labeled prediction result determination unit configured to determine, according to each labeled prediction tag and the dialogue intent label, the labeled prediction result corresponding to each labeled prediction tag;
  • a predicted loss value determination unit configured to determine the predicted loss value of the preset recognition model through a cross-entropy loss function, according to each labeled prediction result and the label prediction probability corresponding to it.
  • the enhanced sample data determination module 20 includes:
  • a vector coding processing unit 201 configured to input the first dialogue sample data into the retrieval model and perform vector coding processing on it, obtaining a dialogue coding vector corresponding to the first dialogue sample data;
  • a vector edit distance determination unit 202 configured to obtain all retrieval dialogue vectors from the retrieval database of the retrieval model, and determine the vector edit distance between the dialogue coding vector and each retrieval dialogue vector; each retrieval dialogue vector is associated with a retrieval sample sentence;
  • a marginal distance comparison unit 203 configured to compare each vector edit distance with a preset distance threshold, and record the retrieval sample sentences associated with vector edit distances less than or equal to the preset distance threshold as sample data to be selected;
  • the enhanced sample data set construction unit 204 is configured to construct the enhanced sample data set according to all the sample data to be selected.
  • the enhanced sample data determination module 20 further includes:
  • the enhanced sample data determination unit 205 is configured to obtain a preset expansion multiple value, select a preset number of pieces of sample data to be selected from the enhanced sample data set according to the preset expansion multiple value, and record the selected sample data as the enhanced sample data.
  • the enhanced sample data determination unit 205 includes:
  • a data sequence insertion subunit 2051 configured to insert the sample data to be selected into a sequence to be selected in ascending order of vector edit distance;
  • a preset number determination subunit 2052 configured to record the difference between the preset expansion multiple value and 1 as the preset number;
  • an enhanced sample data selection subunit 2053 configured to select, from the sequence to be selected, the first preset number of pieces of sample data to be selected, and record the selected sample data as the enhanced sample data.
  • Each module in the above-mentioned dialogue intent recognition model training apparatus may be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules may be embedded in the processor of the computer device in hardware form or be independent of it, or stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
  • a computer device is provided; the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for the execution of the operating system and computer-readable instructions in the readable storage medium.
  • the database of the computer device is used to store data for the dialogue intent recognition model training method in the above embodiment.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions, when executed by a processor, implement a method for training a dialogue intent recognition model.
  • the readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • a computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
  • acquiring a preset dialogue sample data set, where the dialogue sample data set includes at least one first dialogue sample data without a dialogue intent label;
  • inputting the first dialogue sample data into a retrieval model constructed based on ES retrieval, and determining an enhanced sample data set corresponding to the first dialogue sample data, where the enhanced sample data set includes at least one piece of enhanced sample data;
  • inputting the first dialogue sample data and the enhanced sample data into an initial intent recognition model containing first initial parameters, and performing enhanced intent recognition on the first dialogue sample data and the enhanced sample data to obtain a first sample distribution corresponding to the first dialogue sample data and a second sample distribution corresponding to the enhanced sample data;
  • determining a distribution loss value according to the first sample distribution and the second sample distribution, and determining a total loss value of the initial intent recognition model according to each distribution loss value;
  • when the total loss value does not reach a preset convergence condition, updating and iterating the first initial parameters of the initial intent recognition model until the total loss value reaches the preset convergence condition, and recording the converged initial intent recognition model as the dialogue intent recognition model.
  • one or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • acquiring a preset dialogue sample data set, where the dialogue sample data set includes at least one first dialogue sample data without a dialogue intent label;
  • inputting the first dialogue sample data into a retrieval model constructed based on ES retrieval, and determining an enhanced sample data set corresponding to the first dialogue sample data, where the enhanced sample data set includes at least one piece of enhanced sample data;
  • inputting the first dialogue sample data and the enhanced sample data into an initial intent recognition model containing first initial parameters, and performing enhanced intent recognition on the first dialogue sample data and the enhanced sample data to obtain a first sample distribution corresponding to the first dialogue sample data and a second sample distribution corresponding to the enhanced sample data;
  • determining a distribution loss value according to the first sample distribution and the second sample distribution, and determining a total loss value of the initial intent recognition model according to each distribution loss value;
  • when the total loss value does not reach a preset convergence condition, updating and iterating the first initial parameters of the initial intent recognition model until the total loss value reaches the preset convergence condition, and recording the converged initial intent recognition model as the dialogue intent recognition model.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

A dialogue intent recognition model training method, apparatus, device and medium, relating to the technical field of semantic parsing. The first dialogue sample data is input into a retrieval model constructed based on ES retrieval to determine enhanced sample data (S20); the first dialogue sample data and the enhanced sample data are input into an initial intent recognition model, and enhanced intent recognition is performed on them to obtain a first sample distribution and a second sample distribution (S30); a distribution loss value is determined according to the first sample distribution and the second sample distribution, and a total loss value of the initial intent recognition model is determined according to each distribution loss value (S40); when the total loss value does not reach a preset convergence condition, the first initial parameters of the initial intent recognition model are updated and iterated until the total loss value reaches the preset convergence condition, and the converged initial intent recognition model is recorded as the dialogue intent recognition model (S50), improving the recognition accuracy of the intent recognition model.

Description

对话意图识别模型训练方法、装置、计算机设备及介质
本申请要求于2020年12月31日提交中国专利局、申请号为202011637063.X,发明名称为“对话意图识别模型训练方法、装置、计算机设备及介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及语义解析技术领域,尤其涉及一种对话意图识别模型训练方法、装置、计算机设备及介质。
背景技术
随着科学技术的发展,自然语言处理技术领域也快速发展,例如自然语言处理技术领域可以应用于智能多轮对话系统、文本相似度判定系统等系统中。其中,智能多轮对话系统需要根据客户的对话内容识别其意图,该意图用于后续的流程控制以及对话生成,因此意图识别是智能多轮对话中的关键技术。
发明人意识到,现有技术中,意图识别常常会采用意图识别模型进行意图提取,对于意图识别模型的训练需要采用带有标签的数据,而现有的带有标签的数据需要从历史对话信息中筛选得到后,通过人为标注的方法进行标签标注,但是,通过该方式得到的已标注标签的数据量往往不够充足,进而导致意图识别模型无法得到完整充分的训练,从而使得意图识别模型的准确率较低。
申请内容
本申请实施例提供一种对话意图识别模型训练方法、装置、计算机设备及介质,以解决由于已标注标签的数据量不充足,导致意图识别模型的准确率较低的问题。
一种对话意图识别模型训练方法,包括:
获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
一种对话意图识别模型训练装置,包括:
对话样本数据集获取模块,用于获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
增强样本数据确定模块,用于将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
增强意图识别模块,用于将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
总损失值确定模块,用于根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
第一参数更新模块,用于在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其中,所述处理器执行所述计算机可读指令时实现如下步骤:
获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
一个或多个存储有计算机可读指令的可读存储介质,其中,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如下步骤:
获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
本申请提高了不具有对话意图标签的第一对话样本数据的利用率,同时,避免了现有技术中采用同义词替换、回译等数据增强方式带来的额外噪音,提高了模型训练的效率以及模型意图识别的准确率。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要 使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请一实施例中对话意图识别模型训练方法的一应用环境示意图;
图2是本申请一实施例中对话意图识别模型训练方法的一流程图;
图3是本申请一实施例中对话意图识别模型训练方法中步骤S20的一流程图;
图4是本申请一实施例中对话意图识别模型训练方法中步骤S205的一流程图;
图5是本申请一实施例中对话意图识别模型训练方法的另一流程图;
图6是本申请一实施例中对话意图识别模型训练装置的一原理框图;
图7是本申请一实施例中对话意图识别模型训练装置的另一原理框图;
图8是本申请一实施例中对话意图识别模型训练装置中增强样本数据确定模块的一原理框图;
图9是本申请一实施例中对话意图识别模型训练装置中增强样本数据确定单元的一原理框图;
图10是本申请一实施例中计算机设备的一示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请实施例提供的对话意图识别模型训练方法,该对话意图识别模型训练方法可应用如图1所示的应用环境中。具体地,该对话意图识别模型训练方法应用在对话意图识别模型训练系统中,该对话意图识别模型训练系统包括如图1所示的客户端和服务器,客户端与服务器通过网络进行通信,用于解决由于已标注标签的数据量不充足,导致意图识别模型的准确率较低的问题。其中,客户端又称为用户端,是指与服务器相对应,为客户提供本地服务的程序。客户端可安装在但不限于各种个人计算机、笔记本电脑、智能手机、平板电脑和便携式可穿戴设备上。服务器可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
在一实施例中,如图2所示,提供一种对话意图识别模型训练方法,以该方法应用在图1中的服务器为例进行说明,包括如下步骤:
S10:获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
可以理解地,第一对话样本数据为不具有预先通过人工标注的对话意图标签的数据;一般地,在有监督学习中需要大量的人工标注数据进行模型训练学习,但是人工标注数据需求量很大,通过人工进行标注的方法浪费时间,且无法输出庞大的标注数据,因此本申请需要解决的其中一个问题就是缺乏有标注数据的情况下,如何对模型进行更加精确,快速的训练学习。进一步地,该第一对话样本数据可以根据不同场景进行选取,在本实施例中,第一对话样本数据可以为对智能多轮对话系统中的对话信息进行提取得到。
S20:将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
其中,基于ES(Elasticsearch,搜索服务器)检索构建的检索模型指的是该模型中采用的是ES检索工具构建,且用于检索与第一对话样本数据相似性高的句子,该ES检索工具携带一个ES库,该ES库中存储多个样本数据。可以理解地,由于具有对话意图标签的样 本数据较少,为了提高模型的意图识别准确率以及泛化能力,本申请中提出采用基于ES检索构建的检索模型,确定与所述第一对话样本数据对应的增强样本数据集,进而从增强样本数据集中选取与第一对话样本数据对应的增强样本数据,以将第一对话样本数据与各增强样本数据输入至步骤S30中的初始意图识别模型中进行训练,解决具有对话意图标签的样本数据较少,导致模型训练准确率较低的问题。
在一实施例中,如图3所示,步骤S20中,包括:
S201:将所述第一对话样本数据输入至所述检索模型中,对所述第一对话样本数据进行向量编码处理,得到与所述第一对话样本数据对应的对话编码向量;
具体地,在获取预设对话样本数据集之后,将预设对话样本数据集中的第一对话样本数据输入至基于ES检索构建的检索模型中,通过基于ES检索构建的检索模型中的神经网络编码器对第一对话样本数据进行编码处理,得到与所述第一对话样本数据对应的对话编码向量。其中,该神经网络编码器可以为基于LSTM(Long Short‐Term Memory,长短期记忆网络)或者BiLSTM(Bi‐directional Long Short‐Term Memory,双向长短期记忆网络)神经网络构建的神经网络编码器。
S202:自所述检索模型的检索数据库中获取所有检索对话向量,并确定所述对话编码向量与各所述检索对话向量之间的向量编辑距离;一个所述检索对话向量关联一个检索样本句子;
可以理解地,在检索模型的检索数据库中存储多个检索样本句子关联的检索对话向量,该检索数据库中的检索对话向量可以预先通过编码器(如基于LSTM或者BiLSTM神经网络构建的神经网络编码器)对检索样本句子进行向量编码得到,该检索样本句子也可以为对智能多轮对话系统中各个用户的对话信息进行爬取得到的。
进一步地,在将所述第一对话样本数据输入至所述检索模型中,对所述第一对话样本数据进行向量编码处理,得到与所述第一对话样本数据对应的对话编码向量之后,自所述检索模型的检索数据库中获取所有检索对话向量,并确定对话编码向量与各检索对话向量之间的向量编辑距离,该向量编辑距离表征将检索对话向量转换成对话编码向量最少编辑次数,也即向量编辑距离越小,表征对话编码向量与检索对话向量之间相似度越大。
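The vector edit distance described above — the minimum number of single-element insertions, deletions, and substitutions needed to turn one encoded sequence into another — can be illustrated with the standard dynamic-programming algorithm. This is a generic Levenshtein sketch for illustration, not the patent's implementation:

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (e.g. two encoded dialogue vectors):
    the minimum number of insertions, deletions and substitutions turning a into b."""
    m, n = len(a), len(b)
    # dp[i][j] = minimum edits turning a[:i] into b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete everything
    for j in range(n + 1):
        dp[0][j] = j  # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (free on a match)
            )
    return dp[m][n]
```

For example, `edit_distance("kitten", "sitting")` returns 3; a smaller distance means the two encoded sequences are more similar.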
S203:将各所述向量编辑距离与预设距离阈值进行比较,并将小于或等于预设距离阈值的向量编辑距离关联的检索句子,记录为待选取样本数据;
S204:根据所有所述待选取样本数据构建所述增强样本数据集。
其中,预设距离阈值可以根据具体应用场景进行选择,若场景下对意图识别要求较高,该预设距离阈值可以设置为0.05、0.1等。
具体地,在自所述检索模型的检索数据库中获取所有检索对话向量,并确定所述对话编码向量与各所述检索对话向量之间的向量编辑距离之后,将各向量编辑距离与预设距离阈值进行比较,将小于或等于预设距离阈值的向量编辑距离关联的检索句子,记录为待选取样本数据。
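The comparison in steps S203–S204 amounts to keeping every retrieval sample sentence whose vector edit distance is less than or equal to the preset distance threshold. A minimal sketch, with hypothetical names and an illustrative threshold of 0.1:

```python
def build_candidate_set(retrieval_sentences, edit_distances, distance_threshold=0.1):
    """Record as sample data to be selected every retrieval sample sentence whose
    vector edit distance is at most the preset distance threshold."""
    return [
        sentence
        for sentence, distance in zip(retrieval_sentences, edit_distances)
        if distance <= distance_threshold
    ]

# A smaller distance means the retrieval dialogue vector is closer to the
# dialogue coding vector, so a tight threshold keeps only close paraphrases.
candidates = build_candidate_set(
    ["I'm driving, let's chat later", "What's the weather", "I'm driving, hard to talk"],
    [0.04, 0.92, 0.08],
)
```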
进一步地,步骤S204之后,还包括:
S205：获取预设扩充倍数值，自所述增强样本数据集中根据所述预设扩充倍数值选取预设数值的待选取样本数据，并将选取的所述待选取样本数据记录为所述增强样本数据。
示例性地，假设需要将第一对话样本数据扩充至十个第一对话样本数据，则该预设扩充倍数值为10倍，预设数值指的是为了达到预设扩充倍数值的要求，除原始第一对话样本数据之外还需要扩充的增强样本数据的数量，也即如上述举例，预设扩充倍数值为10倍时，增强样本数据的预设数值则为9个。
在一具体实施例中,如图4所示,步骤S205中,包括:
S2051:将所述待选取样本数据按照所述向量编辑距离从小到大的顺序插入待选取序列中;
具体地，在根据所有所述待选取样本数据构建所述增强样本数据集之后，将各待选取样本数据按照对应的向量编辑距离从小到大的顺序插入待选取序列中，可以理解地，在待选取序列中，排序第一位的为向量编辑距离最小的待选取样本数据，排序在最后一位的为向量编辑距离最大的待选取样本数据。
S2052:将所述预设扩充倍数值与1之间的差值记录为所述预设数值;
S2053:自所述待选取序列中选取序列在前的预设数值的待选取样本数据,并将选取的待选取样本数据记录为所述增强样本数据。
可以理解地,本实施例中,由于具有对话意图标签的第二对话数据过少,进而在通过采用不具有对话意图标签的第一对话数据进行训练时,需要对每一个第一对话数据进行数据增强,也即需要对每一个第一对话数据进行扩充,进而将一个第一对话数据看做一个待扩充对象,进而提高初始意图识别模型的泛化能力。示例性地,需要将该第一对话数据从1个数据扩充至10个数据,则该预设扩充倍数值即为10倍,进而除去第一对话数据本身,还应该扩充9个待选取样本数据,因此表征预设扩充倍数值与预设数值之间的差值即为1。
进一步地,在确定预设数值之后,自步骤S2051中得到的待选取序列中,选取序列在前的预设数值的待选取样本数据,并将选取的待选取样本数据记录为增强样本数据。可以理解地,在步骤S2051中指出通过向量编辑距离从小到大的顺序插入待选取序列中,因此序列在前的待选取样本数据对应的检索对话向量与对话编码向量之间的向量编辑距离较小,进而从序列最小的开始选取,选取预设数值的待选取样本数据作为增强样本数据。示例性地,假设第一对话样本数据为“我在忙呢,正在开车”,对应的增强样本数据可以为“我在开车晚点再聊”亦或者“我在开车不方便沟通”等。
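The selection described in steps S2051–S2053 can be sketched as follows: candidates are sorted in ascending order of vector edit distance, and the first (expansion multiple − 1) are kept so that, together with the original sample, the preset expansion multiple is reached. The function and variable names are hypothetical:

```python
def select_enhanced_samples(candidates, expansion_multiple):
    """candidates: (sentence, vector_edit_distance) pairs.
    Sort ascending by distance and keep expansion_multiple - 1 sentences, so the
    original sample plus the selected ones total expansion_multiple items."""
    preset_number = expansion_multiple - 1
    ordered = sorted(candidates, key=lambda pair: pair[1])
    return [sentence for sentence, _ in ordered[:preset_number]]

# With a 3x expansion multiple, 3 - 1 = 2 enhanced samples are kept:
picked = select_enhanced_samples([("c", 0.30), ("a", 0.05), ("b", 0.10)], 3)
```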
S30:将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
可以理解地,增强意图识别指的是通过自然语言理解技术进行识别的方法,该自然语言理解技术为通过对输入的第一对话样本数据或者增强样本数据,进行词法分析、句法分析、语义分析从而提取出语义特征,最后根据提取的语义特征分析出与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布。
进一步地,在本实施例中,初始意图识别模型可以为各类文本分类模型,例如GMM(Gaussian Mixture Model,高斯混合)文本主题模型,进而在将第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中之后,对第一对话样本数据以及增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布。
S40:根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
具体地,在将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布之后,根据所述第一样本分布以及所述第二样本分布确定分布损失值。进一步地,与第一对话样本数据对应的增强样本数据的数量为至少一个,进而根据第一样本分布与各第二样本分布的分布损失值确定初始意图识别模型的总损失值。
进一步地,可以通过KL散度确定初始意图识别模型的总损失值:
$$KL(p\|q)=\sum_{i}p(x_i)\log\frac{p(x_i)}{q(x)}$$
其中，KL(p||q)为总损失值；p(x_i)为与预设对话样本数据集中第x个第一对话样本数据对应的第i个增强样本数据的第二样本分布；q(x)为预设对话样本数据集中第x个第一对话样本数据对应的第一样本分布。
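Under the definitions above, a minimal computation of the KL-divergence total loss might look as follows, assuming each sample distribution is a discrete probability vector over intent classes (an assumption for illustration; the patent does not fix the distribution's concrete form):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same intent classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_loss(first_distribution, second_distributions):
    """Sum KL(p_i || q) over the second distribution p_i of every enhanced sample,
    where q is the first sample distribution of the first dialogue sample data."""
    return sum(kl_divergence(p, first_distribution) for p in second_distributions)

# An enhanced sample whose distribution matches the original contributes 0 loss;
# only the mismatched second distribution drives the total loss.
loss = total_loss([0.5, 0.5], [[0.5, 0.5], [0.9, 0.1]])
```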
S50:在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
可以理解地,该收敛条件可以为总损失值小于设定阈值的条件,也即在总损失值小于设定阈值时,停止训练;收敛条件还可以为总损失值经过了10000次计算后值为很小且不会再下降的条件,也即总损失值经过10000次计算后值很小且不会下降时,停止训练,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
进一步地,根据所述第一对话样本数据对应的第一样本分布,以及与该第一对话样本数据对应的各增强样本数据的第二样本分布,确定所述初始意图识别模型的总损失值之后,在总损失值未达到预设的收敛条件时,根据该总损失值调整初始意图识别模型的第一初始参数,并将该第一对话样本数据以及对应的增强样本数据重新输入至调整第一初始参数后的初始意图识别模型中,以在该第一对话样本数据和增强样本数据对应的总损失值达到预设的收敛条件时,选取预设对话样本数据集中另一个第一对话样本数据,并执行上述步骤S10至S40,并得到与该第一对话样本数据对应的总损失值,并在该总损失值未达到预设的收敛条件时,根据该总损失值再次调整初始意图识别模型的第一初始参数,使得该第一对话样本数据对应的总损失值达到预设的收敛条件。
如此,在通过预设对话样本数据集中所有第一对话样本数据对初始意图识别模型进行训练之后,使得初始意图识别模型输出的结果可以不断向准确地结果靠拢,让识别准确率越来越高,直至所有第一对话样本数据对应的总损失值均达到预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
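The convergence logic of step S50 — keep adjusting parameters while the total loss has not met the preset convergence condition — can be illustrated with a toy differentiable loss; the quadratic objective, learning rate, and threshold below are placeholders, not the patent's network or settings:

```python
def train_until_converged(initial_param, learning_rate=0.1,
                          threshold=1e-4, max_steps=10_000):
    """Minimise a toy loss L(w) = (w - 3)^2 by gradient descent, stopping once
    the loss meets the preset convergence condition (loss < threshold)."""
    w = initial_param
    loss = (w - 3.0) ** 2
    for _ in range(max_steps):
        if loss < threshold:           # preset convergence condition reached
            break
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient  # update and iterate the parameter
        loss = (w - 3.0) ** 2
    return w, loss

w, loss = train_until_converged(0.0)
```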
在本实施例中,通过基于ES检索构建的检索模型确定与第一对话样本数据对应的增强样本数据,通过如此数据增强方式,提高了不具有对话意图标签的第一对话样本数据的利用率,也解决了现有技术中由于已标注标签的数据量不充足,导致意图识别模型的准确率较低的问题。同时,避免了现有技术中采用同义词替换、回译等数据增强方式带来的额外噪音,提高了模型训练的效率以及模型意图识别的准确率。
在另一具体实施例中,为了保证上述实施例中的对话意图识别模型的私密以及安全性,可以将对话意图识别模型存储在区块链中。其中,区块链(Blockchain),是由区块(Block)形成的加密的、链式的交易的存储结构。
例如,每个区块的头部既可以包括区块中所有交易的哈希值,同时也包含前一个区块中所有交易的哈希值,从而基于哈希值实现区块中交易的防篡改和防伪造;新产生的交易被填充到区块并经过区块链网络中节点的共识后,会被追加到区块链的尾部从而形成链式的增长。
在一实施例中,所述对话样本数据集中还包含至少一个具有所述对话意图标签的第二对话样本数据;如图5所示,步骤S30之前,也即将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中之前,包括:
S60:将所述第二对话样本数据输入至包含第二初始参数的预设识别模型中,通过所述预设识别模型对所述第二对话样本数据进行标注意图识别,得到与所述第二对话样本数据对应的各标注预测标签;一个所述标注预测标签关联一个标签预测概率;
可以理解地,一个标注预测标签表征一个与第二对话样本数据对应的意图,也即针对于一个第二对话样本数据,在对第二对话样本数据进行意图识别时,可能会识别出与第二 对话样本数据相关的至少一个意图,因此针对于识别出的每一个意图,对其打上相对应的标注预测标签。进一步地,针对于一个第二对话样本数据,虽然可以识别出至少一个意图,但是每一个意图识别出的概率是不同的,因此一个标注预测标签关联一个标签预测概率,也即第二对话样本属于标注预测标签的意图范围的概率为标签预测概率。
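One common way to attach a label prediction probability to each predicted intent tag is a softmax over the model's per-tag scores; this is a generic sketch of that convention, not necessarily the scoring used by the preset recognition model, and the tag names are invented:

```python
import math

def label_probabilities(tag_scores):
    """Map raw per-tag scores to label prediction probabilities that sum to 1."""
    top = max(tag_scores.values())  # subtract the max for numerical stability
    exps = {tag: math.exp(s - top) for tag, s in tag_scores.items()}
    total = sum(exps.values())
    return {tag: e / total for tag, e in exps.items()}

probs = label_probabilities({"greeting": 2.0, "busy_driving": 0.5, "refuse": 0.1})
```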
进一步地,第二对话样本数据具有的对话意图标签为通过预先对第二对话样本数据进行人工标注得到。
S70:根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值;
具体地,在将所述第二对话样本数据输入至包含第二初始参数的预设识别模型中,通过所述预设识别模型对所述第二对话样本数据进行标注意图识别,得到与所述第二对话样本数据对应的各标注预测标签之后,根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值。
在一实施例中,步骤S70中,包括:
根据各所述标注预测标签与所述对话意图标签,确定与各所述标注预测标签对应的标注预测结果;
可以理解地,在通过所述预设识别模型对所述第二对话样本数据进行标注意图识别,得到与所述第二对话样本数据对应的各标注预测标签之后,需要将各标注预测标签与对话意图标签进行比对,将与对话意图标签相符的标注预测标签确定为与对话意图标签类别相同的标签,与对话意图标签不相符的标注预测标签确定与对话意图标签类别不相同的标签,进而与对话意图标签类别相同的标签对应的标注预测结果为1;与对话意图标签类别不相同的标签的标注预测结果为0。
根据各所述标注预测结果以及与各所述标注预测结果对应的所述标签预测概率,通过交叉熵损失函数确定所述预设识别模型的预测损失值。
具体地,可以通过下述交叉熵损失函数确定预测损失值:
$$L_1=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M}y_{ic}\log(p_{ic})$$
其中，L_1指的是预测损失值；N指的是所有第二对话样本数据的数量；M为与第i个第二对话样本数据对应的标注预测标签的数量；y_{ic}为第i个第二对话样本数据的第c个标注预测标签对应的标注预测结果；p_{ic}为第i个第二对话样本数据的第c个标注预测标签对应的标签预测概率。
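The cross-entropy loss defined above can be evaluated directly from the labeled prediction results y_ic (1 when the tag matches the dialogue intent label, 0 otherwise) and the label prediction probabilities p_ic. A minimal sketch:

```python
import math

def prediction_loss(y, p):
    """L1 = -(1/N) * sum_i sum_c y[i][c] * log(p[i][c])."""
    n = len(y)
    total = 0.0
    for yi, pi in zip(y, p):
        total += sum(yic * math.log(pic) for yic, pic in zip(yi, pi) if yic)
    return -total / n

# Two samples, two candidate tags each; the probability assigned to the
# correct tag (y = 1) is what drives the loss.
loss = prediction_loss([[1, 0], [0, 1]], [[0.8, 0.2], [0.4, 0.6]])
```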
S80:在所述预测损失值未达到预设的收敛条件时,更新迭代所述预设识别模型的第二初始参数,直至所述预测损失值达到所述预设的收敛条件时,将收敛之后的所述预设识别模型记录为所述初始意图识别模型。
可以理解地,该收敛条件可以为预测损失值小于设定阈值的条件,也即在预测损失值小于设定阈值时,停止训练;收敛条件还可以为预测损失值经过了10000次计算后值为很小且不会再下降的条件,也即预测损失值经过10000次计算后值很小且不会下降时,停止训练,将收敛之后的所述预设识别模型记录为所述初始识别模型。
进一步地,根据根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值,确定所述预设识别模型的预测损失值之后,在预测损失值未达到预设的收敛条件时,根据该预测损失值调整预设识别模型的第二初始参数,并将该第二对话样本数据重新输入至调整第二初始参数后的预设识别模型中,以在该第二对话样本数据对应的预测损失值达到预设的收敛条件时,选取预设对话样本数据集中另一个第二对话样本数据,并执行上述步骤S60至S70,并得到与 该第二对话样本数据对应的预测损失值,并在该预测损失值未达到预设的收敛条件时,根据该预测损失值再次调整预设识别模型的第二初始参数,使得该第二对话样本数据对应的预测损失值达到预设的收敛条件。
如此,在通过预设对话样本数据集中所有第二对话样本数据对预设识别模型进行训练之后,使得预设识别模型输出的结果可以不断向准确地结果靠拢,让识别准确率越来越高,直至所有第二对话样本数据对应的预测损失值均达到预设的收敛条件时,将收敛之后的所述预设识别模型记录为所述初始识别模型。
在本实施例中,仅需要通过少量的具有对话意图标签的第二对话样本数据训练得到初始识别模型,使得在通过具有对话意图标签的第二对话样本数据训练的基础上,再通过上述实施例中的不具有对话意图标签的第一对话样本数据以及对应的增强样本数据,对初始识别模型进行训练,极大的减少了人工标注的工作量。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
在一实施例中,提供一种对话意图识别模型训练装置,该对话意图识别模型训练装置与上述实施例中对话意图识别模型训练方法一一对应。如图6所示,该对话意图识别模型训练装置包括对话样本数据集获取模块10、增强样本数据确定模块20、增强意图识别模块30、总损失值确定模块40和第一参数更新模块50。各功能模块详细说明如下:
对话样本数据集获取模块10,用于获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
增强样本数据确定模块20,用于将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
增强意图识别模块30,用于将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
总损失值确定模块40,用于根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
第一参数更新模块50,用于在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
优选地,如图7所示,对话意图识别模型训练装置还包括:
标注意图识别模块60,用于将所述第二对话样本数据输入至包含第二初始参数的预设识别模型中,通过所述预设识别模型对所述第二对话样本数据进行标注意图识别,得到与所述第二对话样本数据对应的各标注预测标签;一个所述标注预测标签关联一个标签预测概率;
预测损失值确定模块70,用于根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值;
第二参数更新模块80,用于在所述预测损失值未达到预设的收敛条件时,更新迭代所述预设识别模型的第二初始参数,直至所述预测损失值达到所述预设的收敛条件时,将收敛之后的所述预设识别模型记录为所述初始意图识别模型。
优选地,预测损失值确定模块70包括:
标注预测结果确定单元,用于根据各所述标注预测标签与所述对话意图标签,确定与各所述标注预测标签对应的标注预测结果;
预测损失值确定单元,用于根据各所述标注预测结果以及与各所述标注预测结果对应的所述标签预测概率,通过交叉熵损失函数确定所述预设识别模型的预测损失值。
优选地,如图8所示,增强样本数据确定模块20包括:
向量编码处理单元201,用于将所述第一对话样本数据输入至所述检索模型中,对所述第一对话样本数据进行向量编码处理,得到与所述第一对话样本数据对应的对话编码向量;
向量编辑距离确定单元202,用于自所述检索模型的检索数据库中获取所有检索对话向量,并确定所述对话编码向量与各所述检索对话向量之间的向量编辑距离;一个所述检索对话向量关联一个检索样本句子;
边际距离比较单元203,用于将各所述向量编辑距离与预设距离阈值进行比较,并将小于或等于预设距离阈值的向量编辑距离关联的检索句子,记录为待选取样本数据;
增强样本数据集构建单元204,用于根据所有所述待选取样本数据构建所述增强样本数据集。
优选地,如图8所示,增强样本数据确定模块20还包括:
增强样本数据确定单元205,用于获取预设扩充倍数值,自所述增强样本数据集中根据所述预设扩充倍数值选取预设数值的待选取样本数据,并将选取的所述待选取样本数据记录为所述增强样本数据。
优选地,如图9所示,增强样本数据确定单元205包括:
数据序列插入子单元2051,用于将所述待选取样本数据按照所述向量编辑距离从小到大的顺序插入待选取序列中;
预设数值确定子单元2052,用于将所述预设扩充倍数值与1之间的差值记录为所述预设数值;
增强样本数据选取子单元2053,用于自所述待选取序列中选取序列在前的预设数值的待选取样本数据,并将选取的待选取样本数据记录为所述增强样本数据。
关于对话意图识别模型训练装置的具体限定可以参见上文中对于对话意图识别模型训练方法的限定,在此不再赘述。上述对话意图识别模型训练装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图10所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括可读存储介质、内存储器。该可读存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为可读存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储上述实施例中对话意图识别模型训练方法。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种对话意图识别模型训练方法。本实施例所提供的可读存储介质包括非易失性可读存储介质和易失性可读存储介质。
在一个实施例中,提供了一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其中,所述处理器执行所述计算机可读指令时实现如下步骤:
获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
在一个实施例中,提供了一个或多个存储有计算机可读指令的可读存储介质,其中,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如下步骤:
获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质或者易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种对话意图识别模型训练方法,其中,包括:
    获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
    将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
    将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
    根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
    在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
  2. 如权利要求1所述的对话意图识别模型训练方法,其中,所述对话样本数据集中还包含至少一个具有所述对话意图标签的第二对话样本数据;所述将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中之前,包括:
    将所述第二对话样本数据输入至包含第二初始参数的预设识别模型中,通过所述预设识别模型对所述第二对话样本数据进行标注意图识别,得到与所述第二对话样本数据对应的各标注预测标签;一个所述标注预测标签关联一个标签预测概率;
    根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值;
    在所述预测损失值未达到预设的收敛条件时,更新迭代所述预设识别模型的第二初始参数,直至所述预测损失值达到所述预设的收敛条件时,将收敛之后的所述预设识别模型记录为所述初始意图识别模型。
  3. 如权利要求2所述的对话意图识别模型训练方法,其中,所述根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值,包括:
    根据各所述标注预测标签与所述对话意图标签,确定与各所述标注预测标签对应的标注预测结果;
    根据各所述标注预测结果以及与各所述标注预测结果对应的所述标签预测概率,通过交叉熵损失函数确定所述预设识别模型的预测损失值。
  4. 如权利要求1所述的对话意图识别模型训练方法,其中,将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据相似的增强样本数据集,包括:
    将所述第一对话样本数据输入至所述检索模型中,对所述第一对话样本数据进行向量编码处理,得到与所述第一对话样本数据对应的对话编码向量;
    自所述检索模型的检索数据库中获取所有检索对话向量,并确定所述对话编码向量与各所述检索对话向量之间的向量编辑距离;一个所述检索对话向量关联一个检索样本句子;
    将各所述向量编辑距离与预设距离阈值进行比较,并将小于或等于预设距离阈值的向量编辑距离关联的检索句子,记录为待选取样本数据;
    根据所有所述待选取样本数据构建所述增强样本数据集。
  5. 如权利要求4所述的对话意图识别模型训练方法,其中,所述根据所有所述待选取样本数据构建所述增强样本数据集之后,还包括:
    获取预设扩充倍数值,自所述增强样本数据集中根据所述预设扩充倍数值选取预设数值的待选取样本数据,并将选取的所述待选取样本数据记录为所述增强样本数据。
  6. 如权利要求5所述的对话意图识别模型训练方法,其中,所述获取预设扩充倍数值,自所述增强样本数据集中根据所述预设扩充倍数值选取预设数值的待选取样本数据,并将选取的所述待选取样本数据记录为所述增强样本数据,包括:
    将所述待选取样本数据按照所述向量编辑距离从小到大的顺序插入待选取序列中;
    将所述预设扩充倍数值与1之间的差值记录为所述预设数值;
    自所述待选取序列中选取序列在前的预设数值的待选取样本数据,并将选取的待选取样本数据记录为所述增强样本数据。
  7. 一种对话意图识别模型训练装置,其中,包括:
    对话样本数据集获取模块,用于获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
    增强样本数据确定模块,用于将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
    增强意图识别模块,用于将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
    总损失值确定模块,用于根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
    第一参数更新模块,用于在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
  8. 如权利要求7所述的对话意图识别模型训练装置,其中,所述对话样本数据集中还包含至少一个具有所述对话意图标签的第二对话样本数据;所述对话意图识别模型训练装置还包括:
    标注意图识别模块,用于将所述第二对话样本数据输入至包含第二初始参数的预设识别模型中,通过所述预设识别模型对所述第二对话样本数据进行标注意图识别,得到与所述第二对话样本数据对应的各标注预测标签;一个所述标注预测标签关联一个标签预测概率;
    预测损失值确定模块,用于根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值;
    第二参数更新模块,用于在所述预测损失值未达到预设的收敛条件时,更新迭代所述预设识别模型的第二初始参数,直至所述预测损失值达到所述预设的收敛条件时,将收敛之后的所述预设识别模型记录为所述初始意图识别模型。
  9. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其中,所述处理器执行所述计算机可读指令时实现如下步骤:
    获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
    将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
    将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意 图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
    根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
    在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
  10. 如权利要求9所述的计算机设备,其中,所述对话样本数据集中还包含至少一个具有所述对话意图标签的第二对话样本数据;所述将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中之前,所述处理器执行所述计算机可读指令时还实现如下步骤:
    将所述第二对话样本数据输入至包含第二初始参数的预设识别模型中,通过所述预设识别模型对所述第二对话样本数据进行标注意图识别,得到与所述第二对话样本数据对应的各标注预测标签;一个所述标注预测标签关联一个标签预测概率;
    根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值;
    在所述预测损失值未达到预设的收敛条件时,更新迭代所述预设识别模型的第二初始参数,直至所述预测损失值达到所述预设的收敛条件时,将收敛之后的所述预设识别模型记录为所述初始意图识别模型。
  11. 如权利要求10所述的计算机设备,其中,所述根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值,包括:
    根据各所述标注预测标签与所述对话意图标签,确定与各所述标注预测标签对应的标注预测结果;
    根据各所述标注预测结果以及与各所述标注预测结果对应的所述标签预测概率,通过交叉熵损失函数确定所述预设识别模型的预测损失值。
  12. 如权利要求9所述的计算机设备,其中,将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据相似的增强样本数据集,包括:
    将所述第一对话样本数据输入至所述检索模型中,对所述第一对话样本数据进行向量编码处理,得到与所述第一对话样本数据对应的对话编码向量;
    自所述检索模型的检索数据库中获取所有检索对话向量,并确定所述对话编码向量与各所述检索对话向量之间的向量编辑距离;一个所述检索对话向量关联一个检索样本句子;
    将各所述向量编辑距离与预设距离阈值进行比较,并将小于或等于预设距离阈值的向量编辑距离关联的检索句子,记录为待选取样本数据;
    根据所有所述待选取样本数据构建所述增强样本数据集。
  13. 如权利要求12所述的计算机设备,其中,所述根据所有所述待选取样本数据构建所述增强样本数据集之后,所述处理器执行所述计算机可读指令时还实现如下步骤:
    获取预设扩充倍数值,自所述增强样本数据集中根据所述预设扩充倍数值选取预设数值的待选取样本数据,并将选取的所述待选取样本数据记录为所述增强样本数据。
  14. 如权利要求13所述的计算机设备,其中,所述获取预设扩充倍数值,自所述增强样本数据集中根据所述预设扩充倍数值选取预设数值的待选取样本数据,并将选取的所述待选取样本数据记录为所述增强样本数据,包括:
    将所述待选取样本数据按照所述向量编辑距离从小到大的顺序插入待选取序列中;
    将所述预设扩充倍数值与1之间的差值记录为所述预设数值;
    自所述待选取序列中选取序列在前的预设数值的待选取样本数据,并将选取的待选取样本数据记录为所述增强样本数据。
  15. 一个或多个存储有计算机可读指令的可读存储介质,其中,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如下步骤:
    获取预设对话样本数据集;所述对话样本数据集中包含至少一个不具有对话意图标签的第一对话样本数据;
    将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据对应的增强样本数据集;所述增强样本数据集中包括至少一个增强样本数据;
    将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中,对所述第一对话样本数据以及所述增强样本数据进行增强意图识别,得到与所述第一对话样本数据对应的第一样本分布,以及与所述增强样本数据对应的第二样本分布;
    根据所述第一样本分布以及所述第二样本分布确定分布损失值,并根据各所述分布损失值确定所述初始意图识别模型的总损失值;
    在所述总损失值未达到预设的收敛条件时,更新迭代所述初始意图识别模型的第一初始参数,直至所述总损失值达到所述预设的收敛条件时,将收敛之后的所述初始意图识别模型记录为对话意图识别模型。
  16. 如权利要求15所述的可读存储介质,其中,所述对话样本数据集中还包含至少一个具有所述对话意图标签的第二对话样本数据;所述将所述第一对话样本数据以及所述增强样本数据输入至包含第一初始参数的初始意图识别模型中之前,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器还执行如下步骤:
    将所述第二对话样本数据输入至包含第二初始参数的预设识别模型中,通过所述预设识别模型对所述第二对话样本数据进行标注意图识别,得到与所述第二对话样本数据对应的各标注预测标签;一个所述标注预测标签关联一个标签预测概率;
    根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值;
    在所述预测损失值未达到预设的收敛条件时,更新迭代所述预设识别模型的第二初始参数,直至所述预测损失值达到所述预设的收敛条件时,将收敛之后的所述预设识别模型记录为所述初始意图识别模型。
  17. 如权利要求16所述的可读存储介质,其中,所述根据各所述标注预测标签、与各所述标注预测标签对应的标签预测概率以及所述对话意图标签,确定所述预设识别模型的预测损失值,包括:
    根据各所述标注预测标签与所述对话意图标签,确定与各所述标注预测标签对应的标注预测结果;
    根据各所述标注预测结果以及与各所述标注预测结果对应的所述标签预测概率,通过交叉熵损失函数确定所述预设识别模型的预测损失值。
  18. 如权利要求15所述的可读存储介质,其中,将所述第一对话样本数据输入至基于ES检索构建的检索模型中,确定与所述第一对话样本数据相似的增强样本数据集,包括:
    将所述第一对话样本数据输入至所述检索模型中,对所述第一对话样本数据进行向量编码处理,得到与所述第一对话样本数据对应的对话编码向量;
    自所述检索模型的检索数据库中获取所有检索对话向量,并确定所述对话编码向量与各所述检索对话向量之间的向量编辑距离;一个所述检索对话向量关联一个检索样本句子;
    将各所述向量编辑距离与预设距离阈值进行比较,并将小于或等于预设距离阈值的向量编辑距离关联的检索句子,记录为待选取样本数据;
    根据所有所述待选取样本数据构建所述增强样本数据集。
  19. 如权利要求18所述的可读存储介质,其中,所述根据所有所述待选取样本数据构建所述增强样本数据集之后,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器还执行如下步骤:
    获取预设扩充倍数值,自所述增强样本数据集中根据所述预设扩充倍数值选取预设数值的待选取样本数据,并将选取的所述待选取样本数据记录为所述增强样本数据。
  20. 如权利要求19所述的可读存储介质,其中,所述获取预设扩充倍数值,自所述增强样本数据集中根据所述预设扩充倍数值选取预设数值的待选取样本数据,并将选取的所述待选取样本数据记录为所述增强样本数据,包括:
    将所述待选取样本数据按照所述向量编辑距离从小到大的顺序插入待选取序列中;
    将所述预设扩充倍数值与1之间的差值记录为所述预设数值;
    自所述待选取序列中选取序列在前的预设数值的待选取样本数据,并将选取的待选取样本数据记录为所述增强样本数据。
PCT/CN2021/083953 2020-12-31 2021-03-30 对话意图识别模型训练方法、装置、计算机设备及介质 WO2022141864A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011637063.XA CN112766319A (zh) 2020-12-31 2020-12-31 对话意图识别模型训练方法、装置、计算机设备及介质
CN202011637063.X 2020-12-31

Publications (1)

Publication Number Publication Date
WO2022141864A1 true WO2022141864A1 (zh) 2022-07-07

Family

ID=75698053

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083953 WO2022141864A1 (zh) 2020-12-31 2021-03-30 Dialogue intent recognition model training method and apparatus, computer device, and medium

Country Status (2)

Country Link
CN (1) CN112766319A (zh)
WO (1) WO2022141864A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114407A (zh) * 2022-07-12 2022-09-27 平安科技(深圳)有限公司 Intent recognition method and apparatus, computer device, and storage medium
CN116776887A (zh) * 2023-08-18 2023-09-19 昆明理工大学 Negative-sampling distantly-supervised entity recognition method based on sample similarity computation
CN117523565A (zh) * 2023-11-13 2024-02-06 拓元(广州)智慧科技有限公司 Tail-class sample labeling method and apparatus, electronic device, and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256434B (zh) * 2021-06-08 2021-11-23 平安科技(深圳)有限公司 Vehicle insurance claim behavior recognition method and apparatus, device, and storage medium
CN113469237B (zh) * 2021-06-28 2023-09-15 平安科技(深圳)有限公司 User intent recognition method and apparatus, electronic device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161740A (zh) * 2019-12-31 2020-05-15 中国建设银行股份有限公司 Intent recognition model training method, intent recognition method, and related apparatus
CN111198938A (zh) * 2019-12-26 2020-05-26 深圳市优必选科技股份有限公司 Sample data processing method, sample data processing apparatus, and electronic device
US20200202211A1 (en) * 2018-12-25 2020-06-25 Abbyy Production Llc Neural network training utilizing loss functions reflecting neighbor token dependencies
CN111831826A (zh) * 2020-07-24 2020-10-27 腾讯科技(深圳)有限公司 Cross-domain text classification model training method, classification method, and apparatus
CN112069302A (zh) * 2020-09-15 2020-12-11 腾讯科技(深圳)有限公司 Conversation intent recognition model training method, conversation intent recognition method, and apparatus
CN112069300A (zh) * 2020-09-04 2020-12-11 中国平安人寿保险股份有限公司 Semantic recognition method and apparatus for task-oriented dialogue, electronic device, and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11100917B2 (en) * 2019-03-27 2021-08-24 Adobe Inc. Generating ground truth annotations corresponding to digital image editing dialogues for training state tracking models
CN111061847A (zh) * 2019-11-22 2020-04-24 中国南方电网有限责任公司 Dialogue generation and corpus expansion method and apparatus, computer device, and storage medium
CN111061850B (zh) * 2019-12-12 2023-04-28 中国科学院自动化研究所 Information-enhancement-based dialogue state tracking method, system, and apparatus



Also Published As

Publication number Publication date
CN112766319A (zh) 2021-05-07

Similar Documents

Publication Publication Date Title
WO2022141864A1 (zh) Dialogue intent recognition model training method and apparatus, computer device, and medium
CN110765265B (zh) Information classification and extraction method and apparatus, computer device, and storage medium
CN111160017B (zh) Keyword extraction method, script scoring method, and script recommendation method
WO2022142613A1 (zh) Training corpus expansion method and apparatus, and intent recognition model training method and apparatus
CN109190120B (zh) Neural network training method and apparatus, and named entity recognition method and apparatus
WO2021179570A1 (zh) Sequence labeling method and apparatus, computer device, and storage medium
CN107797985B (zh) Method and apparatus for building a synonym discrimination model and identifying synonymous text
WO2018153265A1 (zh) Keyword extraction method, computer device, and storage medium
WO2021135469A1 (zh) Machine-learning-based information extraction method and apparatus, computer device, and medium
CN111666401B (zh) Graph-structure-based official document recommendation method and apparatus, computer device, and medium
WO2021121198A1 (zh) Semantic-similarity-based entity relation extraction method and apparatus, device, and medium
CN111444723A (zh) Information extraction model training method and apparatus, computer device, and storage medium
TW202020691A (zh) Feature word determination method and apparatus, and server
CN110162771B (zh) Event trigger word recognition method and apparatus, and electronic device
WO2022134805A1 (zh) Document classification prediction method and apparatus, computer device, and storage medium
CN112380837B (zh) Translation-model-based similar sentence matching method and apparatus, device, and medium
CN110427612B (zh) Multilingual entity disambiguation method and apparatus, device, and storage medium
WO2022227162A1 (zh) Question-answer data processing method and apparatus, computer device, and storage medium
CN110851546B (zh) Verification method, model training method, model sharing method, system, and medium
WO2022174496A1 (zh) Generative-model-based data labeling method and apparatus, device, and storage medium
WO2022142108A1 (zh) Interview entity recognition model training method and interview information entity extraction method and apparatus
CN110633475A (zh) Computer-scenario-based natural language understanding method, apparatus, and system, and storage medium
WO2023065635A1 (zh) Named entity recognition method and apparatus, storage medium, and terminal device
CN114021570A (zh) Entity disambiguation method and apparatus, device, and storage medium
CN111145914A (zh) Method and apparatus for determining text entities in a lung cancer clinical disease database

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21912634
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21912634
    Country of ref document: EP
    Kind code of ref document: A1