WO2023040153A1

WO2023040153A1 - Method, apparatus, and device for updating intent recognition model, and readable medium

Info

Publication number: WO2023040153A1
Application number: PCT/CN2022/071694
Authority: WO
Inventors: 罗圣西; 马骏; 王少军
Original assignee: 平安科技（深圳）有限公司
Priority date: 2021-09-18
Filing date: 2022-01-13
Publication date: 2023-03-23
Also published as: CN113792540B; CN113792540A

Abstract

A method, apparatus, and device for updating an intent recognition model, and a readable medium, which relate to the field of artificial intelligence. The method improves the accuracy of an intent recognition model recognizing corpus intent. The method comprises: obtaining an original dialog corpus, and using an intent model for recognition to obtain a first intent category; initializing the corpus into a mask list, and replacing an element value in the mask list according to a selection rule; constructing an auxiliary dialogue corpus corresponding to the original dialogue corpus, and performing recognition by means of a training model to obtain a second intent category; detecting the degree of difference between the first and second intent categories, obtaining a first detection result, and performing selection to obtain an alternative outside corpus; by means of a training model, recognizing the alternative outside corpus to obtain a third intent category; detecting the degree of difference between the first and third intent categories to obtain a second detection result, and performing selection to obtain a final outside corpus; and retraining said corpus and the original dialogue corpus to obtain an intent recognition model.

Description

Intent recognition model updating method, device, equipment and readable medium

This application claims priority to Chinese patent application No. 202111095912.8, filed with the China Patent Office on September 18, 2021 and entitled "Intention Recognition Model Updating Method, Device, Equipment, and Readable Medium", the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the field of artificial intelligence, and discloses a method, device, equipment and readable medium for updating an intention recognition model.

Background art

With the continuous development of computer technology, artificial intelligence has gradually been applied in all walks of life, and related intelligent products and applications have penetrated into many aspects of daily life, greatly improving people's production and life. Human-computer dialogue is an important research field of artificial intelligence. It studies how to enable computers to understand and use human natural language so that humans and computers can communicate in natural language, allowing computers to take over part of human mental labor, extend the human brain, and reduce human workload. In daily life, dialogue scenarios are complex and varied, so computers must accurately identify the customer's intent during a dialogue and understand the customer's needs in order to conduct the dialogue well and meet those needs.

"Intent recognition" refers to determining the type of intent expressed by a piece of information that a user inputs to state a query need. The inventor realized that current intent recognition technology is mainly used in search engines, human-computer dialogue systems and the like. When applied to human-computer dialogue, an intent recognition model is constructed to identify the customer's intent. Because of environmental noise in everyday human-computer dialogue, a large amount of corpus that does not belong to any existing intent category is generated. If the intent recognition model cannot correctly identify such corpus, the user experience is seriously affected, and in severe cases there may be a risk of leaking user privacy. The existing solution generates out-of-set corpus through data augmentation, generally by random insertion, deletion, exchange and similar operations, and uses the out-of-set corpus to train the rejection ability of the intent recognition model. However, this kind of data augmentation cannot guarantee that the generated corpus actually belongs to the out-of-set category, and it can cause corpus entanglement in the training corpus, degrading the trained model's recognition of normal corpus. In other words, existing intent recognition models suffer from low update accuracy.

Summary of the invention

The present application provides a method, apparatus, device and readable medium for updating an intent recognition model, which are used to improve the accuracy with which the intent recognition model recognizes the intent of a corpus.

A first aspect of the present application provides a method for updating an intent recognition model, the method including: obtaining an original dialogue corpus, and identifying a first intent category of each sentence in the original dialogue corpus by means of a preset intent recognition model; initializing a mask list corresponding to the original dialogue corpus, and adjusting a group of element values in the mask list according to a preset selection rule to obtain an adjusted mask list; constructing, based on the adjusted mask list, an auxiliary dialogue corpus corresponding to the original dialogue corpus, and identifying a second intent category of each sentence in the auxiliary dialogue corpus by means of the intent recognition model; detecting the degree of difference between the first intent category and the second intent category to obtain a first detection result, and selecting, based on the first detection result, sentences that satisfy a preset difference condition from the auxiliary dialogue corpus as a final out-of-set corpus; and marking the final out-of-set corpus as out-of-set intent, and training the intent recognition model with the original dialogue corpus and the final out-of-set corpus to obtain a new intent recognition model.

A second aspect of the present application provides a device for updating an intent recognition model, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions: obtaining an original dialogue corpus, and identifying a first intent category of each sentence in the original dialogue corpus by means of a preset intent recognition model; initializing a mask list corresponding to the original dialogue corpus, and adjusting a group of element values in the mask list according to a preset selection rule to obtain an adjusted mask list; constructing, based on the adjusted mask list, an auxiliary dialogue corpus corresponding to the original dialogue corpus, and identifying a second intent category of each sentence in the auxiliary dialogue corpus by means of the intent recognition model; detecting the degree of difference between the first intent category and the second intent category to obtain a first detection result, and selecting, based on the first detection result, sentences that satisfy a preset difference condition from the auxiliary dialogue corpus as a final out-of-set corpus; and marking the final out-of-set corpus as out-of-set intent, and training the intent recognition model with the original dialogue corpus and the final out-of-set corpus to obtain a new intent recognition model.

A third aspect of the present application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps: obtaining an original dialogue corpus, and identifying a first intent category of each sentence in the original dialogue corpus by means of a preset intent recognition model; initializing a mask list corresponding to the original dialogue corpus, and adjusting a group of element values in the mask list according to a preset selection rule to obtain an adjusted mask list; constructing, based on the adjusted mask list, an auxiliary dialogue corpus corresponding to the original dialogue corpus, and identifying a second intent category of each sentence in the auxiliary dialogue corpus by means of the intent recognition model; detecting the degree of difference between the first intent category and the second intent category to obtain a first detection result, and selecting, based on the first detection result, sentences that satisfy a preset difference condition from the auxiliary dialogue corpus as a final out-of-set corpus; and marking the final out-of-set corpus as out-of-set intent, and training the intent recognition model with the original dialogue corpus and the final out-of-set corpus to obtain a new intent recognition model.

A fourth aspect of the present application provides an apparatus for updating an intent recognition model, the apparatus including: a corpus acquisition module, configured to obtain an original dialogue corpus and identify a first intent category of each sentence in the original dialogue corpus by means of a preset intent recognition model; a mask construction module, configured to initialize a mask list corresponding to the original dialogue corpus and adjust a group of element values in the mask list according to a preset selection rule to obtain an adjusted mask list; a second intent module, configured to construct, based on the adjusted mask list, an auxiliary dialogue corpus corresponding to the original dialogue corpus and identify a second intent category of each sentence in the auxiliary dialogue corpus by means of the intent recognition model; a final out-of-set module, configured to detect the degree of difference between the first intent category and the second intent category to obtain a first detection result and, based on the first detection result, select sentences that satisfy a preset difference condition from the auxiliary dialogue corpus as a final out-of-set corpus; and a corpus training module, configured to mark the final out-of-set corpus as out-of-set intent and train the intent recognition model with the original dialogue corpus and the final out-of-set corpus to obtain a new intent recognition model.

In the technical solution provided by the present application, a page to be obfuscated is obtained, and a target detection model is used to determine the text area in the page and the position coordinates corresponding to the text area; the text in the text area is recognized to obtain text; regular expressions are used to find, in the text, the text to be obfuscated and its position coordinates, and a color extraction algorithm is used to extract the color of the text to be obfuscated; according to that color, an obfuscation layer is generated on the interface at the position coordinates of the text to be obfuscated, and the text is covered by the obfuscation layer to obtain a covered page. The present application uses a mask list, a computer data processing technique, to process the sentences of the original dialogue corpus: the mask list is constructed and the relevant values are replaced to obtain a candidate out-of-set corpus that satisfies the first difference condition. Because the corpus is processed according to preset mathematical rules, the generated candidate out-of-set corpus fits the out-of-set intent category better. In addition, recognizing the candidate out-of-set corpus and checking it against the second difference condition further avoids corpus entanglement in the training corpus, and the resulting final out-of-set corpus and the original dialogue corpus are used for retraining to obtain a new intent recognition model. This method eliminates the additional comparison experiments otherwise needed to implement rejection by setting a confidence threshold, which shortens the cycle of launching and optimizing the intent recognition model. It also gives the trained intent recognition model the ability to reject, while reducing the impact on the recognition of normal corpus, thereby improving the update accuracy of the intent recognition model.

Description of drawings

FIG. 1 is a schematic diagram of the first embodiment of the method for updating the intent recognition model of the present application;

FIG. 2 is a schematic diagram of a second embodiment of the method for updating the intent recognition model of the present application;

FIG. 3 is a schematic diagram of a third embodiment of the method for updating the intent recognition model of the present application;

FIG. 4 is a schematic diagram of a fourth embodiment of the method for updating the intent recognition model of the present application;

FIG. 5 is a schematic diagram of a fifth embodiment of the method for updating the intent recognition model of the present application;

FIG. 6 is a schematic diagram of an embodiment of the apparatus for updating the intent recognition model of the present application;

FIG. 7 is a schematic diagram of another embodiment of the apparatus for updating the intent recognition model of the present application;

FIG. 8 is a schematic diagram of an embodiment of the device for updating the intent recognition model of the present application.

Detailed description

Embodiments of the present application provide a method, apparatus, device and readable medium for updating an intent recognition model. Whereas data augmentation in the prior art relies on random insertion, deletion, exchange and similar operations, the embodiments of the present application use a mask list, a computer data processing technique, to process the sentences of the original dialogue corpus: the mask list is constructed and the relevant values are replaced to obtain a candidate out-of-set corpus that satisfies the first difference condition. Because the corpus is processed according to preset mathematical rules, the generated candidate out-of-set corpus fits the out-of-set intent category better. This method avoids the additional comparison experiments otherwise needed to implement rejection by setting a confidence threshold, which shortens the cycle of launching and optimizing the intent recognition model. It also gives the trained intent recognition model the ability to reject, while reducing the impact on the recognition of normal corpus, which improves the update accuracy of the intent recognition model.

The terms "first", "second", "third", "fourth" and the like (if any) in the specification, claims and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to the process, method, product or device.

For ease of understanding, the following describes the specific process of the embodiment of the present application. Please refer to FIG. 1. The first embodiment of the method for updating the intent recognition model in the embodiment of the present application includes:

101. Obtain the original dialogue material, and identify the first intent category of each sentence in the original dialogue material through a preset intent recognition model;

It can be understood that the execution subject of the present application may be an intent recognition model updating device, or may also be a terminal or a server, which is not specifically limited here. The embodiment of the present application is described by taking the server as an execution subject as an example.

The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.

Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

In this embodiment, the original dialogue corpus is generally collected and compiled by business personnel in the relevant industry from the questions frequently asked in that industry and the business they frequently handle, and it is used to build the intent recognition model. The preset intent recognition model may be FastText (a fast text classifier), textCNN (a text classification algorithm) or any classification model based on a pre-trained language model. The original dialogue corpus is obtained and passed through the preset intent recognition model to identify the first intent category of each sentence. The recognition principle is that the preset intent recognition model is essentially a series of mathematical operations that outputs a probability score for each intent; the label with the highest probability is selected, and that label and its probability are recorded and output.

In practical applications, the target human-computer dialogue system obtains the original dialogue corpus collected by the relevant business personnel and, by means of the preset intent recognition model, performs intent recognition on each sentence in it to obtain the corresponding intent label and probability, where the probability represents the likelihood that the sentence belongs to that category. The first intent category of each sentence in the original dialogue corpus is thereby obtained.
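The label-and-probability output described above can be sketched as follows. This is only an illustration under stated assumptions: `toy_intent_model` is a hypothetical keyword-based stand-in for a preset model such as FastText or textCNN, and the intent labels are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]  # intent label -> probability score


def toy_intent_model(sentence: str) -> Scores:
    """Hypothetical stand-in for a preset intent recognition model."""
    if "password" in sentence:
        return {"change_password": 0.9, "check_balance": 0.1}
    if "balance" in sentence:
        return {"change_password": 0.2, "check_balance": 0.8}
    return {"change_password": 0.5, "check_balance": 0.5}


def recognize_intent(model: Callable[[str], Scores], sentence: str) -> Tuple[str, float]:
    """Select the label with the highest probability and return it with that probability."""
    scores = model(sentence)
    label = max(scores, key=scores.get)
    return label, scores[label]


def first_intent_categories(
    model: Callable[[str], Scores], sentences: List[str]
) -> List[Tuple[str, float]]:
    """Record one (label, probability) pair per sentence of the original corpus."""
    return [recognize_intent(model, s) for s in sentences]
```

Any classifier that returns per-intent probabilities could be passed in place of the toy model.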

102. Initialize the mask list corresponding to the original dialogue material, and adjust a group of element values in the mask list according to preset selection rules to obtain an adjusted mask list;

In this embodiment, a mask is a string of binary codes, which can also be replaced by other character elements in practical applications. Since computers essentially operate on binary codes, this solution uses a mask list, which is convenient for numerical conversion and computer processing. The preset selection rule is determined according to the business type of the dialogues handled by the intent recognition model to be trained. Depending on the length of the original dialogue corpus and the difficulty of processing the dialogue, two or more mask elements can be selected for numerical adjustment, and the adjustment rule only needs to follow a definite mathematical pattern, for example adjusting in the reading order of the corpus. The mask list obtained by initialization from the original dialogue corpus is adjusted and its elements are replaced according to the preset selection rule, resulting in the adjusted mask list.

In practical applications, for each piece of the obtained original dialogue corpus, a mask with the same length as the corpus is first initialized, with all mask elements set to 0. The first two characters of the corpus are then selected, and the mask elements at the positions of the selected characters are adjusted to 1. A new sentence is then formed, in the original position order, from the characters of the original dialogue corpus at the positions whose mask element is 0 and at the positions whose element was changed to 1, and the adjusted mask list is obtained.

103. Based on the adjusted mask list, construct an auxiliary dialogue corpus corresponding to the original dialogue corpus, and identify the second intent category of each sentence in the auxiliary dialogue corpus through an intent recognition model;

In this embodiment, the adjusted mask list is used to determine, for each piece of corpus, the first element values in the adjusted mask list, where the first element value is the mask element value assigned when the mask list was initialized from the original dialogue corpus. After the first element values and their positions are determined, they are replaced with the words at the corresponding positions of the original dialogue corpus, and these words are arranged in their original order to obtain the auxiliary dialogue corpus. The intent recognition model then identifies the label and corresponding probability of each sentence in the auxiliary dialogue corpus, yielding the second intent category.

In practical applications, the target human-computer dialogue system takes the mask list after numerical transformation and first identifies in it the first mask elements, that is, the elements that still hold the initialized value, together with their positions. Each identified first mask element is then replaced by the word at the corresponding position in the original dialogue corpus, and the replacing words are arranged in the order of the original dialogue corpus to obtain the auxiliary dialogue corpus corresponding to the original dialogue corpus. The auxiliary dialogue corpus is then passed through the intent recognition model currently to be trained, which identifies the label and corresponding probability of each sentence, giving the second intent category of the auxiliary dialogue corpus.
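One reading of this construction step can be sketched as follows, under stated assumptions: characters whose mask element still holds the first element value (0) are kept in their original order, and characters at adjusted positions are left out. The function name `build_auxiliary_sentence` and the parameter `keep_value` are hypothetical and do not come from the source.

```python
from typing import List


def build_auxiliary_sentence(sentence: str, mask: List[int], keep_value: int = 0) -> str:
    """Keep, in original order, the characters whose mask element equals keep_value."""
    if len(sentence) != len(mask):
        raise ValueError("mask must have the same length as the sentence")
    return "".join(ch for ch, m in zip(sentence, mask) if m == keep_value)


# With the first two mask elements adjusted to 1, the first two characters drop out.
auxiliary = build_auxiliary_sentence("abcdef", [1, 1, 0, 0, 0, 0])
```

The resulting auxiliary sentence would then be passed to the intent recognition model to obtain its label and probability.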

104. Detect the degree of difference between the first intent category and the second intent category to obtain a first detection result, and based on the first detection result, select sentences satisfying a preset difference condition from the auxiliary dialogue corpus as the final out-of-set corpus;

In this embodiment, the preset difference conditions include a first difference condition and a second difference condition. Based on the first intent category and the second intent category obtained by recognition, the degree of difference is detected for the label of each sentence and its corresponding probability. Detecting the degree of difference here means checking whether the labels that intent recognition assigns to the original dialogue corpus and the auxiliary dialogue corpus differ, or comparing their corresponding probabilities. Generally, the condition is that the difference between the label probability of the auxiliary dialogue corpus and the label probability of the original dialogue corpus is greater than a certain threshold (the threshold is set according to the probability characteristics of the intent recognition model and is generally small). Mask list transformation, intent recognition and difference detection are repeated in a loop on the obtained result until a preset loop condition is met, giving the first detection result. Based on the first detection result, sentences that meet the preset first difference condition are selected from the auxiliary dialogue corpus as the candidate out-of-set corpus. Category recognition and difference detection are then performed on the candidate out-of-set corpus to obtain the second detection result, and sentences satisfying the preset second difference condition are selected from the candidate out-of-set corpus as the final out-of-set corpus.

In practical applications, the target human-computer dialogue system uses the first intent category and the second intent category to detect, for each sentence, the degree of difference between the corresponding probabilities, checking whether the difference between the two is greater than a preset value. Mask list transformation, intent recognition and difference detection are performed again on the obtained result until the exit condition, namely that all words in the corpus have been replaced, is met, giving the first detection result. Then, according to the first detection result and the preset first difference condition, sentences for which the difference between the two intent recognition probabilities is large are selected from each dialogue corpus as the candidate out-of-set corpus. The intent recognition model identifies the third intent category of each sentence in the candidate out-of-set corpus; the degree of difference between the first intent category and the third intent category is detected to obtain the second detection result; and, according to the second detection result, sentences satisfying the preset second difference condition are selected from the candidate out-of-set corpus as the final out-of-set corpus.
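The two-stage selection could be sketched as follows. This is an illustrative assumption rather than the claimed implementation: the source gives no concrete difference conditions, so the first condition is modelled here as a probability gap above `first_threshold`, and the second as either a changed label or a gap above `second_threshold`. The auxiliary sentences are assumed to have been generated already, and `model` is any callable returning a (label, probability) pair; all names are hypothetical.

```python
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (intent label, probability)


def probability_gap(first: Prediction, other: Prediction) -> float:
    """Absolute difference between the two predicted probabilities."""
    return abs(first[1] - other[1])


def select_final_out_of_set(
    first: Prediction,
    auxiliary: List[str],
    model: Callable[[str], Prediction],
    first_threshold: float,
    second_threshold: float,
) -> List[str]:
    """Filter auxiliary sentences in two stages against the original prediction."""
    # Stage 1: compare the first and second intent categories (assumed condition).
    candidates = [
        s for s in auxiliary if probability_gap(first, model(s)) > first_threshold
    ]
    # Stage 2: recognize the candidates again (third intent category) and
    # apply the assumed second difference condition.
    final = []
    for s in candidates:
        third = model(s)
        if third[0] != first[0] or probability_gap(first, third) > second_threshold:
            final.append(s)
    return final
```

Keeping the two thresholds as separate parameters mirrors the text's distinction between the first and second difference conditions without fixing what either condition must be.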

105. Mark the final out-of-set corpus as out-of-set intent, and use the original dialogue corpus and the final out-of-set corpus to train the intent recognition model to obtain a new intent recognition model.

In this embodiment, the final out-of-set corpus obtained in step 104 is marked as the out-of-set intent of this model, and the original dialogue corpus and the final out-of-set corpus are combined into a new training corpus. Intent recognition training is carried out on the new training corpus using a machine learning method to obtain a new intent recognition model, that is, an intent recognition model with a rejection function. The machine learning method here refers to continuously training the model, using the new training corpus, the experience of previous intent recognition model training, and artificial intelligence and other technologies, so as to improve the recognition accuracy of the intent recognition model.

In practical applications, the target human-computer dialogue system marks the obtained final out-of-set corpus as the out-of-set intent of the model being trained, and then merges the original dialogue corpus and the final out-of-set corpus to obtain a new training corpus. The intent recognition model to be trained is trained repeatedly on the new training corpus using a machine learning method, giving a new intent recognition model with a rejection function.
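The labelling and merging step might look like the sketch below. The label string `"out_of_set"` and the function name are hypothetical, and the training call itself is omitted because the source does not tie the method to a particular training routine.

```python
from typing import List, Tuple

OUT_OF_SET = "out_of_set"  # hypothetical name for the out-of-set intent label

LabelledSentence = Tuple[str, str]  # (sentence, intent label)


def build_training_corpus(
    original: List[LabelledSentence], final_out_of_set: List[str]
) -> List[LabelledSentence]:
    """Mark every final out-of-set sentence and merge it with the original corpus."""
    labelled = [(sentence, OUT_OF_SET) for sentence in final_out_of_set]
    return list(original) + labelled
```

The merged list would then be handed to whatever training procedure the chosen classifier uses.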

In this embodiment, after the new intent recognition model is obtained through training, the acquired corpus to be recognized is passed to it. The new intent recognition model recognizes the corpus to be recognized, identifies the out-of-set corpus that is unrelated to any real intent, and obtains the corpus corresponding to a real intent together with its intent category. The recognized intent is then returned to the target human-computer dialogue system for display.

Whereas data augmentation in the prior art relies on random insertion, deletion, exchange and similar operations, the embodiment of the present application uses a mask list, a computer data processing technique, to process the sentences of the original dialogue corpus: the mask list is constructed and the relevant values are replaced to obtain a candidate out-of-set corpus that satisfies the first difference condition. Because the corpus is processed according to preset mathematical rules, the generated candidate out-of-set corpus fits the out-of-set intent category better. This method avoids the additional comparison experiments otherwise needed to implement rejection by setting a confidence threshold, which shortens the cycle of launching and optimizing the intent recognition model. It also gives the trained intent recognition model the ability to reject, while reducing the impact on the recognition of normal corpus, which improves the update accuracy of the intent recognition model.

Referring to FIG. 2, the second embodiment of the method for updating the intent recognition model in the embodiments of the present application includes:

201. Obtain the original dialogue material, and identify the first intent category of each sentence in the original dialogue material through a preset intent recognition model;

202. Perform sentence segmentation processing on the original dialogue data to obtain multiple sentences, and calculate the string length of each sentence in the original dialogue data respectively;

In this embodiment, sentence segmentation is performed on the original dialogue corpus by identifying the basic punctuation marks in it and splitting the corpus at those marks, which yields multiple sentences. The string length of each sentence is then calculated with a preset string function, giving the string length value corresponding to each sentence.

203. For each sentence, combine preset first element values, as many as the sentence's string length, into the mask corresponding to that sentence;

In this embodiment, the string length of each sentence is obtained, and for each sentence the preset first mask element value is repeated to the same length as the string, giving the mask corresponding to each sentence.

204. Use the masks to construct the mask list corresponding to the original dialogue corpus;

In this embodiment, the corresponding mask list of the original dialogue material is constructed by using masks.

In practical applications, the human-computer dialogue system applies a preset sentence segmentation function to the original dialogue corpus obtained from external input. The function recognizes a set of basic punctuation marks (such as the full stop and the exclamation mark) and splits the corpus at them, yielding multiple sentences. A string function then counts the characters in each sentence to obtain its string length, and the preset first mask value is repeated to that length to form the mask corresponding to each sentence; the masks together constitute the mask list corresponding to the original dialogue corpus. For example, with the first mask value set to 0, a mask list of the same length as the corpus is initialized in which every element is 0: if the original corpus is "I want to modify my account password", whose length is 10, the generated mask list is [0,0,0,0,0,0,0,0,0,0], a list of length 10.
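Steps 202 to 204 can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function names `split_sentences` and `init_mask_list`, the exact punctuation set, and the choice to drop the punctuation marks from the sentences are all assumptions. The placeholder string "ABCDEFGHIJ" stands in for the 10-character example sentence.

```python
import re

FIRST_VALUE = 0  # first element value; 0 follows the example in the text


def split_sentences(corpus: str) -> list:
    """Split a corpus at basic sentence-ending punctuation.

    Assumption: the punctuation set below (ASCII and full-width marks)
    and the decision to discard the marks themselves are illustrative.
    """
    parts = re.split(r"[.!?。！？]+\s*", corpus)
    return [p for p in parts if p]


def init_mask_list(corpus: str) -> list:
    """Build one mask per sentence, each as long as the sentence."""
    return [[FIRST_VALUE] * len(s) for s in split_sentences(corpus)]


# A 10-character placeholder sentence yields a mask of ten zeros.
print(init_mask_list("ABCDEFGHIJ."))
```

Each inner list is the mask of one sentence, so the outer list plays the role of the mask list for the whole corpus.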

205. According to the preset selection rules, respectively determine the adjustment position corresponding to each segment of the mask in the mask list;

In this embodiment, starting from the mask list obtained by initialization, the preset selection rule is used to select the adjustment position in each mask, that is, the position of the first mask value that is to be adjusted.

206. Use the preset second element value to replace the first element value at the adjusted position to obtain an adjusted mask list;

In this embodiment, the first mask value at the selected position of each mask is replaced with the second mask value. The replacement does not change the position of the element, so a numerically transformed mask list is obtained.

In practical applications, the human-computer dialogue system takes the mask list obtained by initialization and uses the preset selection rule to determine the position to be adjusted in each mask. For example, the selection rule may follow the normal reading order and select two characters at a time from left to right, so that the first selection covers the first two mask elements of the sentence and yields the first mask values [0,0] at the selected positions. The preset second mask value then replaces the first mask value at each selected position, giving the numerically transformed mask list. For example, with the second mask value set to 1, the mask elements corresponding to the selected characters are changed from 0 to 1, and the mask list of the original corpus from the preceding example becomes [1,1,0,0,0,0,0,0,0,0].
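The replacement in steps 205 and 206 might look like the sketch below. The function name `adjust_mask` and its parameters are hypothetical; the window of two characters and the second element value 1 follow the example in the text, while other selection rules are equally possible.

```python
def adjust_mask(mask, start, width=2, second_value=1):
    """Return a copy of `mask` with `width` elements from `start` replaced.

    Assumption: the selection rule is a fixed-width window moving left to
    right; `start` is the zero-based index of the window's first element.
    The input mask is not modified, and element positions are unchanged.
    """
    adjusted = list(mask)
    for i in range(start, min(start + width, len(adjusted))):
        adjusted[i] = second_value
    return adjusted


initial = [0] * 10
print(adjust_mask(initial, 0))  # first selection: the first two elements
```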

207. Based on the adjusted mask list, construct the auxiliary dialogue corpus corresponding to the original dialogue corpus, and identify the second intent category of each statement in the auxiliary dialogue corpus through the intention recognition model;

208. Detect the degree of difference between the first intent category and the second intent category to obtain a first detection result, and based on the first detection result, select a sentence that satisfies the preset difference condition from the auxiliary dialogue corpus as the final out-of-set corpus;

209. Mark the final out-of-set corpus as out-of-set intent, and use the original dialogue corpus and the final out-of-set corpus to train the intent recognition model to obtain a new intent recognition model.

In this embodiment of the present application, the original dialogue corpus is split into sentences and the string length of each sentence is calculated. The preset first element value is repeated to each string length to form the mask corresponding to each sentence, and the masks are used to construct the mask list corresponding to the original dialogue corpus. The preset selection rule then determines the adjustment position in each mask of the mask list, and the preset second element value replaces the first element value at that position, yielding the adjusted mask list. Compared with the prior art, which applies random insertion, deletion, exchange, and similar operations to the original dialogue corpus, the present application processes the sentences through a mask list according to a fixed rule. The processing of the original dialogue corpus therefore follows a regular, reproducible procedure and can produce more corpus that lies outside the training set.

Please refer to Figure 3, the third embodiment of the method for updating the intent recognition model in the embodiment of the present application includes:

301. Obtain the original dialogue material, and identify the first intent category of each sentence in the original dialogue material through a preset intent recognition model;

302. Initialize the mask list corresponding to the original dialogue material, and adjust a set of element values in the mask list according to preset selection rules to obtain an adjusted mask list;

303. Determine the position of the first element value of each segment of the mask in the mask list after numerical transformation, and respectively select the word in the original corpus that has the same position as the first element value in each sentence;

In this embodiment, the system uses a first-mask-value identification function. Taking the preset first mask value as the identifier, the function traverses each mask in the numerically transformed mask list and identifies every first mask value and its position. Then, according to the correspondence between the initial mask list and the sentences of the original dialogue corpus, the words at the same positions as the first mask values are obtained from each sentence of the original dialogue corpus.

304. In the order of the positions of the first element values, combine the words selected from each sentence in sequence to obtain a new sentence;

In this embodiment, the words obtained from each sentence of the original dialogue corpus at the positions of the first mask values are combined in sequence, following the original order of those positions, to obtain the corresponding new sentence.

305. Splicing each new sentence to obtain the auxiliary dialogue material corresponding to the original dialogue material;

In this embodiment, the new sentences are spliced together in the same way the sentences are combined in the original dialogue corpus, giving the auxiliary dialogue corpus corresponding to the original dialogue corpus. The intent recognition model to be trained then performs intent recognition on the auxiliary dialogue corpus and identifies the second intent category of each sentence in it.

In practical applications, the human-computer dialogue system uses the first-mask-value identification function to identify the positions of the first mask value in each mask of the numerically transformed mask list, and selects from the original dialogue corpus the words at those positions. For example, for the list [1,1,0,0,0,0,0,0,0,0] obtained in the preceding embodiment, identification yields the first mask values at the 3rd through 10th positions. Given the original dialogue corpus "I want to modify my account password", the characters whose mask list element is 0 are selected and spliced in the reading order of the original sentence to form a new sentence, here "modify my account password", which is the auxiliary dialogue corpus corresponding to the original dialogue corpus. The intent recognition model to be trained then performs intent recognition on the auxiliary dialogue corpus and identifies the intent category of each sentence in it, which gives the second intent category.
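Steps 303 to 305 amount to keeping the characters whose mask element still holds the first value. A minimal sketch, assuming a character-level mask and using the hypothetical names `build_auxiliary` and `build_auxiliary_corpus`:

```python
def build_auxiliary(sentence, mask, first_value=0):
    """Keep, in reading order, the characters whose mask element is first_value."""
    if len(sentence) != len(mask):
        raise ValueError("mask and sentence must have the same length")
    return "".join(ch for ch, m in zip(sentence, mask) if m == first_value)


def build_auxiliary_corpus(sentences, mask_list):
    """Apply build_auxiliary sentence by sentence, preserving sentence order."""
    return [build_auxiliary(s, m) for s, m in zip(sentences, mask_list)]


# "ABCDEFGHIJ" is a placeholder for the 10-character example sentence;
# masking the first two positions drops the first two characters.
print(build_auxiliary("ABCDEFGHIJ", [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))
```

The resulting sentences would then be passed to the intent recognition model to obtain the second intent category.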

306. Detect the degree of difference between the first intent category and the second intent category to obtain a first detection result, and based on the first detection result, select a sentence that satisfies the preset difference condition from the auxiliary dialogue corpus as the final out-of-set corpus;

307. Mark the final out-of-set corpus as out-of-set intent, and use the original dialogue corpus and the final out-of-set corpus to train the intent recognition model to obtain a new intent recognition model.

In this embodiment of the present application, based on the adjusted mask list, the positions of the first element value in each mask of the numerically transformed mask list are determined, and the words at the same positions are selected from each sentence of the original corpus. The selected words of each sentence are combined in the order of those positions to obtain a new sentence, and the new sentences are spliced to obtain the auxiliary dialogue corpus corresponding to the original dialogue corpus, on which intent recognition is performed to obtain the second intent category. Compared with the prior art, the present application maps the element positions in the adjusted mask list back to the words of the original dialogue sentences, combines those words into sentences, and performs intent recognition on the combined sentences to obtain the second intent category. Processing the original dialogue corpus through a mask list in this way is simple, and the desired intent recognition result is obtained more quickly.

Please refer to Figure 4, the fourth embodiment of the method for updating the intent recognition model in the embodiment of the present application includes:

401. Obtain the original dialogue material, and identify the first intent category of each sentence in the original dialogue material through a preset intent recognition model;

402. Initialize the mask list corresponding to the original dialogue material, and adjust a set of element values in the mask list according to preset selection rules to obtain an adjusted mask list;

403. Based on the adjusted mask list, construct the auxiliary dialogue corpus corresponding to the original dialogue corpus, and identify the second intent category of each statement in the auxiliary dialogue corpus through the intention recognition model;

404. If the first detection result shows that the first intent category and the second intent category are the same, use the auxiliary dialogue material as the initial dialogue material;

In this embodiment, the judgment of the intention category is performed on the two obtained intention categories, and it is judged whether the intention categories of their corresponding sentences are the same. If the same, then use the auxiliary dialogue data of this round of intent recognition as the initial dialogue data of a new round.

405. If the first detection result is that the first intent category and the second intent category are different, use the original dialogue material as the initial dialogue material;

In this embodiment, if it is judged that the first detection result is that the first intent category and the second intent category are different, the current round of original dialogue material is used as a new round of initial dialogue material.

406. Perform the next round of mask list value transformation, intent recognition, and difference detection on the initial dialogue material, stopping when the initial dialogue material satisfies the preset exit condition, to obtain a new first detection result;

In this embodiment, the next round of mask list value transformation, intent recognition, and difference detection is performed on the initial dialogue material, and processing stops when the initial dialogue material satisfies the preset exit condition, at which point a new first detection result is obtained. The preset exit condition is met when all the words in each piece of corpus in the original dialogue corpus have been traversed and selected by the preset selection rule.

In practical applications, the human-computer dialogue system judges whether the first intent category and the second intent category are the same. If they are the same, the auxiliary dialogue corpus of this round of intent recognition is used as the initial dialogue corpus of the next round; if they differ, the initial dialogue corpus of this round is used again as the initial dialogue corpus of the next round. For example, if the auxiliary dialogue corpus obtained in the preceding embodiment is "modify my account password" and the initial dialogue corpus is "I want to modify my account password", the intent labels of the two should be the same and their probabilities very close, so the first two elements of the mask list are kept as 1. An element value of 1 in the mask list can be taken to mean that the corresponding character contributes little to the semantics of the original sentence, and the auxiliary dialogue corpus, "modify my account password", becomes the initial dialogue corpus of the next round. If the intent categories of the two are judged to differ, the initial dialogue corpus, "I want to modify my account password", is used as the initial dialogue corpus of the next round. The next round of mask value transformation, intent recognition, and difference comparison is then performed on the new initial dialogue corpus, and the cycle repeats until every word in the initial dialogue corpus has been traversed and replaced. For example, if the new initial dialogue corpus is "modify my account password", the second and third characters are selected for the next pass; once the two characters of "password" have been replaced, the exit condition is satisfied and a new first detection result is obtained.
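One possible reading of the loop in steps 404 to 406 is sketched below. Several points are assumptions rather than statements of the application: the window advances in non-overlapping steps of two over the positions of the original sentence, only the intent labels (not the probabilities) are compared, and the names `iterate_masks` and `predict` are hypothetical. The `predict` argument stands for whatever function wraps the intent recognition model.

```python
from typing import Callable, List, Tuple


def iterate_masks(
    sentence: str, predict: Callable[[str], str], width: int = 2
) -> Tuple[List[int], List[str]]:
    """Mask a sliding window and keep the change only if the intent is unchanged.

    Returns the final mask and the auxiliary sentences whose predicted
    intent differed from that of the original sentence.
    """
    original_intent = predict(sentence)
    mask = [0] * len(sentence)
    differing = []
    for start in range(0, len(sentence), width):
        trial = list(mask)
        for i in range(start, min(start + width, len(sentence))):
            trial[i] = 1
        auxiliary = "".join(c for c, m in zip(sentence, trial) if m == 0)
        if predict(auxiliary) == original_intent:
            mask = trial  # same intent: the auxiliary sentence seeds the next round
        else:
            differing.append(auxiliary)  # different intent: keep the previous state
    return mask, differing


def toy_predict(text: str) -> str:
    """Stand-in classifier for illustration only."""
    return "change_password" if "PW" in text else "other"


print(iterate_masks("ABCDPWGHIJ", toy_predict))
```

The loop exits once the window has passed over every character, which corresponds to the exit condition described above. Sentences collected in `differing` are the ones that later steps examine as possible out-of-set corpus.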

407. According to the first detection result, sequentially determine whether each first intent category is the same as each corresponding second intent category;

In this embodiment, according to the first detection result, it is judged in turn whether the first intent category of each piece of corpus in the original dialogue corpus is the same as the second intent category of the corresponding auxiliary dialogue corpus. Here, the two are regarded as the same when, after intent recognition of the two corresponding pieces of corpus, their intent categories are identical and the probability of each is high (generally greater than 80%).

408. If they are not the same, determine that the sentence corresponding to the second intent category different from the first intent category satisfies the preset first difference condition and serves as candidate out-of-set corpus;

In this embodiment, the preset first difference condition is obtained by difference detection between the labels and probabilities produced by intent recognition of the original dialogue corpus and those produced by intent recognition of the auxiliary dialogue corpus; it is the condition that the two differ substantially. If the second intent category of a sentence differs from the corresponding first intent category, the sentence is determined to satisfy the preset first difference condition and is used as candidate out-of-set corpus.

In practical applications, according to the first detection result, the human-computer dialogue system compares the first intent category of each piece of the original dialogue corpus with the second intent category of the corresponding processed auxiliary dialogue corpus, and judges whether the two are the same category with a relatively high probability. If the two intent categories are not the same, the sentence corresponding to the second intent category is determined to satisfy the preset first difference condition and is used as candidate out-of-set corpus.
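The check in steps 407 and 408 can be illustrated as follows. The `Prediction` type and the function names are hypothetical, and the 0.8 default reflects the "generally greater than 80%" remark in the text; treating every pair that fails the sameness test as a candidate is this sketch's reading of the first difference condition.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    label: str   # intent label returned by the model
    prob: float  # probability assigned to that label


def is_same_intent(first: Prediction, second: Prediction, min_prob: float = 0.8) -> bool:
    """Same label, and both probabilities above min_prob."""
    return first.label == second.label and min(first.prob, second.prob) > min_prob


def select_candidates(
    aux_sentences: List[str],
    first_preds: List[Prediction],
    second_preds: List[Prediction],
    min_prob: float = 0.8,
) -> List[str]:
    """Keep auxiliary sentences whose prediction is not 'the same' as the original's."""
    return [
        s
        for s, f, sec in zip(aux_sentences, first_preds, second_preds)
        if not is_same_intent(f, sec, min_prob)
    ]
```

For instance, an auxiliary sentence predicted as a different label from its original sentence would be kept as candidate out-of-set corpus, while one predicted with the same label at high probability would be dropped.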

409. Identify the third intent category of each sentence in the candidate out-of-set corpus by using the intent recognition model;

In this embodiment, the intent recognition model to be trained performs intent recognition on each sentence of the candidate out-of-set corpus obtained in step 408, producing an intent label and its probability for each sentence; this gives the third intent category of each sentence in the candidate out-of-set corpus.

410. Detect the degree of difference between the first intent category and the third intent category to obtain a second detection result, and according to the second detection result, select the sentences that satisfy the preset second difference condition from the candidate out-of-set corpus as the final out-of-set corpus;

In this embodiment, the degree of difference is detected between the first intent category, obtained by intent recognition of the original dialogue corpus, and the third intent category, obtained by intent recognition of the candidate out-of-set corpus, using the label and probability that intent recognition assigns to each piece of corpus; this gives the second detection result. According to the second detection result, the sentences that satisfy the preset second difference condition are selected from the candidate out-of-set corpus as the final out-of-set corpus. The preset second difference condition compares the labels and probabilities obtained by intent recognition of the original dialogue corpus with those obtained by intent recognition of the candidate out-of-set corpus, and is the condition that the probability difference between the two is large.

In practical applications, the human-computer dialogue system compares the first intent category, obtained by intent recognition of the original dialogue corpus, with the third intent category, obtained from the candidate out-of-set corpus, in terms of their labels and corresponding probabilities, and takes the result of this comparison as the second detection result. According to the second detection result, the label and probability of each sentence in the candidate out-of-set corpus are compared with the label and probability of the corresponding sentence in the original dialogue corpus, and the sentences whose difference is large, that is, those that satisfy the preset second difference condition, are selected as the final out-of-set corpus.

411. Mark the final out-of-set corpus as out-of-set intent, and use the original dialogue corpus and the final out-of-set corpus to train the intent recognition model to obtain a new intent recognition model.

In this embodiment of the present application, the degree of difference between the first intent category and the second intent category is detected. If the two are the same, the auxiliary dialogue corpus is used as the new initial dialogue corpus; if they differ, the initial dialogue corpus is used as the new initial dialogue corpus. The preset selection rule then drives repeated rounds of mask transformation, intent recognition, and difference detection, and the first detection result is obtained once all the words in the sentence have been replaced. Based on the first detection result, the first and second intent categories are judged against the preset criterion of sameness, and the sentences whose second intent category differs are taken as candidate out-of-set corpus. The final out-of-set corpus is then obtained through intent recognition of the candidate out-of-set corpus, difference detection, and judgment against the difference condition. Compared with the prior art, the present application can identify the out-of-set corpus that can be composed from the words of the original training dialogue corpus, and thus obtains as much candidate out-of-set corpus as possible. It avoids the additional comparison experiments that existing methods require to implement rejection by setting a confidence threshold, shortens the time needed to bring the intent recognition model online and to optimize it afterwards, and yields an intent recognition model with higher recognition accuracy.

Please refer to Figure 5, the fifth embodiment of the method for updating the intent recognition model in the embodiment of the present application includes:

501. Obtain the original dialogue material, and identify the first intent category of each sentence in the original dialogue material through a preset intent recognition model;

502. Initialize the mask list corresponding to the original dialogue material, and adjust a set of element values in the mask list according to preset selection rules to obtain an adjusted mask list;

503. Based on the adjusted mask list, construct the auxiliary dialogue corpus corresponding to the original dialogue corpus, and identify the second intent category of each statement in the auxiliary dialogue corpus through the intention recognition model;

504. Detect the degree of difference between the first intent category and the second intent category, obtain a first detection result, and sequentially determine whether each first intent category and each corresponding second intent category are the same according to the first detection result;

505. If they are not the same, determine that the sentence corresponding to the second intent category different from the first intent category satisfies the preset first difference condition and serves as candidate out-of-set corpus;

506. Identify the third intent category of each sentence in the candidate out-of-set corpus by using the intent recognition model;

507. Detect the degree of difference between the first intent category and the third intent category to obtain a second detection result, and according to the second detection result, judge whether the degree of difference between each first intent category and each corresponding third intent category is greater than the preset difference degree threshold;

In this embodiment, the second detection result is obtained by detecting the degree of difference between the first intent category and the third intent category. According to the second detection result, each first intent category is compared with the third intent category of the corresponding candidate out-of-set corpus, and it is judged whether the degree of difference between the two is greater than the preset difference degree threshold.

508. If it is greater, determine that the sentence corresponding to the third intent category whose degree of difference is greater than the preset difference degree threshold satisfies the second difference condition and is used as the final out-of-set corpus;

In this embodiment, if the degree of difference between the intent categories of the two corresponding sentences is judged to be greater than the preset difference degree threshold, the sentence corresponding to the third intent category is determined to satisfy the second difference condition and is used as the final out-of-set corpus. The preset second difference condition compares the labels and probabilities obtained by intent recognition of the original dialogue corpus with those obtained by intent recognition of the candidate out-of-set corpus, and judges whether the intent categories of the two corresponding pieces of corpus differ by more than the preset difference degree threshold. The threshold is generally set to 80%; the condition is met when the intent labels of the two differ and the probability value is relatively large.

In practical applications, according to the second detection result, the human-computer dialogue system compares the first intent category, obtained by intent recognition of each piece of corpus in the original dialogue corpus, with the third intent category, obtained by intent recognition of the corresponding candidate out-of-set corpus, and judges whether the degree of difference between them is greater than the preset difference degree threshold. If the categories differ and the probability is greater than the threshold, the candidate out-of-set corpus corresponding to those third intent categories is selected as the final out-of-set corpus used for model training.
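The text does not give a formula for the degree of difference, so the sketch below of steps 507 and 508 rests on an explicit assumption: a candidate passes the second difference condition when its label differs from that of the original sentence and the probability of its own label exceeds the threshold (0.8, following the 80% figure above). The names `Prediction` and `select_final` are hypothetical, and other readings of the condition are possible.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    label: str   # intent label returned by the model
    prob: float  # probability assigned to that label


def select_final(
    candidates: List[str],
    first_preds: List[Prediction],
    third_preds: List[Prediction],
    threshold: float = 0.8,
) -> List[str]:
    """Keep candidates that satisfy the assumed second difference condition."""
    final = []
    for sentence, first, third in zip(candidates, first_preds, third_preds):
        # Assumed reading: labels differ and the candidate's probability is high.
        if third.label != first.label and third.prob > threshold:
            final.append(sentence)
    return final
```

Sentences returned by `select_final` correspond to the final out-of-set corpus that step 509 labels and adds to the training data.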

509. Mark the final out-of-set corpus as out-of-set intent, and use the original dialogue corpus and the final out-of-set corpus to train the intent recognition model to obtain a new intent recognition model.

In this embodiment of the present application, the degree of difference between the third intent category, obtained by intent recognition of the candidate out-of-set corpus, and the first intent category, obtained from the original dialogue corpus, is detected, and according to the resulting second detection result the sentences that satisfy the preset second difference condition are selected from the candidate out-of-set corpus as the final out-of-set corpus. Compared with the prior art, first selecting candidate out-of-set corpus and then applying the second difference condition avoids the problem that out-of-set corpus constructed by traditional data-augmentation synthesis may become entangled with the normal training corpus. The mask list and the existing intent recognition model together ensure that the generated corpus is genuinely out-of-set, so that the trained intent recognition model gains the ability to reject while the impact on recognition of normal corpus is reduced.
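The last step of each embodiment, marking the final out-of-set corpus as out-of-set intent and combining it with the original dialogue corpus for retraining, could be prepared as in the sketch below. The label string "out_of_set" and the function name `build_training_set` are illustrative assumptions; the actual training call depends on the model and is not shown.

```python
from typing import List, Tuple

OUT_OF_SET = "out_of_set"  # hypothetical name for the out-of-set intent label


def build_training_set(
    original: List[Tuple[str, str]], final_out_of_set: List[str]
) -> List[Tuple[str, str]]:
    """Combine labeled original sentences with out-of-set sentences.

    `original` holds (sentence, intent_label) pairs; every sentence in
    `final_out_of_set` is paired with the out-of-set label.
    """
    data = list(original)
    data.extend((sentence, OUT_OF_SET) for sentence in final_out_of_set)
    return data
```

The combined list would then be used to retrain the intent recognition model, so that it learns an explicit out-of-set class alongside the existing intents.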

The method for updating the intent recognition model in the embodiments of the present application has been described above; the device for updating the intent recognition model is described below. Referring to Figure 6, an embodiment of the device for updating the intent recognition model in the embodiments of the present application includes: a corpus acquisition module 601, configured to obtain the original dialogue corpus and identify, through the preset intent recognition model, the first intent category of each sentence in the original dialogue corpus; a mask construction module 602, configured to initialize the mask list corresponding to the original dialogue corpus and adjust a group of element values in the mask list according to the preset selection rule to obtain the adjusted mask list; a second intent module 603, configured to construct, based on the adjusted mask list, the auxiliary dialogue corpus corresponding to the original dialogue corpus and identify the second intent category of each sentence in the auxiliary dialogue corpus through the intent recognition model; a final out-of-set module 604, configured to detect the degree of difference between the first intent category and the second intent category to obtain the first detection result and, based on the first detection result, select sentences that satisfy the preset difference condition from the auxiliary dialogue corpus as the final out-of-set corpus; and a corpus training module 605, configured to mark the final out-of-set corpus as out-of-set intent and train the intent recognition model with the original dialogue corpus and the final out-of-set corpus to obtain a new intent recognition model.

In this embodiment of the present application, whereas prior-art data augmentation relies on operations such as random insertion, deletion, and exchange, the present application processes the sentences of the original dialogue corpus with a mask list: a mask list is constructed and selected values in it are replaced, yielding candidate out-of-set corpus that satisfies the first difference condition. Because the corpus is processed according to a preset, deterministic rule, the generated candidate out-of-set corpus is more likely to fall within the out-of-set intent category. The method avoids the additional comparison experiments needed to implement rejection by setting a confidence threshold, and shortens the cycle of launching and optimizing the intent recognition model. At the same time, the trained intent recognition model gains the ability to reject out-of-set input while the impact on recognition of normal corpus is reduced, which improves the update accuracy of the intent recognition model.

Referring to FIG. 7, another embodiment of the device for updating the intent recognition model includes: a corpus acquisition module 601, configured to obtain the original dialogue corpus and identify, through a preset intent recognition model, the first intent category of each sentence in the original dialogue corpus; a mask construction module 602, configured to initialize a mask list corresponding to the original dialogue corpus and adjust a group of element values in the mask list according to preset selection rules to obtain an adjusted mask list; a second intent module 603, configured to construct, based on the adjusted mask list, an auxiliary dialogue corpus corresponding to the original dialogue corpus and identify, through the intent recognition model, the second intent category of each sentence in the auxiliary dialogue corpus; a final out-of-set module 604, configured to detect the degree of difference between the first intent category and the second intent category to obtain a first detection result and, based on the first detection result, select sentences satisfying the preset difference condition from the auxiliary dialogue corpus as the final out-of-set corpus; and a corpus training module 605, configured to label the final out-of-set corpus with the out-of-set intent and train the intent recognition model using the original dialogue corpus and the final out-of-set corpus to obtain a new intent recognition model.

Specifically, the mask construction module 602 includes: a character calculation unit 6021, configured to perform sentence segmentation on the original dialogue corpus to obtain a plurality of sentences and to calculate the string length of each sentence; a mask combination unit 6022, configured to combine preset first element values, equal in number to the string length of each sentence, into a mask corresponding to that sentence; and a corpus correspondence unit 6023, configured to construct, from the masks, a mask list corresponding to the original dialogue corpus.
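The three units above can be illustrated with a minimal Python sketch. The names `split_sentences`, `init_mask_list`, and `FIRST_ELEMENT`, the choice of `1` as the first element value, and the punctuation used for segmentation are all assumptions made for illustration; the application does not fix them.

```python
import re
from typing import List

FIRST_ELEMENT = 1  # assumed first element value: "keep this position"


def split_sentences(corpus: str) -> List[str]:
    """Split a dialogue corpus into sentences (assumed punctuation set)."""
    parts = re.split(r"[。！？!?.\n]+", corpus)
    return [p.strip() for p in parts if p.strip()]


def init_mask_list(sentences: List[str]) -> List[List[int]]:
    """Build one mask per sentence, as long as that sentence's string length."""
    return [[FIRST_ELEMENT] * len(s) for s in sentences]


sentences = split_sentences("check my balance. cancel the order")
mask_list = init_mask_list(sentences)
for sentence, mask in zip(sentences, mask_list):
    print(sentence, len(mask))
```

Each mask starts with every position set to the first element value, so an unadjusted mask reproduces its sentence unchanged.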

Specifically, the mask construction module 602 further includes: an element selection unit 6024, configured to determine, according to preset selection rules, the adjustment position corresponding to each mask in the mask list; and an element replacement unit 6025, configured to replace the first element value at the adjustment position with a preset second element value to obtain the adjusted mask list.
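A minimal sketch of the selection and replacement units follows. The selection rule shown here (one position per round, visited from left to right) is only a hypothetical stand-in for the preset selection rules, and `SECOND_ELEMENT = 0` is likewise an assumed value.

```python
from typing import List

FIRST_ELEMENT = 1   # assumed: keep the word at this position
SECOND_ELEMENT = 0  # assumed: drop the word at this position


def select_position(mask: List[int], round_index: int) -> int:
    """Hypothetical selection rule: one position per round, left to right."""
    return round_index % len(mask)


def adjust_mask_list(mask_list: List[List[int]], round_index: int) -> List[List[int]]:
    """Return a copy of the mask list with one position per mask replaced."""
    adjusted = []
    for mask in mask_list:
        new_mask = list(mask)  # copy so the input mask list is left untouched
        if new_mask:
            new_mask[select_position(new_mask, round_index)] = SECOND_ELEMENT
        adjusted.append(new_mask)
    return adjusted


mask_list = [[1, 1, 1, 1], [1, 1, 1]]
adjusted = adjust_mask_list(mask_list, round_index=1)
print(adjusted)
```

Returning a copy keeps the original mask list available for later rounds, which is a design choice of this sketch rather than something the application requires.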

Specifically, the second intent module 603 includes: a word selection unit 6031, configured to determine, for each mask in the mask list after the value transformation, the positions of the first element value, and to select from each sentence of the original corpus the single words located at those positions; a sequential combination unit 6032, configured to combine, for each sentence, the selected single words in the order of the first element value positions to obtain a corresponding new sentence; and a sentence splicing unit 6033, configured to splice the new sentences to obtain the auxiliary dialogue corpus corresponding to the original dialogue corpus.
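The selection, combination, and splicing steps can be sketched as follows. This sketch assumes that each mask position corresponds to one character of the sentence (consistent with the mask having the sentence's string length) and that new sentences are spliced with a newline; both `apply_mask` and `build_auxiliary_corpus` are illustrative names.

```python
from typing import List

FIRST_ELEMENT = 1  # assumed value marking positions that are kept


def apply_mask(sentence: str, mask: List[int]) -> str:
    """Keep, in order, the characters whose mask position holds FIRST_ELEMENT."""
    if len(sentence) != len(mask):
        raise ValueError("mask length must equal sentence length")
    return "".join(ch for ch, m in zip(sentence, mask) if m == FIRST_ELEMENT)


def build_auxiliary_corpus(sentences: List[str], mask_list: List[List[int]], sep: str = "\n") -> str:
    """Splice the new sentences into one auxiliary dialogue corpus."""
    return sep.join(apply_mask(s, m) for s, m in zip(sentences, mask_list))


sentences = ["查询余额", "hello"]
adjusted_masks = [[1, 0, 1, 1], [1, 1, 0, 1, 1]]
auxiliary = build_auxiliary_corpus(sentences, adjusted_masks)
print(auxiliary)
```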

Specifically, the final out-of-set module 604 is further configured to: if the first detection result is that the first intent category and the second intent category are the same, use the auxiliary dialogue corpus as the initial dialogue corpus; if the first detection result is that the first intent category and the second intent category are different, use the original dialogue corpus as the initial dialogue corpus; and perform the next round of mask list value transformation, intent recognition, and difference detection on the initial dialogue corpus until the initial dialogue corpus meets the preset exit condition, thereby obtaining a new first detection result.
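The round-by-round behaviour can be sketched for a single sentence as below. Several details are assumptions rather than part of the application: the exit condition (every remaining position has been tried), the left-to-right selection rule, the character-level masking, and `toy_classify`, a keyword stand-in for the intent recognition model.

```python
from typing import Callable, List, Tuple


def toy_classify(sentence: str) -> str:
    """Hypothetical stand-in for the intent recognition model."""
    return "payment" if "pay" in sentence else "other"


def reduce_sentence(sentence: str, classify: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Repeat mask adjustment, intent recognition, and comparison on one sentence."""
    first_intent = classify(sentence)
    current = sentence          # initial dialogue corpus for the next round
    candidates: List[str] = []  # auxiliary sentences whose intent changed
    position = 0
    # Assumed exit condition: every remaining position has been tried.
    while position < len(current):
        mask = [1] * len(current)
        mask[position] = 0
        auxiliary = "".join(ch for ch, m in zip(current, mask) if m == 1)
        if classify(auxiliary) == first_intent:
            current = auxiliary           # same intent: continue from the auxiliary
        else:
            candidates.append(auxiliary)  # different intent: keep current, record it
            position += 1
    return current, candidates


reduced, candidates = reduce_sentence("pay up", toy_classify)
print(reduced, candidates)
```

Each round either shortens the current sentence or advances the position, so the loop always terminates.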

Specifically, the final out-of-set module 604 includes: a difference judging unit 6041, configured to judge in turn, according to the first detection result, whether each first intent category and the corresponding second intent category are the same; a candidate selection unit 6042, configured to, if they are not the same, determine that the sentence corresponding to the second intent category that differs from the first intent category satisfies the preset first difference condition, and take that sentence as candidate out-of-set corpus; a candidate identification unit 6043, configured to identify, through the intent recognition model, the third intent category of each sentence in the candidate out-of-set corpus; and a final selection unit 6044, configured to detect the degree of difference between the first intent category and the third intent category to obtain a second detection result and, according to the second detection result, select from the candidate out-of-set corpus the sentences satisfying the preset second difference condition as the final out-of-set corpus.

Specifically, the final selection unit 6044 is configured to: judge, according to the second detection result, whether the degree of difference between each first intent category and the corresponding third intent category is greater than a preset difference threshold; and, if so, determine that the sentence corresponding to that third intent category satisfies the second difference condition and take it as the final out-of-set corpus.
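The threshold check can be sketched as follows. This passage does not say how the degree of difference is computed, so the sketch assumes it is one minus the score the model assigns to the first intent category for the candidate sentence. The function `select_final_out_of_set`, the `score_fn` interface, and the toy scores are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]


def select_final_out_of_set(
    candidates: List[Tuple[str, str]],
    score_fn: Callable[[str], Scores],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Filter (first_intent, sentence) pairs by the second difference condition."""
    final = []
    for first_intent, sentence in candidates:
        scores = score_fn(sentence)
        third_intent = max(scores, key=scores.get)
        # Assumed measure: how little score the first intent still receives.
        difference = 1.0 - scores.get(first_intent, 0.0)
        if difference > threshold:
            final.append((sentence, third_intent))
    return final


toy_scores = {
    "ay up": {"payment": 0.05, "greeting": 0.95},
    "pa up": {"payment": 0.60, "greeting": 0.40},
}
final = select_final_out_of_set(
    [("payment", "ay up"), ("payment", "pa up")],
    toy_scores.__getitem__,
    threshold=0.8,
)
print(final)
```

Under this assumed measure, a candidate passes only when the model has largely stopped associating it with the original intent.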

On the basis of the previous embodiment, this embodiment describes in detail the specific functions of each module and the unit structure of some modules. With this device, a mask list is obtained by processing the original dialogue corpus with mask elements; element replacement, intent recognition, and difference detection are then performed cyclically until the words in every sentence of the original dialogue corpus have been processed, yielding the candidate out-of-set corpus. Intent recognition and difference-condition judgment are then performed on the candidate out-of-set corpus to obtain the final out-of-set corpus, which is used together with the original dialogue corpus for training with basic machine learning methods to obtain an intent recognition model with a rejection function. This not only speeds up model training but also avoids corpus entanglement, and yields an intent recognition model that recognizes the intent of normal corpus more efficiently.
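The labelling step that precedes retraining can be sketched as below. The label string `OUT_OF_SET_LABEL` and the helper `build_training_set` are hypothetical, and skipping sentences that already appear in the original corpus is an added assumption of this sketch, not a step stated in the application. The actual training call is omitted because the application does not name a specific learner.

```python
from typing import List, Tuple

OUT_OF_SET_LABEL = "out_of_set"  # hypothetical label for the out-of-set intent


def build_training_set(
    original: List[Tuple[str, str]],
    final_out_of_set: List[str],
) -> List[Tuple[str, str]]:
    """Combine labelled original sentences with out-of-set sentences."""
    data = list(original)
    seen = {sentence for sentence, _ in original}
    for sentence in final_out_of_set:
        # Assumption: never relabel a sentence that is already in the original corpus.
        if sentence not in seen:
            data.append((sentence, OUT_OF_SET_LABEL))
            seen.add(sentence)
    return data


original = [("check my balance", "balance")]
training_set = build_training_set(original, ["chck blnce", "check my balance"])
print(training_set)
```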

The above Figures 6 and 7 describe in detail the intent recognition model updating device in the embodiment of the present application from the perspective of modular functional entities, and the following describes the intent recognition model updating device in the embodiment of the present application in detail from the perspective of hardware processing.

FIG. 8 is a schematic structural diagram of an intent recognition model updating device provided by an embodiment of the present application. The intent recognition model updating device 800 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 810, a memory 820, and one or more storage media 830 (for example, one or more mass storage devices) storing application programs 833 or data 832. The memory 820 and the storage medium 830 may provide temporary or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the intent recognition model updating device 800. Furthermore, the processor 810 may be configured to communicate with the storage medium 830 and to execute, on the intent recognition model updating device 800, the series of instruction operations stored in the storage medium 830.

The intent recognition model updating device 800 may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input/output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art will understand that the structure shown in FIG. 8 does not limit the intent recognition model updating device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

The present application also provides an intent recognition model updating device. This computer device includes a memory and a processor; computer-readable instructions are stored in the memory and, when executed by the processor, cause the processor to perform the steps of the intent recognition model updating method in the above embodiments.

The present application also provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium. Instructions are stored in the computer-readable storage medium and, when run on a computer, cause the computer to perform the steps of the intent recognition model updating method.

Those skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the above-described system, device and unit can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. The application may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

As stated above, the foregoing embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A method for updating an intent recognition model, wherein the method for updating an intent recognition model includes:

Obtaining the original dialogue material, and identifying the first intent category of each statement in the original dialogue material through a preset intent recognition model;

Initializing a mask list corresponding to the original dialogue material, and adjusting a group of element values in the mask list according to preset selection rules to obtain an adjusted mask list;

Based on the adjusted mask list, construct the auxiliary dialogue material corresponding to the original dialogue material, and identify the second intent category of each sentence in the auxiliary dialogue material through the intent recognition model;

Detecting the degree of difference between the first intent category and the second intent category to obtain a first detection result, and based on the first detection result, selecting sentences that satisfy a preset difference condition from the auxiliary dialogue material as the final out-of-set corpus;

Marking the final out-of-set corpus as out-of-set intent, and using the original dialogue corpus and the final out-of-set corpus to train the intent recognition model to obtain a new intent recognition model.
The method for updating an intent recognition model according to claim 1, wherein said initializing a mask list corresponding to said original dialogue material comprises:

Performing sentence segmentation on the original dialogue material to obtain a plurality of sentences, and calculating the string length of each sentence in the original dialogue material respectively;

Combining preset first element values, equal in number to each string length, to form a mask corresponding to each sentence;

A mask list corresponding to the original dialogue material is constructed by using the mask.
The method for updating an intention recognition model according to claim 2, wherein said adjusting a group of element values in said mask list according to preset selection rules, and obtaining an adjusted mask list includes:

According to preset selection rules, respectively determine the adjustment position corresponding to each mask in the mask list;

The first element value at the adjusted position is replaced by the preset second element value to obtain an adjusted mask list.
The method for updating an intent recognition model according to claim 3, wherein, based on the adjusted mask list, constructing the auxiliary dialogue material corresponding to the original dialogue material comprises:

Respectively determine the first element value position of each mask in the mask list after the numerical transformation, and respectively select the single word in each sentence in the original corpus that has the same position as the first element value;

According to the order of the position of the first element value, each sentence is respectively combined in sequence with the corresponding selected words to obtain a new sentence;

Each of the new sentences is spliced to obtain the auxiliary dialogue material corresponding to the original dialogue material.
The method for updating an intention recognition model according to claim 1, wherein, after the degree of difference between the first intention category and the second intention category is detected and the first detection result is obtained, further comprising:

If the first detection result is that the first intent category is the same as the second intent category, using the auxiliary dialogue material as the initial dialogue material;

If the first detection result is that the first intent category is different from the second intent category, then using the original dialogue material as initial dialogue material;

Carry out the next round of corresponding mask list numerical transformation, intent recognition and difference degree detection on the initial dialogue material until the initial dialogue material meets the preset exit condition, and a new first detection result is obtained.
The method for updating an intent recognition model according to claim 1, wherein the difference condition includes a first difference condition and a second difference condition, and said selecting, based on the first detection result, sentences that satisfy the preset difference condition from the auxiliary dialogue material as the final out-of-set corpus comprises:

According to the first detection result, sequentially determining whether each first intent category and each corresponding second intent category are the same;

If they are not the same, determining that the sentence corresponding to the second intent category that differs from the first intent category satisfies the preset first difference condition, and taking that sentence as candidate out-of-set corpus;

Recognizing the third intent category of each sentence in the candidate out-of-set corpus through the intent recognition model;

Detecting the degree of difference between the first intent category and the third intent category to obtain a second detection result, and according to the second detection result, selecting sentences that satisfy the preset second difference condition from the candidate out-of-set corpus as the final out-of-set corpus.
The method for updating the intent recognition model according to claim 6, wherein said selecting, according to the second detection result, sentences that satisfy the preset second difference condition from the candidate out-of-set corpus as the final out-of-set corpus comprises:

According to the second detection result, judging whether the degree of difference between each first intent category and each corresponding third intent category is greater than a preset difference threshold;

If greater, determining that the sentence corresponding to the third intent category whose degree of difference is greater than the preset difference threshold satisfies the second difference condition, and taking that sentence as the final out-of-set corpus.
An intent recognition model updating device, comprising a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:

Obtaining the original dialogue material, and identifying the first intent category of each statement in the original dialogue material through a preset intent recognition model;

Initializing a mask list corresponding to the original dialogue material, and adjusting a group of element values in the mask list according to preset selection rules to obtain an adjusted mask list;

Based on the adjusted mask list, construct the auxiliary dialogue material corresponding to the original dialogue material, and identify the second intent category of each sentence in the auxiliary dialogue material through the intent recognition model;

Detecting the degree of difference between the first intent category and the second intent category to obtain a first detection result, and based on the first detection result, selecting sentences that satisfy a preset difference condition from the auxiliary dialogue material as the final out-of-set corpus;

Marking the final out-of-set corpus as out-of-set intent, and using the original dialogue corpus and the final out-of-set corpus to train the intent recognition model to obtain a new intent recognition model.
The device for updating an intention recognition model according to claim 8, wherein said initializing a mask list corresponding to said original dialogue material comprises:

Performing sentence segmentation on the original dialogue material to obtain a plurality of sentences, and calculating the string length of each sentence in the original dialogue material respectively;

Combining preset first element values, equal in number to each string length, to form a mask corresponding to each sentence;

A mask list corresponding to the original dialogue material is constructed by using the mask.
The device for updating an intention recognition model according to claim 9, wherein said adjusting a group of element values in said mask list according to a preset selection rule, and obtaining an adjusted mask list includes:

According to preset selection rules, respectively determine the adjustment position corresponding to each mask in the mask list;

The first element value at the adjusted position is replaced by the preset second element value to obtain an adjusted mask list.
The device for updating an intention recognition model according to claim 10, wherein, based on the adjusted mask list, constructing the auxiliary dialogue material corresponding to the original dialogue material comprises:

Respectively determine the first element value position of each mask in the mask list after the numerical transformation, and respectively select the single word in each sentence in the original corpus that has the same position as the first element value;

According to the order of the position of the first element value, each sentence is respectively combined in sequence with the corresponding selected words to obtain a new sentence;

Each of the new sentences is spliced to obtain the auxiliary dialogue material corresponding to the original dialogue material.
The device for updating the intent recognition model according to claim 8, wherein, after the degree of difference between the first intent category and the second intent category is detected and the first detection result is obtained, further comprising:

If the first detection result is that the first intent category is the same as the second intent category, using the auxiliary dialogue material as the initial dialogue material;

If the first detection result is that the first intent category is different from the second intent category, then using the original dialogue material as initial dialogue material;

Carry out the next round of corresponding mask list numerical transformation, intent recognition and difference degree detection on the initial dialogue material until the initial dialogue material meets the preset exit condition, and a new first detection result is obtained.
The device for updating an intent recognition model according to claim 8, wherein the difference condition includes a first difference condition and a second difference condition, and said selecting, based on the first detection result, sentences that satisfy the preset difference condition from the auxiliary dialogue material as the final out-of-set corpus comprises:

According to the first detection result, sequentially determining whether each first intent category and each corresponding second intent category are the same;

If they are not the same, determining that the sentence corresponding to the second intent category that differs from the first intent category satisfies the preset first difference condition, and taking that sentence as candidate out-of-set corpus;

Recognizing the third intent category of each sentence in the candidate out-of-set corpus through the intent recognition model;

Detecting the degree of difference between the first intent category and the third intent category to obtain a second detection result, and according to the second detection result, selecting sentences that satisfy the preset second difference condition from the candidate out-of-set corpus as the final out-of-set corpus.
The device for updating the intent recognition model according to claim 13, wherein said selecting, according to the second detection result, sentences that satisfy the preset second difference condition from the candidate out-of-set corpus as the final out-of-set corpus comprises:

According to the second detection result, judging whether the degree of difference between each first intent category and each corresponding third intent category is greater than a preset difference threshold;

If greater, determining that the sentence corresponding to the third intent category whose degree of difference is greater than the preset difference threshold satisfies the second difference condition, and taking that sentence as the final out-of-set corpus.
A computer-readable storage medium, wherein computer instructions are stored in the computer-readable storage medium, and when the computer instructions are run on the computer, the computer is made to perform the following steps:

Obtaining the original dialogue material, and identifying the first intent category of each statement in the original dialogue material through a preset intent recognition model;

Initializing a mask list corresponding to the original dialogue material, and adjusting a group of element values in the mask list according to preset selection rules to obtain an adjusted mask list;

Based on the adjusted mask list, construct the auxiliary dialogue material corresponding to the original dialogue material, and identify the second intent category of each sentence in the auxiliary dialogue material through the intent recognition model;

Detecting the degree of difference between the first intent category and the second intent category to obtain a first detection result, and based on the first detection result, selecting sentences that satisfy a preset difference condition from the auxiliary dialogue material as the final out-of-set corpus;

Marking the final out-of-set corpus as out-of-set intent, and using the original dialogue corpus and the final out-of-set corpus to train the intent recognition model to obtain a new intent recognition model.
The computer-readable storage medium according to claim 15, wherein the initializing the mask list corresponding to the original dialogue material comprises:

Performing sentence segmentation on the original dialogue material to obtain a plurality of sentences, and calculating the string length of each sentence in the original dialogue material respectively;

Combining preset first element values, equal in number to each string length, to form a mask corresponding to each sentence;

A mask list corresponding to the original dialogue material is constructed by using the mask.
The computer-readable storage medium according to claim 16, wherein said adjusting a group of element values in the mask list according to preset selection rules, and obtaining the adjusted mask list includes:

According to preset selection rules, respectively determine the adjustment position corresponding to each mask in the mask list;

The first element value at the adjusted position is replaced by the preset second element value to obtain an adjusted mask list.
The computer-readable storage medium according to claim 17, wherein said constructing the auxiliary dialogue material corresponding to the original dialogue material based on the adjusted mask list comprises:

Respectively determine the first element value position of each mask in the mask list after the numerical transformation, and respectively select the single word in each sentence in the original corpus that has the same position as the first element value;

According to the order of the position of the first element value, each sentence is respectively combined in sequence with the corresponding selected words to obtain a new sentence;

Each of the new sentences is spliced to obtain the auxiliary dialogue material corresponding to the original dialogue material.
The computer-readable storage medium according to claim 15, wherein, after the degree of difference between the first intent category and the second intent category is detected and the first detection result is obtained, further comprising:

If the first detection result is that the first intent category is the same as the second intent category, using the auxiliary dialogue material as the initial dialogue material;

If the first detection result is that the first intent category is different from the second intent category, then using the original dialogue material as initial dialogue material;

Carry out the next round of corresponding mask list numerical transformation, intent recognition and difference degree detection on the initial dialogue material until the initial dialogue material meets the preset exit condition, and a new first detection result is obtained.
A device for updating an intention recognition model, wherein the device for updating an intention recognition model includes:

The corpus acquisition module is used to obtain the original dialogue material, and identify the first intent category of each statement in the original dialogue material through a preset intent recognition model;

A mask construction module, configured to initialize a mask list corresponding to the original dialogue material, and adjust a group of element values in the mask list according to preset selection rules to obtain an adjusted mask list;

The second intent module is configured to construct the auxiliary dialogue material corresponding to the original dialogue material based on the adjusted mask list, and identify the second intent category of each sentence in the auxiliary dialogue material through the intent recognition model;

The final out-of-set module is used to detect the degree of difference between the first intent category and the second intent category to obtain a first detection result, and based on the first detection result, select sentences that satisfy a preset difference condition from the auxiliary dialogue material as the final out-of-set corpus;

The corpus training module is used to mark the final out-of-set corpus as out-of-set intent, and use the original dialogue corpus and the final out-of-set corpus to train the intent recognition model to obtain a new intent recognition model.