CN116150311A - Training method of text matching model, intention recognition method and device - Google Patents

Training method of text matching model, intention recognition method and device

Info

Publication number
CN116150311A
CN116150311A (application CN202210983380.XA)
Authority
CN
China
Prior art keywords
training
sample
text matching
matching model
standard problem
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210983380.XA
Other languages
Chinese (zh)
Inventor
丁隆耀
蒋宁
吴海英
权佳成
李宽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd filed Critical Mashang Xiaofei Finance Co Ltd
Priority to CN202210983380.XA
Publication of CN116150311A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a training method of a text matching model, an intention recognition method and a device. The method includes: constructing a training sample set based on standard problem sentences in a knowledge base, wherein each training sample in the training sample set includes a standard problem sentence, a similar sample and a heterogeneous sample; in the iterative training of the diversity stage performed on an initial text matching model using the training sample set, performing linear-interpolation-based hybrid encoding on each training sample and then inputting the result into the initial text matching model, and outputting a first distance between the standard problem sentence and the similar sample and a second distance between the standard problem sentence and the heterogeneous sample; and adjusting model parameters of the text matching model according to the first distance, the second distance and the loss function until the loss function meets a set condition, so as to obtain the text matching model trained in the diversity stage. The method and the device help to improve the accuracy of text matching.

Description

Training method of text matching model, intention recognition method and device
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a training method of a text matching model, an intention recognition method and a device.
Background
The voice robot dialogue system is a dialogue Question and Answer (QA) system in which a customer poses questions to a voice robot and the voice robot answers them. When building a voice robot dialogue system, accurately recognizing the intention behind the questions posed by customers is a precondition for the voice robot to provide high-quality answers, and is also one of the difficulties of the system.
With the development of deep learning, a number of methods suitable for text matching have been proposed, among which two are classical: representation-based methods and interaction-based methods. A representation-based method encodes the two pieces of text separately to obtain their respective feature vectors, and obtains the final matching relationship through a similarity calculation function or a related structure. An interaction-based method lets the two pieces of text interact at different granularities (word level, phrase level, and the like), then aggregates the matching results of the different granularities through a structure into a feature vector, from which the final matching relationship is obtained.
Regardless of the text matching method employed, a common voice robot dialogue system needs to recognize many customer intentions. In a cold-start scenario, the voice robot can only perform model training on the similar-question data entered by customer service personnel in the knowledge base; such training data are usually scarce, often only a few to a few dozen items, which cannot support training, so the text matching accuracy is low. In addition, the input of the voice robot depends on the upstream speech recognition result, which usually contains errors such as wrong words and mistyped characters; this reduces the accuracy of text matching and further increases the difficulty of intention recognition. How to make full use of the limited training data to improve the accuracy of text matching, and to reduce the adverse effect of inaccurate speech recognition results on intention recognition, is a technical problem facing the industry.
Disclosure of Invention
The embodiment of the application aims to provide a training method of a text matching model, and an intention recognition method and device, which are used for improving the accuracy of text matching and the accuracy of intention recognition.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical scheme:
in a first aspect, a training method for a text matching model is provided, including:
constructing a training sample set based on standard problem sentences in a knowledge base, wherein each training sample in the training sample set comprises: the standard problem statement is a statement which is manually input as a standard problem, the similar sample is used for indicating a statement similar to the standard problem statement, and the heterogeneous sample is used for indicating a statement dissimilar to the standard problem statement;
in the iterative training of the diversity stage performed on an initial text matching model using the training sample set, performing linear-interpolation-based hybrid encoding on each training sample and then inputting the result into the initial text matching model, and outputting a first distance between the standard problem statement and the similar sample and a second distance between the standard problem statement and the heterogeneous sample;
And adjusting model parameters of the text matching model according to the output first distance, the second distance and a loss function of the text matching model until the loss function of the text matching model meets a set condition, so as to obtain the text matching model trained in the diversity stage.
In a second aspect, there is provided an intention recognition method including:
acquiring a voice recognition statement corresponding to a voice to be recognized;
inputting the voice recognition sentences and standard problem sentences in a knowledge base into a pre-trained text matching model, and outputting the matching degree between the voice recognition sentences and the standard problem sentences; the text matching model is obtained by training according to the training method of the text matching model in the first aspect;
and determining the standard problem statement matched with the voice to be recognized as an intention recognition result based on the matching degree between the voice recognition statement and each standard problem statement.
In a third aspect, a training device for a text matching model is provided, including:
a building module, configured to build a training sample set based on standard problem sentences in a knowledge base, where each training sample in the training sample set includes: the standard problem statement is a statement which is manually input as a standard problem, the similar sample is used for indicating a statement similar to the standard problem statement, and the heterogeneous sample is used for indicating a statement dissimilar to the standard problem statement;
The diversity training module is used for, in the iterative training of the diversity stage performed on the initial text matching model using the training sample set, performing linear-interpolation-based hybrid encoding on each training sample and then inputting the result into the initial text matching model, and outputting a first distance between the standard problem statement and the similar sample and a second distance between the standard problem statement and the heterogeneous sample;
and the first adjusting module is used for adjusting model parameters of the text matching model according to the output first distance, the second distance and the loss function of the text matching model until the loss function of the text matching model meets a set condition, so as to obtain the text matching model trained in the diversity stage.
In a fourth aspect, there is provided an intention recognition apparatus including:
the acquisition module is used for acquiring a voice recognition statement corresponding to the voice to be recognized;
the text matching module is used for inputting the voice recognition sentences and standard problem sentences in a knowledge base into a pre-trained text matching model and outputting the matching degree between the voice recognition sentences and the standard problem sentences; the text matching model is obtained by training according to the training method of the text matching model in the first aspect;
And the intention recognition module is used for determining the standard problem statement matched with the voice to be recognized as an intention recognition result based on the matching degree between the voice recognition statement and each standard problem statement.
In a fifth aspect, there is provided an electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method according to the first aspect or the second aspect.
In a sixth aspect, there is provided a computer readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the method of the first aspect or the second aspect.
According to the training scheme of the text matching model provided by the embodiment of the application, aiming at the problem that model training can only be carried out on the similar-question data in the knowledge base and the training data are scarce, data expansion is carried out on each training sample in the training sample set, and a heterogeneous sample is added to each training sample in addition to the standard problem statement and its similar sample; in the iterative training of the diversity stage using the training sample set, the standard problem statement, the similar sample and the heterogeneous sample in each training sample are enhanced through linear-interpolation-based hybrid encoding and then input into the initial text matching model, and the model parameters of the text matching model are adjusted through comparison among the samples. By expanding the data of each training sample in the training sample set and, because the questions raised by customers are diverse, enhancing the data through linear-interpolation-based hybrid encoding, the diversity of the model input data can be improved on the basis of limited training data, so that the accuracy of the text matching model can be effectively improved through the iterative training of the diversity stage.
According to the intention recognition scheme provided by the embodiment of the application, the text matching model obtained through training with the above training method is used to perform text matching between the speech recognition sentence corresponding to the voice to be recognized and the standard problem statements in the knowledge base. Since the training method, aiming at the problem of scarce training data, expands the training sample set by adding heterogeneous samples of the standard problem statements besides the standard problem statements and their similar samples, and performs data enhancement through linear-interpolation-based hybrid encoding in the diversity stage, the diversity of the model input data is improved and the accuracy of the text matching model can be improved. The trained text matching model is therefore suitable for intention recognition scenarios in which the voices to be recognized are diverse, and on the basis of accurate text matching for various voices to be recognized, the real intention of the customer expressed by the voice to be recognized can be accurately recognized, thereby improving the accuracy of intention recognition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
Fig. 1 is a flow chart of a training method of a text matching model according to an embodiment of the present application;
fig. 2 is a schematic diagram of a network structure of a Triplet network according to an embodiment of the present application;
FIG. 3a is a schematic diagram of model iterative training for a baseline stage provided by one embodiment of the present application;
FIG. 3b is a schematic diagram of model iterative training at a diversity stage provided by an embodiment of the present application;
FIG. 3c is a schematic diagram of model iterative training for progressive stages provided by one embodiment of the present application;
FIG. 4 is a schematic flow diagram of a hybrid encoding process based on linear interpolation according to an embodiment of the present application;
FIG. 5 is a flow chart of an intent recognition method provided by one embodiment of the present application;
FIG. 6 is a process flow diagram of a voice robot conversation according to one embodiment of the present application;
FIG. 7 is a schematic structural diagram of a training device for text matching models according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an intent recognition device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the purposes, technical solutions and advantages of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art from the present disclosure without undue burden are intended to be within the scope of the present disclosure.
The terms "first", "second" and the like in the description and claims are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally means that the associated objects are in an "or" relationship.
As previously described, in the business scenario of a voice robot conversation, methods applicable to text matching include representation-based methods and interaction-based methods. In recent years, research on representation-based methods has focused mainly on two points: adopting different network structures to strengthen the encoder and obtain better feature vector representations, and modeling with different similarity calculation functions. Interaction-based methods typically focus on the way text pairs interact, enabling the model to extract more effective interaction information. However, representation-based methods cannot attend well to the key semantic information in the text and fail to capture the focus. Interaction-based methods, although able to fit the text matching characteristics of specific data or specific scenes, lack the support of a large pre-trained model for basic semantics, which causes the question matching success rate of the voice robot dialogue system to decline over time.
In the business scenario of a voice robot conversation, the text matching model employed by the voice robot faces several problems. First, training a text matching model is a typical small-sample learning task: the similar-question data in the knowledge base of each class of voice robot is usually scarce. Small-sample learning (few-shot learning) refers to scenarios with many categories but little training data per category, in which the model acquires learning and generalization ability from a small amount of training data. In the prior art, data enhancement is generally adopted to alleviate this situation, but the difference between the enhanced data and the original data, and the different effects brought by different data enhancement methods, are not considered; the original data and the enhanced data are used for training together at the same time, which affects the accuracy of text matching. Second, the data input to the voice robot at prediction time comes from a speech recognition result. Speech recognition (Automatic Speech Recognition, ASR) is a technology that converts human speech into text, and its results usually contain wrong words and mistyped characters, which reduce the accuracy of text matching and aggravate the difficulty of intention recognition; this is not considered in the prior art.
In view of this, the embodiments of the present application aim to provide a training method of a text matching model, and an intention recognition method and apparatus, so as to solve the problem of low text matching accuracy caused by limited training data, and further to solve the problems of inaccurate text matching and difficult intention recognition caused by inaccurate input speech recognition results.
It should be understood that, the training method and the intention recognition method of the text matching model provided in the embodiments of the present application may be executed by an electronic device or software installed in the electronic device, and may specifically be executed by a terminal device or a server device. The training method and the intention recognition method of the text matching model may be performed by the same electronic device, or may be performed by different electronic devices.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a training method of a text matching model according to an embodiment of the present application is provided, where the method may include:
s101, constructing a training sample set based on standard problem sentences in a knowledge base, wherein each training sample in the training sample set comprises: the standard question statement, a homogeneous sample of the standard question statement, and a heterogeneous sample of the standard question statement.
Original corpus data in text form is usually entered in advance by a person (customer service personnel) into the knowledge base, and includes a certain number of standard problem sentences and a certain number of similar sentences for each standard problem sentence. As mentioned above, a voice robot in the prior art can only perform model training on the similar-question data in the knowledge base, and such training data are scarce. To address this problem, the embodiment of the application proposes adding to the knowledge base a certain number of dissimilar sentences (or irrelevant sentences) for each standard problem sentence, and constructing the training sample set for model iterative training based on these three kinds of sentences. In this embodiment, each training sample consists of three sentences: a standard problem sentence, a similar sentence of the standard problem sentence, and a dissimilar sentence of the standard problem sentence. The standard problem sentence serves as the reference sample and may also be referred to as the anchor sentence, the similar sentence of the standard problem sentence is the similar sample (positive sample), and the dissimilar sentence of the standard problem sentence is the heterogeneous sample (negative sample). For convenience of distinction, in this embodiment a training sample composed of a standard problem sentence manually entered in the knowledge base, a similar sentence of the standard problem sentence, and a dissimilar sentence of the standard problem sentence is referred to as a first-class training sample; it can be understood that the constructed training sample set generally includes first-class training samples.
Data expansion is a common method for solving the small-sample training problem. Because the original corpus data in the knowledge base is very scarce, several data expansion methods are adopted in the embodiment of the application to increase the amount of effective data, and different data expansion methods have different characteristics. The embodiment of the application may adopt the following data expansion methods:
the first data expansion method is to add punctuation marks to the original corpus data for short punctuation expansion.
Punctuation marks are added to the original corpus data as follows: a set number of punctuation marks are added at random positions of the sentence to be processed to obtain the punctuation-expanded sentence. The set number can be chosen arbitrarily in the range from 1 to one third of the text length; if the text length of the sentence to be processed is less than or equal to 3, no punctuation marks are added. Question marks are typically not used in punctuation expansion because they carry some extra semantics. For example, assuming that the sentence to be processed is "I want to go to your company", the punctuation-expanded sentence may be "I want, to go to your company.". Punctuation expansion is data expansion for the similar sentences and dissimilar sentences of the standard problem sentences; data expansion is not usually carried out on the standard problem sentences, i.e. the anchor sentences, so the sentence to be processed is a similar sentence or a dissimilar sentence of a standard problem sentence.
Through punctuation expansion, the training sample set may include a second class training sample, where the second class training sample is formed by a standard question sentence (reference sample), a punctuation expansion similar sentence (similar sample) obtained by adding punctuation marks to a similar sentence of the standard question sentence, and a punctuation expansion dissimilar sentence (heterogeneous sample) obtained by adding punctuation marks to a dissimilar sentence of the standard question sentence.
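As an illustration of the punctuation expansion described above, the following is a minimal Python sketch; the punctuation set and the function name are assumptions rather than details taken from the patent:

    import random

    PUNCTUATION = ["，", "。", "！", "；"]  # assumed punctuation set; question marks are
                                            # excluded because they carry extra semantics

    def punctuation_expand(sentence):
        # Sentences of length <= 3 are left unchanged.
        if len(sentence) <= 3:
            return sentence
        chars = list(sentence)
        # The number of inserted marks is chosen between 1 and one third of the text length.
        for _ in range(random.randint(1, max(1, len(sentence) // 3))):
            pos = random.randint(1, len(chars))  # random insertion position
            chars.insert(pos, random.choice(PUNCTUATION))
        return "".join(chars)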
The second data expansion method performs simple data expansion based on EDA (Easy Data Augmentation for Text Classification Tasks) on the original corpus data, and is referred to as noise expansion for short.
Noise is added to the original corpus data to be processed in one of the following ways, or any combination of them, to obtain a noise-expanded sentence:
mode 1, synonym or homonym substitution (SR, synonyms reproduction)
Randomly extracting at least one non-stop word from the sentence to be processed, and carrying out corresponding replacement in the sentence to be processed by using synonyms or homonyms corresponding to the non-stop words. Specifically, irrespective of stop words (stop words), n words are randomly extracted from the sentence to be processed, and then synonyms or homophones are randomly extracted from the synonym dictionary or homophones to be replaced.
Mode 2: Random Insertion (RI)
At least one random insertion is performed in the sentence to be processed: a non-stop word is randomly extracted and one of its synonyms is inserted at a random position in the sentence. Specifically, ignoring stop words, a word is randomly extracted, a synonym is randomly selected from that word's synonym set and inserted at a random position in the sentence to be processed; this process can be repeated n times.
Mode 3: Random Swap (RS)
At least one random swap is performed in the sentence to be processed: two words are randomly selected and their positions are exchanged. This process can be repeated n times.
Mode 4: Random Deletion (RD)
Each word in the sentence to be processed is deleted at random with a set probability (for example, a probability p).
For example, assuming that the sentence to be processed is "I want to go to your company", the noise-expanded sentence (assuming mode 4, random deletion, is applied) may be "I want go your company". Like punctuation expansion, noise expansion is data expansion for the similar sentences and dissimilar sentences of the standard problem sentences; data expansion is not usually carried out on the standard problem sentences, i.e. the anchor sentences, so the sentence to be processed is a similar sentence or a dissimilar sentence of a standard problem sentence.
Through noise expansion, the training sample set may include a third class of training samples, where a third-class training sample is formed by a standard problem sentence (reference sample), a noise-expanded similar sentence (similar sample) obtained by applying EDA-based noise processing to a similar sentence of the standard problem sentence, and a noise-expanded dissimilar sentence (heterogeneous sample) obtained by applying EDA-based noise processing to a dissimilar sentence of the standard problem sentence. It can be understood that third-class training samples are obtained by applying EDA-based noise processing to the similar sentences and dissimilar sentences of the standard problem sentences in the first-class training samples.
Noise expansion is simple and effective. Since omitted words, synonyms and homophones are common in practice, mode 1 and mode 4 are the main modes used for noise expansion in the embodiment of the application. Noise expansion changes the text of the sentence to be processed considerably; while fully expanding the data it also introduces some erroneous data, forming difficult samples, which happens to simulate non-standard spoken expressions in the business scenario of robot voice dialogue as well as inaccuracies such as wrong words and mistyped characters in speech recognition results.
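The following is a minimal Python sketch of the four noise expansion modes above; the synonym lookup is a stand-in (a real implementation would query a synonym or homophone dictionary), and all names are assumptions rather than details from the patent:

    import random

    def synonyms(word):
        # Stand-in lookup; returning the word itself keeps the sketch runnable.
        return [word]

    def noise_expand(tokens, n=1, p=0.1, stop_words=frozenset()):
        tokens = list(tokens)
        mode = random.choice(["sr", "ri", "rs", "rd"])
        candidates = [i for i, t in enumerate(tokens) if t not in stop_words]
        if mode == "sr" and candidates:          # mode 1: synonym / homophone replacement
            for i in random.sample(candidates, min(n, len(candidates))):
                tokens[i] = random.choice(synonyms(tokens[i]))
        elif mode == "ri" and candidates:        # mode 2: random insertion
            for _ in range(n):
                word = tokens[random.choice(candidates)]
                tokens.insert(random.randint(0, len(tokens)), random.choice(synonyms(word)))
        elif mode == "rs" and len(tokens) >= 2:  # mode 3: random swap
            for _ in range(n):
                i, j = random.sample(range(len(tokens)), 2)
                tokens[i], tokens[j] = tokens[j], tokens[i]
        elif mode == "rd":                       # mode 4: random deletion with probability p
            kept = [t for t in tokens if random.random() > p]
            tokens = kept or tokens[:1]
        return tokens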
Other data expansion methods, such as back-translation, may also be used in implementations to increase the variety of training samples in the training sample set. Back-translation translates the data to be processed into text in a set language and then translates the translated text back into the original language; for example, a Chinese sentence to be processed can be translated into English text, and the English text then translated back into Chinese, to obtain the back-translated expanded sentence. For example, assuming that the sentence to be processed is "I want to go to your company", the back-translated expanded sentence may be "I want to go to a noble company".
At this point, the training sample set may include: first-class training samples (original corpus data), second-class training samples (punctuation-expanded data), third-class training samples (noise-expanded data), and so on. Each type of training sample includes: the standard question statement, a homogeneous sample of the standard question statement, and a heterogeneous sample of the standard question statement. It can be understood that the standard question sentence is a sentence manually entered as a standard question, the homogeneous sample indicates a sentence similar to the standard question sentence, and the heterogeneous sample indicates a sentence dissimilar to the standard question sentence.
S102, in the iterative training of the diversity stage performed on an initial text matching model using the training sample set, performing linear-interpolation-based hybrid encoding on each training sample and then inputting the result into the initial text matching model, and outputting a first distance between the standard problem statement and the similar sample and a second distance between the standard problem statement and the heterogeneous sample.
The questions posed by users are diverse. How to improve the diversity of the model input data as much as possible on the basis of limited training data, so as to improve the accuracy of the text matching model, is the technical problem to be solved in the diversity stage. In the embodiment of the application, data enhancement is performed in the diversity stage through linear-interpolation-based hybrid encoding, which improves the diversity of the model input data.
And S103, adjusting model parameters of the text matching model according to the output first distance, the second distance and a loss function of the text matching model until the loss function of the text matching model meets a set condition, and obtaining the text matching model trained in the diversity stage.
In an alternative implementation, the initial text matching model may adopt the structure of a Triplet Network, which is suitable for small-sample learning scenarios. The structure of the Triplet Network is shown in fig. 2 and includes three feedforward neural networks (Net) that have the same structure and share model parameters. A feedforward neural network typically consists of an input layer, one or more hidden layers and an output layer; data propagates forward through the network layer by layer until the output layer, and there is no feedback loop between layers.
Each training sample used for iterative training of the Triplet Network consists of three samples and may be referred to as a sample triplet, comprising a reference sample, a homogeneous sample and a heterogeneous sample; training is performed through comparison between the samples. At each training iteration, one training sample (sample triplet) is input: a reference sample, a homogeneous sample and a heterogeneous sample, and the Triplet Network outputs two values: the distance between the reference sample and the homogeneous sample, and the distance between the reference sample and the heterogeneous sample, computed on the feature vectors (embeddings) output by the Net layers; these distances characterize the degree of similarity between the samples. The two values output by the Triplet Network are respectively a first distance between the feature vector of the reference sample and the feature vector of the homogeneous sample, and a second distance between the feature vector of the reference sample and the feature vector of the heterogeneous sample. The first distance and the second distance may be L2 distances between feature vectors, i.e. Euclidean distances, or cosine distances, and the like, and are not particularly limited.
Assume that the reference sample is denoted as x, the heterogeneous sample as x⁻ and the homogeneous sample as x⁺. The essence of the Triplet Network is to encode the distances of x⁻ and x⁺ relative to x, which may be Euclidean distances, as shown in formula [1]:

TripletNet(x, x⁻, x⁺) = ( ||Net(x) − Net(x⁻)||_2 , ||Net(x) − Net(x⁺)||_2 )    [1]
Referring to fig. 2, the Triplet Network further includes a Comparator that processes the vector composed of the two distances. Each sample in a training sample (sample triplet) is passed through the parameter-shared feedforward neural network (Net) to obtain the feature vectors of the three samples. Through iterative training, the model makes the first distance d⁺ between the feature vectors of x⁺ and x as small as possible, makes the second distance d⁻ between the feature vectors of x⁻ and x as large as possible, and keeps a minimum interval threshold, for example set to 1, between the first distance d⁺ and the second distance d⁻.
In this embodiment of the application, the loss function of the text matching model is shown in formula [2]; the loss function characterizes the differences between the reference sample, i.e. the standard problem statement, and the homogeneous sample and the heterogeneous sample:

Loss(d⁺, d⁻) = ||(d⁺, d⁻ − 1)||² = const · (d⁺)²    [2]

where const represents a constant, and d⁺ and d⁻ normalize the original distances into the (0, 1) range, as shown in formulas [3] and [4] respectively:

d⁺ = exp(||Net(x) − Net(x⁺)||_2) / ( exp(||Net(x) − Net(x⁺)||_2) + exp(||Net(x) − Net(x⁻)||_2) )    [3]

d⁻ = exp(||Net(x) − Net(x⁻)||_2) / ( exp(||Net(x) − Net(x⁺)||_2) + exp(||Net(x) − Net(x⁻)||_2) )    [4]
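For illustration only, the following is a minimal PyTorch sketch of this loss; net stands for the shared-parameter feedforward network Net, and the function name and framework choice are assumptions rather than part of the patent:

    import torch
    import torch.nn.functional as F

    def triplet_softmax_loss(net, x, x_pos, x_neg):
        # Shared-parameter encoder applied to the anchor, homogeneous and heterogeneous inputs.
        v, v_pos, v_neg = net(x), net(x_pos), net(x_neg)
        dist_pos = torch.norm(v - v_pos, p=2, dim=-1)   # first distance, anchor vs. homogeneous
        dist_neg = torch.norm(v - v_neg, p=2, dim=-1)   # second distance, anchor vs. heterogeneous
        # Softmax over the two distances maps them into (0, 1), as in formulas [3] and [4].
        d = F.softmax(torch.stack([dist_pos, dist_neg], dim=-1), dim=-1)
        d_plus, d_minus = d[..., 0], d[..., 1]
        # Loss = ||(d+, d- - 1)||^2, which equals const * (d+)^2, as in formula [2].
        return (d_plus ** 2 + (d_minus - 1.0) ** 2).mean()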
It should be noted that the trained text matching model is generally used to obtain the respective feature vectors of two input sentences and to output the matching degree between the two sentences through similarity calculation, so the trained text matching model may include two trained feedforward neural networks.
In an alternative implementation, the text matching model may be trained by means of curriculum learning. Curriculum Learning is a training strategy that imitates the way humans learn, letting the model start from easy samples and gradually advance to complex samples and knowledge.
In this embodiment of the present application, at least one stage of iterative training may be performed on the initial text matching model using a training sample set, so as to obtain a trained text matching model, where the iterative training of each stage of the at least one stage is performed until a loss function of the text matching model meets a set condition. Under the condition of adopting a course learning mode, the iterative training of two adjacent stages is carried out successively, a text matching model trained in the previous stage is used as an initial text matching model in the next stage, and a text matching model trained in the last stage is used as a final training completion text matching model. The at least one stage includes the diversity stage.
Based on the structure of the initial text matching model, a process of performing at least one stage of iterative training on the initial text matching model using the training sample set is described in detail. The at least one stage includes at least a diversity stage; in an alternative implementation, the at least one stage may further comprise a baseline stage, the baseline stage preceding the diversity stage; in an alternative implementation, the at least one stage may further comprise a progressive stage, the progressive stage following the diversity stage. In specific implementation, the baseline stage, the diversity stage and the progressive stage form a training mode of course learning from easy to difficult.
The iterative training process of the three phases is described in sequence as follows.
Referring to fig. 3a, in the baseline stage, the initial text matching model may be iteratively trained using a first type of training sample, or a second type of training sample, or both the first type of training sample and the second type of training sample in the training sample set.
In the iterative training process of the baseline stage, the specific steps of each iterative training include:
and a1, respectively performing one-hot coding on the standard problem statement, the similar sample and the heterogeneous sample in the training sample to obtain respective corresponding one-hot coding matrixes, and respectively coding the respective corresponding one-hot coding matrixes based on a pre-trained language characterization model to obtain respective corresponding language characterization model coding matrixes.
It should be noted that, if the training sample is a first type training sample, the corresponding sentences that need to be subjected to one-hot coding and are coded based on the pre-trained language characterization model include the standard question sentences, similar sentences of the standard question sentences, and dissimilar sentences of the standard question sentences; if the training sample is a second class training sample, the corresponding sentences needing one-hot coding and coding based on the pre-trained language characterization model comprise the standard problem sentences, punctuation expansion similar sentences of the standard problem sentences and punctuation expansion dissimilar sentences of the standard problem sentences.
One-hot encoding is a relatively common method of text feature extraction: N states are encoded with an N-bit status register, each state has its own independent register bit, and only one bit is active at any time.
The pre-trained language characterization model may be any one of BERT, RoBERTa, ALBERT and the like. BERT (Bidirectional Encoder Representations from Transformers) is a bidirectional encoding representation based on the Transformer encoder. BERT consists of a pre-training part, which trains the language model, and a model fine-tuning part, which fine-tunes the pre-trained language model and is widely applied to tasks such as text classification and text matching. RoBERTa and ALBERT are improved versions of the BERT model.
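For illustration, one common way to obtain the BERT encoding of a sentence is sketched below, assuming the HuggingFace transformers library and the bert-base-chinese checkpoint, neither of which is specified by the patent; the standard API consumes token ids rather than an explicit one-hot matrix, which is the usual equivalent:

    import torch
    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    bert = BertModel.from_pretrained("bert-base-chinese")

    sentence = "我想去你们公司"  # "I want to go to your company"
    with torch.no_grad():
        inputs = tokenizer(sentence, return_tensors="pt")
        outputs = bert(**inputs)

    # last_hidden_state has shape (1, sequence length, 768): one 768-dimensional encoding
    # vector per token, corresponding to the (text length, BERT encoding vector dimension)
    # matrix described above.
    print(outputs.last_hidden_state.shape)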
And a2, respectively inputting the language representation model coding matrixes corresponding to the standard problem sentences, the similar samples and the heterogeneous samples into the corresponding feedforward neural network in the initial text matching model, and outputting a first distance between the standard problem sentences and the similar samples and a second distance between the standard problem sentences and the heterogeneous samples.
Specifically, the language characterization model coding matrix corresponding to the standard problem statement is input into the feedforward neural network corresponding to x in fig. 2, the language characterization model coding matrix corresponding to the similar sample is input into the feedforward neural network corresponding to x⁺ in fig. 2, and the language characterization model coding matrix corresponding to the heterogeneous sample is input into the feedforward neural network corresponding to x⁻ in fig. 2.
Specifically, the first distance between the standard question sentence and the similar sample may be a euclidean distance between a feature vector corresponding to the standard question sentence and a feature vector corresponding to the similar sample, and the second distance between the standard question sentence and the dissimilar sample may be a euclidean distance between a feature vector corresponding to the standard question sentence and a feature vector corresponding to the dissimilar sample.
And a3, adjusting model parameters of the text matching model according to the output first distance, the second distance and a loss function of the text matching model, wherein the loss function is used for representing differences between the standard problem statement and the similar samples and the heterogeneous samples.
Specifically, a Comparator (Comparator) calculates a loss value of a loss function according to the first distance and the second distance, and then aims at loss reduction of a text matching model, and model parameters of the text matching model are adjusted.
It should be noted that the above process is only one iterative training process of the baseline stage. In practical applications, the iterative training may need to be repeated many times before the loss function of the text matching model meets the set condition of this stage; the set condition may specifically be that the decrease of the loss function is smaller than a set threshold, so the above iterative training process may be performed multiple times.
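A minimal sketch of this stopping rule, reusing the triplet_softmax_loss sketch above; the function name, the epoch-level loss comparison and the threshold value are assumptions rather than details from the patent:

    def train_stage(net, optimizer, data_loader, loss_threshold=1e-3, max_epochs=100):
        previous_loss = float("inf")
        for _ in range(max_epochs):
            epoch_loss = 0.0
            for anchor, positive, negative in data_loader:
                loss = triplet_softmax_loss(net, anchor, positive, negative)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                epoch_loss += loss.item()
            # Stop this stage once the decrease of the loss falls below the set threshold.
            if previous_loss - epoch_loss < loss_threshold:
                break
            previous_loss = epoch_loss
        return net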
In the baseline stage of curriculum learning, the high-quality, highly distinguishable original corpus data annotated by customer service personnel can be used directly for training to obtain a baseline model; the punctuation-expanded data can also be used for training to obtain a baseline model, and punctuation expansion gives the text matching model good recognition capability for common basic questions; or the original corpus data and the punctuation-expanded data can be used together for training to obtain a baseline model, where adding the punctuation-expanded data to the original corpus data helps the model learn better feature vector representations of the input data.
Referring to fig. 3b, in the diversity stage, the initial text matching model may be iteratively trained using a first type of training sample, or a second type of training sample, or both the first type of training sample and the second type of training sample in the training sample set. In the diversity stage, under the condition that the first-type training sample and the second-type training sample are used for carrying out iterative training on the initial text matching model, the accuracy of the text matching model can be improved through expansion of training data due to the fact that punctuation expanded data are added into original corpus data.
In the iterative training process of the diversity stage, the specific steps of each iterative training include:
and b1, respectively performing linear interpolation-based hybrid coding processing on the standard problem statement, the similar samples and the heterogeneous samples in the training samples to obtain respective corresponding hybrid coding matrixes.
In an alternative implementation, step b1 may include the steps of:
and b1-1, respectively performing one-hot coding on the standard problem statement, the similar samples and the heterogeneous samples in the training samples to obtain respective corresponding one-hot coding matrixes.
It should be noted that, if the training sample is a first-class training sample, the sentences to be processed by the linear-interpolation-based hybrid encoding include the standard question sentence, a similar sentence of the standard question sentence and a dissimilar sentence of the standard question sentence; if the training sample is a second-class training sample, they include the standard question sentence, a punctuation-expanded similar sentence of the standard question sentence and a punctuation-expanded dissimilar sentence of the standard question sentence. The linear-interpolation-based hybrid encoding process is the same for every sentence: for any sentence, one-hot encoding is performed to form a one-hot coding matrix with dimensions shape = (text length, vocabulary size), which is used as the one-hot encoding result and denoted as T. The vocabulary covers the encoding range of all characters.
And b1-2, encoding the one-hot coding matrixes corresponding to the standard problem sentences, the similar samples and the heterogeneous samples, respectively, based on the pre-trained language characterization model, to obtain the corresponding language characterization model coding matrixes, and multiplying each of them by a preset vocabulary vector matrix and normalizing the result to obtain the corresponding sentence coding prediction result matrixes.
Specifically, the pre-trained language characterization model may be any one of BERT, RoBERTa, ALBERT and the like. For example, the one-hot coding matrix T is encoded with BERT, and the resulting BERT coding matrix is denoted as BERT(T); the dimensions of BERT(T) are shape = (text length, BERT encoding vector dimension), where the BERT encoding vector dimension is fixed, typically 768.
BERT(T) is multiplied by the pre-encoded vocabulary vector matrix W, whose dimensions are shape = (vocabulary size, BERT encoding vector dimension), and softmax normalization is applied to obtain the sentence coding prediction result matrix M, whose dimensions are shape = (text length, vocabulary size). The sentence coding prediction result matrix M is calculated as shown in formula [5]:
M = softmax(BERT(T) · Wᵀ)    [5]
and b1-3, performing linear interpolation processing on sentence coding prediction result matrixes and one-hot coding matrixes which are respectively corresponding to the standard problem sentences, the similar samples and the heterogeneous samples to obtain respective corresponding mixed coding matrixes.
Specifically, linear interpolation, i.e. mixup mixing, is performed on the sentence coding prediction result matrix M and the one-hot coding matrix T to obtain the expanded hybrid coding matrix X. A linear interpolation hyper-parameter λ may be set, which represents the proportion of the one-hot coding matrix in the linear interpolation; the proportions of the one-hot coding matrix and the sentence coding prediction result matrix sum to 1. The hybrid coding matrix X is calculated as shown in formula [6]:
X = λT + (1 − λ)M    [6]
During the model iterative training of the diversity stage, the value of the hyper-parameter λ may be set to a fixed value, for example λ = 0.2. To further improve diversity, the value of λ may instead be initialized to a minimum preset value (for example 0.05) and, whenever the decrease of the loss function is smaller than a set threshold during model iterative training with the current value of the hyper-parameter, increased by a set adjustment step (for example 0.05) until the maximum preset value of the hyper-parameter (for example 0.2) is reached.
For an example of the linear-interpolation-based hybrid encoding flow for a single sentence, refer to fig. 4. Assuming the current sentence is "I want to check the weather", one-hot encoding is first performed on the sentence to obtain the one-hot coding matrix T; the one-hot coding matrix T is encoded with BERT to obtain the BERT coding matrix BERT(T), which is multiplied by the preset vocabulary vector matrix W and softmax-normalized to obtain the sentence coding prediction result matrix M; and linear interpolation based on the hyper-parameter λ is performed on the sentence coding prediction result matrix M and the one-hot coding matrix T to obtain the expanded hybrid coding matrix X.
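The following is a minimal NumPy sketch of formulas [5] and [6] for one sentence; bert_encode and W are stand-ins for the pre-trained language characterization model and the pre-encoded vocabulary vector matrix, and the function and parameter names are assumptions:

    import numpy as np

    def hybrid_encode(token_ids, bert_encode, W, lam=0.2):
        vocab_size = W.shape[0]
        # One-hot coding matrix T, shape (text length, vocabulary size).
        T = np.eye(vocab_size)[token_ids]
        B = bert_encode(T)                      # BERT(T), shape (text length, 768)
        logits = B @ W.T                        # shape (text length, vocabulary size)
        # Softmax normalization over the vocabulary dimension gives the prediction matrix M.
        M = np.exp(logits - logits.max(axis=-1, keepdims=True))
        M /= M.sum(axis=-1, keepdims=True)
        # Mixup: X = lambda * T + (1 - lambda) * M, formula [6].
        return lam * T + (1 - lam) * M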
In this linear-interpolation-based hybrid encoding approach to data enhancement, the sentence coding prediction result matrix is obtained by multiplying the language characterization model coding matrix by the vocabulary vector matrix. Because the vocabulary covers the encoding range of all characters, other semantically similar information in the vocabulary is effectively exploited: the encoding information in the vocabulary can be fused into the language characterization model coding matrix, which effectively improves the diversity of the model input data. The sentence coding prediction result matrix is then mixed with the one-hot coding matrix through linear interpolation, taking the advantages of both, so that the added encoding information stays within a controllable range and the trainability of the model is guaranteed.
And b2, respectively inputting the mixed coding matrixes corresponding to the standard question sentences, the similar samples and the heterogeneous samples into corresponding feedforward neural networks in the initial text matching model, and outputting a first distance between the standard question sentences and the similar samples and a second distance between the standard question sentences and the heterogeneous samples.
Specifically, the hybrid coding matrix corresponding to the standard question sentence is input into the feedforward neural network corresponding to x in fig. 2, the hybrid coding matrix corresponding to the similar sample is input into the feedforward neural network corresponding to x⁺ in fig. 2, and the hybrid coding matrix corresponding to the heterogeneous sample is input into the feedforward neural network corresponding to x⁻ in fig. 2.
And b3, adjusting model parameters of the text matching model according to the output first distance, the second distance and a loss function of the text matching model, wherein the loss function is used for representing differences between the standard problem statement and the similar samples and the heterogeneous samples.
Specifically, a Comparator (Comparator) calculates a loss value of a loss function according to the first distance and the second distance, and then aims at loss reduction of a text matching model, and model parameters of the text matching model are adjusted.
It should be noted that the above process is only one iterative training process of the diversity stage. In practical applications, the iterative training may need to be repeated many times before the loss function of the text matching model meets the set condition of this stage; here the set condition may specifically be that the decrease of the loss function is smaller than the set threshold while the hyper-parameter has reached its maximum preset value (for example 0.2), so the above iterative training process may be performed multiple times.
In the diversity stage of curriculum learning, the high-quality, highly distinguishable original corpus data annotated by customer service personnel can be used directly for training, the punctuation-expanded data can be used for training, or both can be used together, and data enhancement is performed through linear-interpolation-based hybrid encoding, which improves the diversity of the model input data. Furthermore, the linear interpolation hyper-parameter λ can be adjusted flexibly: the larger its value, the greater the diversity of the model input data. During iterative training, λ is initialized to the minimum preset value, for example λ = 0.05; whenever the decrease of the loss function on the validation set is smaller than the set threshold, λ is raised by the set adjustment step, for example 0.05, until it reaches the maximum preset value, for example λ = 0.2. Once the hyper-parameter has reached the maximum preset value, if the decrease of the loss function is again smaller than the set threshold, the iterative training of the diversity stage is completed. By continually increasing the diversity of the model input data, the accuracy of the text matching model is improved.
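A minimal sketch of this λ schedule; the function name and threshold are assumptions, while the 0.05 step and 0.2 cap follow the example values above:

    def next_lambda(current_lam, loss_drop, threshold=1e-3, step=0.05, max_lam=0.2):
        # Raise lambda by one step whenever the loss stops decreasing noticeably,
        # until the maximum preset value is reached.
        if loss_drop < threshold and current_lam < max_lam:
            return min(round(current_lam + step, 2), max_lam)
        return current_lam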
Referring to fig. 3c, in the progressive stage, model iterative training may be performed on the initial text matching model using the first-class training samples and/or the second-class training samples in the training sample set together with third-class training samples selected based on a progressive factor τ, where τ represents the proportion of the selected third-class training samples to the total number of third-class training samples (for example τ = 0.1). If the decrease of the loss function is smaller than the set threshold during model iterative training with the current sample amount, the amount of third-class training samples corresponding to the progressive factor τ is added on top of the current amount, until all third-class training samples have been added.
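A minimal sketch of selecting third-class samples with the progressive factor τ; the names and the simple slicing policy are assumptions rather than details from the patent:

    def progressive_subset(third_class_samples, tau=0.1, rounds_unlocked=1):
        # Each time the loss plateaus, one more tau-sized slice of the noise-expanded
        # (third-class) samples is unlocked, until every sample is included.
        slice_size = max(1, int(len(third_class_samples) * tau))
        end = min(len(third_class_samples), slice_size * rounds_unlocked)
        return third_class_samples[:end]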
In the iterative training process of the progressive stage, the specific steps of each iterative training include:
And c1, respectively performing one-hot coding on the standard problem statement, the similar sample and the heterogeneous sample in the training sample to obtain their respective one-hot coding matrixes, and then coding the respective one-hot coding matrixes based on a pre-trained language characterization model to obtain their respective language characterization model coding matrixes.
It should be noted that, if the training sample is a first-type training sample, the sentences that need to be one-hot coded and then coded based on the pre-trained language characterization model include the standard question sentence, the similar sentence of the standard question sentence, and the dissimilar sentence of the standard question sentence; if the training sample is a second-type training sample, the sentences that need to be one-hot coded and then coded based on the pre-trained language characterization model include the standard question sentence, the punctuation-expanded similar sentence of the standard question sentence, and the punctuation-expanded dissimilar sentence of the standard question sentence; if the training sample is a third-type training sample, the sentences that need to be one-hot coded and then coded based on the pre-trained language characterization model include the standard question sentence, the noise-expanded similar sentence of the standard question sentence, and the noise-expanded dissimilar sentence of the standard question sentence. The pre-trained language characterization model may be any one of BERT, RoBERTa, ALBERT, and the like.
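As one possible realization of step c1 (a sketch, not the disclosed implementation), the snippet below obtains the one-hot coding matrix and the language characterization model coding matrix for a single sentence using the Hugging Face transformers library; the specific checkpoint name is an assumption.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")   # assumed BERT checkpoint
encoder = AutoModel.from_pretrained("bert-base-chinese")

def encode_sentence(sentence):
    tokens = tokenizer(sentence, return_tensors="pt")
    # one-hot coding matrix: (sequence_length, vocabulary_size)
    one_hot = F.one_hot(tokens["input_ids"][0], num_classes=tokenizer.vocab_size).float()
    # language characterization model coding matrix: (sequence_length, hidden_size)
    lm_matrix = encoder(**tokens).last_hidden_state[0]
    return one_hot, lm_matrix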
And c2, respectively inputting the language characterization model coding matrixes corresponding to the standard problem sentences, the similar samples and the heterogeneous samples into the corresponding feedforward neural network in the initial text matching model, and outputting a first distance between the standard problem sentences and the similar samples and a second distance between the standard problem sentences and the heterogeneous samples.
Specifically, the language characterization model coding matrix corresponding to the standard problem statement is input into the feedforward neural network corresponding to x in fig. 2, the language characterization model coding matrix corresponding to the similar sample is input into the feedforward neural network corresponding to x⁺ in fig. 2, and the language characterization model coding matrix corresponding to the heterogeneous sample is input into the feedforward neural network corresponding to x⁻ in fig. 2.
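A simplified PyTorch sketch of step c2 is given below. The mean pooling of the token-level matrix into a sentence vector and the Euclidean distance are assumptions made for illustration, and the random tensors stand in for the coding matrices produced in step c1.

import torch
import torch.nn as nn

class SharedFeedForward(nn.Module):
    """One feed-forward network whose parameters are shared by the x, x⁺ and x⁻ branches."""
    def __init__(self, hidden_size=768, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden_size, 512), nn.ReLU(), nn.Linear(512, out_dim))

    def forward(self, lm_matrix):                 # lm_matrix: (seq_len, hidden_size)
        return self.net(lm_matrix.mean(dim=0))    # mean pooling into a sentence vector (assumption)

ffn = SharedFeedForward()
anchor_lm, positive_lm, negative_lm = torch.randn(12, 768), torch.randn(15, 768), torch.randn(9, 768)

emb_x, emb_xp, emb_xn = ffn(anchor_lm), ffn(positive_lm), ffn(negative_lm)
first_distance = torch.norm(emb_x - emb_xp, p=2)    # standard question vs. similar sample
second_distance = torch.norm(emb_x - emb_xn, p=2)   # standard question vs. heterogeneous sample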
And c3, adjusting model parameters of the text matching model according to the output first distance, the second distance and a loss function of the text matching model, wherein the loss function is used for representing differences between the standard problem statement and the similar samples and the heterogeneous samples.
Specifically, a Comparator calculates the loss value of the loss function from the first distance and the second distance, and the model parameters of the text matching model are then adjusted with the goal of reducing the loss of the text matching model.
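The application does not spell out the loss formula; a margin-based triplet loss is one common choice that matches the stated goal of keeping the similar sample closer than the heterogeneous one, and the sketch below, with an assumed margin value, is offered only as an illustration.

import torch

def comparator_loss(first_distance, second_distance, margin=1.0):
    # Loss is zero once the similar sample is closer than the heterogeneous sample by at least `margin`.
    return torch.clamp(first_distance - second_distance + margin, min=0.0)

loss = comparator_loss(torch.tensor(0.3), torch.tensor(1.2))   # toy distances for illustration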
It should be noted that the above process describes only a single iteration of the progressive stage. In practice, multiple iterations are usually required before the loss function of the text matching model meets the set condition for this stage; the set condition is met when all third-type training samples have been added and the decrease of the loss function is smaller than the set threshold value, so the iterative training process may be performed multiple times.
In the progressive stage of curriculum learning, the original corpus data, the punctuation-expanded data, and the noise-expanded data can all be used for training. A progressive learning strategy is adopted by introducing the progressive factor τ, which starts at 0.1: whenever the decrease of the loss function on the validation set falls below the set threshold value, a further 0.1 of the total number of third-type training samples is added to the data pool for training, until all third-type training samples have been added. Once all third-type training samples have been added, if the decrease of the loss function falls below the set threshold value, the iterative training of the progressive stage is finally completed, so that the model progresses gradually from easy samples to difficult ones. Because the difficult samples can simulate non-standard expressions such as colloquial speech in the voice robot dialogue service scenario, as well as inaccuracies such as wrong or extra words in the speech recognition results, this improves the accuracy of the text matching model and reduces the adverse effect of such conditions on intention recognition.
In the following, a unified explanation is given of how the model parameters of the text matching model are adjusted in each iterative training round of each stage. The model parameters of the text matching model are the parameters used to characterize the structure of the text matching model. In particular, the model parameters of the text matching model may include the model parameters shared by the three feedforward neural networks in the text matching model, which may include the network parameters of each network layer in the feedforward neural network. For each network layer, the network parameters may include, for example but not limited to, the number of neurons contained in the layer, the connection relationships and connection weights between its neurons and the neurons of other network layers, and the like.
In an alternative implementation, the model parameters of the text matching model may be adjusted using a back-propagation method. Specifically, a loss value of the text matching model is determined based on the first distance, the second distance, and the loss function of the text matching model, and then, with the goal of reducing the loss of the text matching model, the network parameters of each network layer in the text matching model are adjusted layer by layer through back propagation, starting from the last network layer of the text matching model.
For example, starting from the last network layer of the text matching model, partial derivatives of the loss value of the text matching model are taken according to the structure of each network layer in the text matching model, the connection relationships and connection weights between different network layers, and so on, to obtain the loss value of each network layer, where the loss value of each network layer represents the matching deviation introduced by that layer; then, with the goal of reducing the loss of the text matching model, the network parameters of each network layer are adjusted in turn based on the loss values of the respective network layers.
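In a framework with automatic differentiation, the layer-by-layer adjustment described above reduces to a standard optimization step; the sketch below assumes PyTorch and an externally constructed optimizer, and is illustrative rather than the disclosed implementation.

import torch

def adjust_parameters(model, loss, optimizer):
    optimizer.zero_grad()
    loss.backward()      # propagate the loss backwards from the last network layer, layer by layer
    optimizer.step()     # update connection weights with the goal of reducing the matching loss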
The training method of the text matching model described above addresses the problem that model training could previously be carried out only on the similar-question data in the knowledge base, so that training data were scarce: data expansion is performed on each training sample in the training sample set, so that besides the standard problem statement and its similar sample, each training sample also contains a heterogeneous sample. In the iterative training of the diversity stage using the training sample set, the standard problem statements, similar samples, and heterogeneous samples in the training samples are first enhanced by the mixed coding processing based on linear interpolation and then input into the initial text matching model, and the model parameters of the text matching model are adjusted through comparison among the samples. By expanding every training sample in the training sample set and, because the questions raised by customers are diverse, enhancing the data with the mixed coding processing based on linear interpolation, the diversity of the model input data can be increased on the basis of limited training data, so the iterative training of the diversity stage can effectively improve the accuracy of the text matching model.
Further, because each training sample comprises a standard problem statement, a similar sample, and a heterogeneous sample, the initial text matching model adopts a triplet network structure comprising three feedforward neural networks that have the same structure and share model parameters, and such a triplet network has the advantage of fast inference. In each iterative training round, the standard problem statement, the similar sample, and the heterogeneous sample are respectively input into the corresponding feedforward neural networks, and the model parameters of the text matching model are adjusted through comparison among the samples.
Furthermore, for scenarios with little training data, several data expansion methods are adopted, including punctuation expansion, noise expansion, back-translation expansion, and the like. Because the expanded data obtained by different expansion methods have different characteristics, an easy-before-difficult curriculum learning training scheme is designed. This scheme lets the model learn the principal aspects of the problem first and the secondary ones later, which has advantages over the traditional all-at-once training scheme, makes the trained text matching model closer to the actual scenario, and is particularly suitable for intention recognition in a dialogue question-answering system.
Corresponding to the training method of the text matching model, the embodiment of the application also provides an intention recognition method, which, based on the text matching model trained by the method shown in fig. 1, can accurately match the input voice recognition result with the standard problem statement that reflects the customer's intention, thereby completing the intention recognition task and improving the accuracy of intention recognition.
Referring to fig. 5, a flowchart of an intent recognition method according to an embodiment of the present application is provided, and the method may include the following steps:
s501, acquiring a voice recognition statement corresponding to a voice to be recognized.
The voice to be recognized generally refers to a question posed by a customer in a dialogue question-answering system. A corresponding voice recognition sentence in text format is usually obtained through speech recognition (ASR), and this voice recognition sentence enters the question recognition flow after being preprocessed; the voice recognition sentence corresponding to the voice to be recognized obtained in S501 therefore generally refers to the preprocessed voice recognition sentence.
S502, inputting the voice recognition sentence and the standard problem statements in a knowledge base into a pre-trained text matching model, and outputting the matching degree between the voice recognition sentence and each standard problem statement; the text matching model is obtained through training according to the training method of the text matching model shown in fig. 1. It is understood that the matching degree characterizes the degree of similarity between the voice recognition sentence and the standard question statement.
S503, determining, based on the matching degree between the voice recognition sentence and each standard question sentence, the standard question sentence matched with the voice to be recognized as the intention recognition result.
In an alternative implementation, the standard question sentence with the highest matching degree to the voice recognition sentence is taken as the intention recognition result, and on the basis of the customer's intention having been accurately recognized, subsequent processes such as generating a response are carried out according to the intention recognition result.
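A minimal sketch of S501-S503 follows; matching_degree stands for the trained text matching model's scoring function and is a hypothetical name rather than an interface disclosed by this application.

def recognize_intent(voice_recognition_sentence, standard_questions, matching_degree):
    best_question, best_score = None, float("-inf")
    for question in standard_questions:                      # S502: score every standard question statement
        score = matching_degree(voice_recognition_sentence, question)
        if score > best_score:
            best_score, best_question = score, question
    return best_question                                     # S503: the best match is the intention recognition result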
According to the intention recognition method provided by the embodiment of the application, the text matching model obtained by the above training method is used to match the voice recognition sentence corresponding to the voice to be recognized against the standard problem statements in the knowledge base. Because the training method addresses the scarcity of training data by expanding the training sample set, adding heterogeneous samples of the standard problem statements in addition to the standard problem statements and their similar samples, and by performing data enhancement with the mixed coding processing based on linear interpolation in the diversity stage, the diversity of the model input data is increased and the accuracy of the text matching model can be improved. The trained text matching model is therefore suitable for intention recognition scenarios in which the voice to be recognized is diverse, and on the basis of the text matching model performing accurate text matching for a wide variety of voices to be recognized, the real customer intention expressed by the voice to be recognized can be accurately recognized, thereby improving the accuracy of intention recognition.
Taking the service scenario of a voice robot dialogue as an example, the training method of the text matching model and the intention recognition method provided by the embodiment of the application are described below. The processing flow of a voice robot dialogue is shown in fig. 6: the question posed by the customer to the voice robot is the voice to be recognized; the voice to be recognized is converted by speech recognition (ASR) into a corresponding voice recognition result, which enters the question recognition flow after being preprocessed, where the preprocessing may include sensitive-word processing, stop-word processing, traditional-to-simplified Chinese character replacement, and the like; it can be understood that the voice recognition result is a voice recognition sentence in text format. In the question recognition flow, if the question is not empty, processing such as entity recognition and intention recognition is performed; the response flow is then entered, the reply of the voice robot is generated based on the intention recognition result, the subsequent flow continues, and the voice robot dialogue flow is finally completed. The training method of the text matching model and the intention recognition method provided by the embodiment of the application can be applied to the intention recognition step of the question recognition flow.
In the knowledge base of the voice robot, standard problem statements in text format and similar sentences of the standard problem statements are usually entered manually in advance. To address the scarcity of training data in the knowledge base of the voice robot, dissimilar sentences of the standard problem statements are additionally entered into the knowledge base, yielding first-type training samples each consisting of a standard problem statement, a similar sentence of the standard problem statement, and a dissimilar sentence of the standard problem statement; second-type and third-type training samples are then obtained from the first-type training samples through punctuation expansion, noise expansion, and the like. The initial text matching model is iteratively trained in at least one stage using the expanded training sample set, where the at least one stage includes a diversity stage in which data enhancement is performed by the mixed coding processing based on linear interpolation, to obtain a trained text matching model for the voice robot dialogue. Further, in the processing flow of the voice robot dialogue, the trained text matching model matches the preprocessed voice recognition sentence (in text format) against the standard problem statements (in text format) in the knowledge base of the voice robot, the standard problem statement matching the voice to be recognized (the question posed by the customer to the voice robot) is determined as the intention recognition result corresponding to that question, and the reply of the voice robot is generated based on the intention recognition result.
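A toy sketch of the preprocessing mentioned above is given below; the sensitive-word list, stop-word list, and traditional-to-simplified mapping are placeholders, not data disclosed by this application.

SENSITIVE_WORDS = {"敏感词"}               # placeholder sensitive-word list
STOP_WORDS = {"的", "了", "吗"}             # placeholder stop-word list
T2S_MAP = {"還": "还", "問": "问"}           # placeholder traditional-to-simplified mapping

def preprocess(asr_text):
    text = "".join(T2S_MAP.get(ch, ch) for ch in asr_text)    # traditional Chinese character replacement
    for word in SENSITIVE_WORDS:
        text = text.replace(word, "*" * len(word))            # sensitive word processing
    for word in STOP_WORDS:
        text = text.replace(word, "")                         # stop word processing
    return text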
It can be seen that, in the service scenario of a voice robot dialogue, the problem of scarce training data for the voice robot is effectively alleviated: data expansion improves the accuracy of the text matching model, the trained text matching model suits intention recognition scenarios in which the voice to be recognized is diverse, and on the basis of the text matching model performing accurate text matching for a wide variety of voices to be recognized, the real customer intention expressed by the voice to be recognized can be accurately recognized, thereby improving the accuracy of intention recognition. The service scenario of the voice robot dialogue is only an exemplary application scenario of the methods provided by the embodiment of the present application; the training method of the text matching model and the intention recognition method can also be applied to other dialogue question-answering service scenarios.
In addition, corresponding to the training method of the text matching model shown in fig. 1, the embodiment of the application also provides a training device of the text matching model. Fig. 7 is a schematic structural diagram of a training device 700 for text matching models according to an embodiment of the present application, including:
a construction module 701, configured to construct a training sample set based on standard problem sentences in a knowledge base, where each training sample in the training sample set includes: the standard problem statement is a statement which is manually input as a standard problem, the similar sample is used for indicating a statement similar to the standard problem statement, and the heterogeneous sample is used for indicating a statement dissimilar to the standard problem statement;
the diversity training module 702 is configured to, in an iterative training process of performing a diversity stage on an initial text matching model by using the training sample set, perform a hybrid encoding process based on linear interpolation on each training sample, input the initial text matching model, and output a first distance between the standard question sentence and the similar sample and a second distance between the standard question sentence and the heterogeneous sample;
And the first adjustment module 703 is configured to adjust model parameters of the text matching model according to the output first distance, the second distance, and a loss function of the text matching model until the loss function of the text matching model meets a set condition, thereby obtaining the text matching model after the diversity stage training.
In an alternative implementation, the initial text matching model includes three feedforward neural networks that are identical in structure and share model parameters; during the iterative training of the diversity phase:
the diversity training module 702 is specifically configured to perform a hybrid encoding process based on linear interpolation on the standard problem statement, the similar sample, and the heterogeneous sample in the training sample during each iteration training, so as to obtain respective corresponding hybrid encoding matrices; respectively inputting the mixed coding matrixes corresponding to the standard problem statement, the similar sample and the heterogeneous sample into a corresponding feedforward neural network in the initial text matching model, and outputting a first distance between the standard problem statement and the similar sample and a second distance between the standard problem statement and the heterogeneous sample;
The first adjustment module 703 is specifically configured to adjust model parameters of the text matching model according to the first distance, the second distance, and a loss function of the text matching model output by the diversity training module 702, where the loss function is used to characterize differences between the standard question statement and the similar samples and the heterogeneous samples.
In an alternative implementation, the diversity training module 702 uses the first type of training samples and/or the second type of training samples in the training sample set to perform iterative training of the diversity stage on the initial text matching model.
In an optional implementation manner, the diversity training module 702 is specifically configured to perform one-hot encoding on the standard problem statement, the similar sample, and the heterogeneous sample in the training sample during each iteration training, so as to obtain respective corresponding one-hot encoding matrices; encoding the standard problem statement, the similar sample and the one-hot encoding matrix corresponding to the heterogeneous sample respectively based on a pre-trained language characterization model to obtain respective corresponding language characterization model encoding matrix, multiplying the respective corresponding language characterization model encoding matrix with a preset vocabulary vector matrix respectively, and carrying out normalization processing to obtain respective corresponding sentence coding prediction result matrix; and carrying out linear interpolation processing on sentence coding prediction result matrixes and one-hot coding matrixes which respectively correspond to the standard problem sentences, the similar samples and the heterogeneous samples to obtain respective corresponding mixed coding matrixes.
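The computation performed by this module can be sketched as follows (a simplified illustration, not the disclosed implementation). The preset vocabulary vector matrix is modeled here as a projection back onto the vocabulary, softmax is assumed as the normalization, and, following the convention stated in the claims, the hyperparameter λ weights the one-hot coding matrix.

import torch

def mixed_encoding(one_hot, lm_matrix, vocab_matrix, lam=0.05):
    # one_hot:      (seq_len, vocab_size)     one-hot coding matrix
    # lm_matrix:    (seq_len, hidden_size)    language characterization model coding matrix
    # vocab_matrix: (hidden_size, vocab_size) preset vocabulary vector matrix
    logits = lm_matrix @ vocab_matrix                    # project back onto the vocabulary
    prediction = torch.softmax(logits, dim=-1)           # sentence coding prediction result matrix
    return lam * one_hot + (1.0 - lam) * prediction      # linear interpolation -> mixed coding matrix

# toy shapes just to exercise the function
one_hot = torch.nn.functional.one_hot(torch.randint(0, 100, (12,)), num_classes=100).float()
mix = mixed_encoding(one_hot, torch.randn(12, 768), torch.randn(768, 100))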
In an alternative implementation, the training device 700 further includes:
a training control module 704, configured to perform at least one stage of model iterative training on the initial text matching model by using the training sample set, and use the text matching model trained in the last stage as a final training completed text matching model; the model of each stage in the at least one stage is trained iteratively until the loss function of the text matching model meets a set condition; the at least one stage includes a diversity stage.
In an alternative implementation, the at least one stage further includes a baseline stage, the baseline stage preceding the diversity stage. The iterative training process of the baseline stage is implemented by a baseline training module 705 and a second adjustment module 706, with the baseline training module 705 performing iterative training of the baseline stage on the initial text matching model using the first-type training samples and/or the second-type training samples in the training sample set:
the baseline training module 705 is configured to perform one-hot encoding on the standard problem statement, the similar sample, and the heterogeneous sample in the training sample during each iteration training, to obtain respective corresponding one-hot encoding matrices, and then encode the respective corresponding one-hot encoding matrices based on the pre-trained language characterization model, to obtain respective corresponding language characterization model encoding matrices; respectively inputting the language characterization model coding matrixes corresponding to the standard problem statement, the similar sample and the heterogeneous sample into a corresponding feedforward neural network in the initial text matching model, and outputting a first distance between the standard problem statement and the similar sample and a second distance between the standard problem statement and the heterogeneous sample;
The second adjustment module 706 is configured to adjust model parameters of the text matching model according to the first distance, the second distance, and a loss function of the text matching model output by the baseline training module 705.
In an alternative implementation, the at least one stage further includes a progressive stage, the progressive stage following the diversity stage. The iterative training process of the progressive stage is implemented by a progressive training module 707 and a third adjustment module 708, with the progressive training module 707 performing iterative training of the progressive stage on the initial text matching model using the first-type training samples and/or the second-type training samples in the training sample set, together with third-type training samples selected based on a progressive factor:
the progressive training module 707 is configured to perform one-hot encoding on the standard problem statement, the similar sample, and the heterogeneous sample in the training sample during each iterative training, to obtain respective corresponding one-hot encoding matrices, and then encode the respective corresponding one-hot encoding matrices based on the pre-trained language characterization model, to obtain respective corresponding language characterization model encoding matrices; respectively inputting the language characterization model coding matrixes corresponding to the standard problem sentences, the similar samples and the heterogeneous samples into a specified feedforward neural network in the initial text matching model, and outputting a first distance between the standard problem sentences and the similar samples and a second distance between the standard problem sentences and the heterogeneous samples;
The third adjustment module 708 is configured to adjust model parameters of the text matching model according to the first distance, the second distance, and a loss function of the text matching model output by the progressive training module 707.
Obviously, the training device for the text matching model in the embodiment of the present application may be used as an execution subject of the training method for the text matching model shown in fig. 1, so that the function of the training method for the text matching model implemented in fig. 1 can be implemented. Since the principle is the same, the description is not repeated here.
In addition, corresponding to the intention recognition method shown in fig. 5, the embodiment of the application also provides an intention recognition device. Fig. 8 is a schematic structural diagram of an intent recognition device 800 according to an embodiment of the present application, including:
an obtaining module 801, configured to obtain a speech recognition sentence corresponding to a speech to be recognized;
a text matching module 802, configured to input the speech recognition sentence and a problem sentence preset in a knowledge base into a pre-trained text matching model, and output a matching degree between the speech recognition sentence and the problem sentence; the text matching model is obtained by training according to the training method of the text matching model shown in the figure 1;
An intent recognition module 803, configured to determine, as an intent recognition result, a standard question sentence that matches the speech to be recognized based on the degree of matching between the speech recognition sentence and each question sentence.
Obviously, the intention recognition device according to the embodiment of the present application may be used as an execution subject of the intention recognition method shown in fig. 5, so that the function of the intention recognition method implemented in fig. 5 can be implemented. Since the principle is the same, the description is not repeated here.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 9, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, network interface, and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bi-directional arrow is shown in fig. 9, but this does not mean that there is only one bus or one type of bus.
And the memory is used for storing programs. In particular, the program may include program code including computer-operating instructions. The memory may include memory and non-volatile storage and provide instructions and data to the processor.
The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form a training device or an intention recognition device of the text matching model on a logic level. The processor executes the program stored in the memory, and is specifically configured to execute each process of implementing the training method or the embodiment of the intention recognition method for the text matching model, and achieve the same technical effect, so that repetition is avoided, and no further description is given here.
The method performed by the training device of the text matching model as disclosed in the embodiment of the application or the method performed by the intention recognition device as disclosed in the embodiment of the application may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in hardware, in a decoded processor, or in a combination of hardware and software modules in a decoded processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
Of course, other implementations, such as a logic device or a combination of hardware and software, are not excluded from the electronic device of the present application, that is, the execution subject of the following processing flow is not limited to each logic unit, but may be hardware or a logic device.
The embodiment of the application further provides a computer readable storage medium, and the computer readable storage medium stores one or more programs, where the one or more programs include instructions, and the program or the instructions implement the training method of the text matching model or the respective processes of the embodiment of the intention recognition method when executed by the processor, and achieve the same technical effects, so that repetition is avoided, and no further description is provided herein.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium may be, for example, a computer read-only memory (ROM), a random-access memory (RAM), a magnetic disk, an optical disk, or the like.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, the processor is used for running a program or instructions, implementing the training method of the text matching model or each process of the embodiment of the intention recognition method, and achieving the same technical effect, so that repetition is avoided, and no redundant description is provided here.
It should be understood that the chip referred to in the embodiments of the present application may also be called a system-level chip, a chip system, a system-on-chip, or the like.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In summary, the foregoing description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
All embodiments in the application are described in a progressive manner, and identical and similar parts of all embodiments are mutually referred, so that each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.

Claims (14)

1. A method for training a text matching model, comprising:
constructing a training sample set based on standard problem sentences in a knowledge base, wherein each training sample in the training sample set comprises: the standard question statement, a homogeneous sample of the standard question statement, and a heterogeneous sample of the standard question statement; the standard problem statement is a statement which is manually input and serves as a standard problem, the similar sample is used for indicating statements similar to the standard problem statement, and the heterogeneous sample is used for indicating statements dissimilar to the standard problem statement;
In the iterative training process of using the training sample set to carry out diversity stage on an initial text matching model, carrying out mixed coding processing based on linear interpolation on each training sample, inputting the initial text matching model, and outputting to obtain a first distance between the standard problem statement and the similar sample and a second distance between the standard problem statement and the heterogeneous sample;
and adjusting model parameters of the text matching model according to the output first distance, the second distance and a loss function of the text matching model until the loss function of the text matching model meets a set condition, so as to obtain the text matching model trained in the diversity stage.
2. The method of claim 1, wherein the initial text matching model comprises three feed-forward neural networks that are identical in structure and share model parameters;
in the iterative training process of the diversity stage, the specific steps of each iterative training include:
performing mixed coding processing based on linear interpolation on the standard problem statement, the similar sample and the heterogeneous sample in the training sample to obtain respective corresponding mixed coding matrixes;
Respectively inputting the mixed coding matrixes corresponding to the standard problem statement, the similar sample and the heterogeneous sample into a corresponding feedforward neural network in the initial text matching model, and outputting a first distance between the standard problem statement and the similar sample and a second distance between the standard problem statement and the heterogeneous sample;
and adjusting model parameters of the text matching model according to the output first distance, the second distance and a loss function of the text matching model, wherein the loss function is used for representing differences between the standard problem statement and the similar samples and the heterogeneous samples.
3. The method according to claim 2, wherein the training sample set comprises a first type of training sample and/or a second type of training sample;
the iterative training of the initial text matching model in the diversity stage by using the training sample set comprises the following steps:
performing iterative training of a diversity stage on the initial text matching model by using a first class training sample and/or a second class training sample in the training sample set;
the similar samples of the standard problem sentences in the first class training samples are similar sentences of the standard problem sentences, the heterogeneous samples are dissimilar sentences of the standard problem sentences, and the similar sentences and the dissimilar sentences of the standard problem sentences are recorded in the knowledge base in advance;
The similar samples of the standard problem statement in the second class training samples are punctuation expansion similar sentences obtained by adding punctuation marks to similar sentences of the standard problem statement, and the heterogeneous samples are punctuation expansion dissimilar sentences obtained by adding punctuation marks to dissimilar sentences of the standard problem statement.
4. A method according to claim 2 or 3, wherein the performing a hybrid coding process based on linear interpolation on the standard problem statement, the similar sample, and the heterogeneous sample in the training sample to obtain respective corresponding hybrid coding matrices specifically includes:
performing single-hot one-hot coding on the standard problem statement, the similar sample and the heterogeneous sample in the training sample to obtain respective corresponding one-hot coding matrixes;
encoding the standard problem statement, the similar sample and the one-hot encoding matrix corresponding to the heterogeneous sample respectively based on a pre-trained language characterization model to obtain respective corresponding language characterization model encoding matrix, multiplying the respective corresponding language characterization model encoding matrix with a preset vocabulary vector matrix respectively, and carrying out normalization processing to obtain respective corresponding sentence coding prediction result matrix;
And carrying out linear interpolation processing on sentence coding prediction result matrixes and one-hot coding matrixes which respectively correspond to the standard problem sentences, the similar samples and the heterogeneous samples to obtain respective corresponding mixed coding matrixes.
5. The method according to claim 4, wherein linear interpolation processing is performed on the sentence coding prediction result matrix and a one-hot coding matrix based on a super parameter of linear interpolation, the super parameter being used for representing a proportion of the one-hot coding matrix in the linear interpolation processing, and a sum of proportions of the one-hot coding matrix and the sentence coding prediction result matrix in the linear interpolation processing is 1;
in the iterative training process of the diversity stage, initializing and setting the parameter value of the super parameter to be a minimum preset parameter value, and under the condition of performing iterative training based on the current parameter value of the super parameter, if the loss reduction degree of the loss function is smaller than a set threshold value, lifting the current parameter value of the super parameter according to a set adjustment step length until the maximum preset parameter value of the super parameter is reached.
6. A method according to claim 3, wherein the training sample set is used to perform at least one stage of iterative training on the initial text matching model, and the text matching model trained in the last stage is used as the final training completed text matching model; iterative training of each stage of the at least one stage until a loss function of the text matching model meets a set condition; the at least one stage includes the diversity stage.
7. The method of claim 6, wherein the at least one stage further comprises a baseline stage, the iterative training of the initial text matching model using the training sample set at least one stage further comprising:
before iterative training in the diversity stage, performing iterative training in a baseline stage on an initial text matching model to be trained by using a first type training sample and/or a second type training sample in the training sample set;
in the iterative training process of the baseline stage, the specific steps of each iterative training include:
performing single-hot one-hot coding on the standard problem statement, the similar sample and the heterogeneous sample in the training sample to obtain respective corresponding one-hot coding matrixes, and then performing coding on the respective corresponding one-hot coding matrixes based on a pre-trained language characterization model to obtain respective corresponding language characterization model coding matrixes;
respectively inputting the language characterization model coding matrixes corresponding to the standard problem statement, the similar sample and the heterogeneous sample into a corresponding feedforward neural network in the initial text matching model, and outputting a first distance between the standard problem statement and the similar sample and a second distance between the standard problem statement and the heterogeneous sample;
And adjusting model parameters of the text matching model according to the output first distance, the second distance and the loss function of the text matching model.
8. The method of claim 6, wherein the training sample set further comprises a third class of training samples; the similar samples of the standard problem statement in the third class training samples are noisy extended similar sentences obtained by performing EDA-based noisy processing on similar sentences of the standard problem statement, and the heterogeneous samples are noisy extended dissimilar sentences obtained by performing EDA-based noisy processing on dissimilar sentences of the standard problem statement;
the at least one stage further comprises a progressive stage, the iterative training of the initial text matching model using the training sample set for at least one stage further comprises:
after iterative training in the diversity stage, performing iterative training in the progressive stage on the initial text matching model by using a first type training sample and/or a second type training sample in the training sample set and a third type training sample selected based on a progressive factor; the progressive factor is used for representing the proportion of the sample size of the selected third-class training sample to the total sample size of the third-class training sample, and if the loss reduction degree of the loss function is smaller than a set threshold value under the condition of performing iterative training based on the current sample size, the sample size of the third-class training sample corresponding to the progressive factor is increased on the basis of the current sample size until all the third-class training samples are added;
In the iterative training process of the progressive stage, the specific steps of each iterative training include:
performing single-hot one-hot coding on the standard problem statement, the similar sample and the heterogeneous sample in the training sample to obtain respective corresponding one-hot coding matrixes, and then performing coding on the respective corresponding one-hot coding matrixes based on a pre-trained language characterization model to obtain respective corresponding language characterization model coding matrixes;
respectively inputting the language characterization model coding matrixes corresponding to the standard problem statement, the similar sample and the heterogeneous sample into a corresponding feedforward neural network in the initial text matching model, and outputting a first distance between the standard problem statement and the similar sample and a second distance between the standard problem statement and the heterogeneous sample;
and adjusting model parameters of the text matching model according to the output first distance, the second distance and the loss function of the text matching model.
9. The method of claim 8, wherein the EDA-based denoising of similar or dissimilar sentences of the standard question sentence comprises one or any combination of the following:
Randomly extracting at least one non-stop word from the sentence to be processed, and carrying out corresponding replacement in the sentence to be processed by using synonyms or homonyms corresponding to each non-stop word;
performing at least one random insertion in the statement to be processed: randomly extracting a non-stop word, and inserting synonyms corresponding to the non-stop word into random positions of the sentences to be processed;
performing at least one random exchange in the statement to be processed: randomly selecting two words to exchange positions;
randomly deleting each word in the sentence to be processed based on the set probability;
wherein the sentence to be processed is a similar sentence or a non-similar sentence of the standard question sentence.
10. An intent recognition method, comprising:
acquiring a voice recognition statement corresponding to a voice to be recognized;
inputting the voice recognition sentences and standard problem sentences in a knowledge base into a pre-trained text matching model, and outputting the matching degree between the voice recognition sentences and the standard problem sentences; wherein the text matching model is trained according to the training method of the text matching model as claimed in any one of claims 1 to 9;
and determining, based on the matching degree between the voice recognition statement and each standard problem statement in the knowledge base, a standard problem statement matched with the voice to be recognized as an intention recognition result.
11. A training device for a text matching model, comprising:
a building module, configured to build a training sample set based on standard problem sentences in a knowledge base, where each training sample in the training sample set includes: the standard problem statement is a statement which is manually input as a standard problem, the similar sample is used for indicating a statement similar to the standard problem statement, and the heterogeneous sample is used for indicating a statement dissimilar to the standard problem statement;
the diversity training module is used for inputting the initial text matching model after performing linear interpolation-based mixed coding processing on each training sample in the iterative training process of using the training sample set to perform diversity stage on the initial text matching model, and outputting to obtain a first distance between the standard problem statement and the similar sample and a second distance between the standard problem statement and the heterogeneous sample;
and the first adjusting module is used for adjusting model parameters of the text matching model according to the output first distance, the second distance and the loss function of the text matching model until the loss function of the text matching model meets a set condition, so as to obtain the text matching model trained in the diversity stage.
12. An intent recognition device, comprising:
the acquisition module is used for acquiring a voice recognition statement corresponding to the voice to be recognized;
the text matching module is used for inputting the voice recognition statement and standard problem statements in the knowledge base into a pre-trained text matching model and outputting the matching degree between the voice recognition statement and the standard problem statement; wherein the text matching model is trained according to the training method of the text matching model as claimed in any one of claims 1 to 9;
and the intention recognition module is used for determining, based on the matching degree between the voice recognition statement and each standard problem statement in the knowledge base, the standard problem statement matched with the voice to be recognized as an intention recognition result.
13. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the training method of the text matching model of any of claims 1 to 9 or to implement the intent recognition method of claim 10.
14. A computer readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the training method of a text matching model according to any one of claims 1 to 9, or to implement the intention recognition method according to claim 10.
CN202210983380.XA 2022-08-16 2022-08-16 Training method of text matching model, intention recognition method and device Pending CN116150311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210983380.XA CN116150311A (en) 2022-08-16 2022-08-16 Training method of text matching model, intention recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210983380.XA CN116150311A (en) 2022-08-16 2022-08-16 Training method of text matching model, intention recognition method and device

Publications (1)

Publication Number Publication Date
CN116150311A true CN116150311A (en) 2023-05-23

Family

ID=86351256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210983380.XA Pending CN116150311A (en) 2022-08-16 2022-08-16 Training method of text matching model, intention recognition method and device

Country Status (1)

Country Link
CN (1) CN116150311A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117094396A (en) * 2023-10-19 2023-11-21 北京英视睿达科技股份有限公司 Knowledge extraction method, knowledge extraction device, computer equipment and storage medium
CN117094396B (en) * 2023-10-19 2024-01-23 北京英视睿达科技股份有限公司 Knowledge extraction method, knowledge extraction device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US11049500B2 (en) Adversarial learning and generation of dialogue responses
CN111931513B (en) Text intention recognition method and device
CN110795552B (en) Training sample generation method and device, electronic equipment and storage medium
CN110347799B (en) Language model training method and device and computer equipment
CN112287670A (en) Text error correction method, system, computer device and readable storage medium
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN111831789B (en) Question-answering text matching method based on multi-layer semantic feature extraction structure
CN108763535B (en) Information acquisition method and device
CN110929515A (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
CN112016553B (en) Optical Character Recognition (OCR) system, automatic OCR correction system, method
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111599340A (en) Polyphone pronunciation prediction method and device and computer readable storage medium
CN114580382A (en) Text error correction method and device
CN110765785A (en) Neural network-based Chinese-English translation method and related equipment thereof
CN112163092B (en) Entity and relation extraction method, system, device and medium
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
CN110633359B (en) Sentence equivalence judgment method and device
CN112417855A (en) Text intention recognition method and device and related equipment
CN111783478B (en) Machine translation quality estimation method, device, equipment and storage medium
CN112200664A (en) Repayment prediction method based on ERNIE model and DCNN model
CN117332112A (en) Multimodal retrieval model training, multimodal retrieval method, electronic device, and storage medium
CN113553847A (en) Method, device, system and storage medium for parsing address text
CN116150311A (en) Training method of text matching model, intention recognition method and device
CN113051384A (en) User portrait extraction method based on conversation and related device
CN112989794A (en) Model training method and device, intelligent robot and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination