CN113850383A - Text matching model training method and device, electronic equipment and storage medium - Google Patents

Text matching model training method and device, electronic equipment and storage medium

Info

Publication number
CN113850383A
CN113850383A (application CN202111134466.7A)
Authority
CN
China
Prior art keywords
text
matching model
model
text matching
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111134466.7A
Other languages
Chinese (zh)
Inventor
颜泽龙
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202111134466.7A
Publication of CN113850383A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Abstract

The application relates to artificial intelligence and provides a text matching model training method, which includes the following steps: obtaining a text matching model, where the text matching model includes a pre-trained BERT model and the pre-trained BERT model includes a dropout layer; acquiring training data, where the data samples in the training data do not include labels; inputting the training data to the text matching model at least twice, and respectively obtaining the output results of the text matching model; obtaining a similarity representation among the output results; and optimizing the parameters in the model based on the loss function corresponding to the similarity representation to obtain a trained text matching model. By arranging a dropout layer with a determined activation proportion in the BERT model, inputting the same data twice, and performing backward parameter optimization based on the difference between the two output results, labeled data is not needed, which reduces the cost of model training and improves its efficiency.

Description

Text matching model training method and device, electronic equipment and storage medium
Technical Field
The embodiments of the present application relate to, but are not limited to, the field of artificial intelligence, and in particular to a text matching model training method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Text matching is a fundamental problem in natural language processing and can be applied to a large number of natural language processing tasks, such as information retrieval and dialog systems. Taking an intelligent dialog system as an example, accurately matching the user question against the preset questions in a dialog library allows the system to give the user the most appropriate response.
Traditional text matching models mostly rely on information about words and word frequencies, whereas deep learning models can mine deeper text information and thereby improve the accuracy of text matching. However, training a deep learning model relies on a large amount of labeled data, which not only incurs a large time cost but is also difficult to obtain in practical scenarios where labeled data is insufficient.
Disclosure of Invention
The embodiment of the application provides a text matching model training method and device, electronic equipment and a computer readable storage medium.
In a first aspect, an embodiment of the present application provides a text matching model training method, including: obtaining a text matching model, where the text matching model includes a pre-trained BERT model, the pre-trained BERT model includes a dropout layer, and the activation proportion of the dropout layer is less than 1; acquiring training data, where the data samples in the training data do not include labels; inputting the training data into the text matching model multiple times, and respectively obtaining multiple output results of the text matching model; obtaining a similarity representation among the output results; and optimizing the parameters in the text matching model based on the loss function corresponding to the similarity representation to obtain a trained text matching model. In the text matching model training method provided by this embodiment, a dropout layer with a determined activation proportion is set in the BERT model, the same data is input twice, and backward parameter optimization is performed based on the difference between the two output results, so that labeled data is not required, which reduces the cost of model training and improves its efficiency.
In a second aspect, an embodiment of the present application provides a text matching apparatus, including: the text acquisition module is used for acquiring a text to be matched; and the text matching module is used for performing text matching according to the text matching model training method in the first aspect.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor, a memory, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the text matching model training method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, including: stored are program instructions executable by a processor for performing a text matching model training method as in the first aspect.
Because a dropout layer with a determined activation proportion is arranged in the BERT model and the same batch of data is used for training and parameter optimization, labeled data is not needed, which reduces model training cost and improves model training efficiency; at the same time, the trained text matching model achieves high matching precision in use, improving user experience.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
Fig. 1 is a schematic flowchart of a text matching model training method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a dropout layer in a BERT model provided in this embodiment;
fig. 3 is a schematic flowchart of a text matching method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a text matching method according to another embodiment of the present application;
fig. 5 is a mapping relationship diagram of a matching text and a target text provided in the embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict.
The terms "first", "second" and the like in the description, the claims and the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. If an orientation description is given, the directions or positional relationships indicated (for example up, down, front, rear, left, right) are based on the directions or positional relationships shown in the drawings; they are only used to facilitate and simplify the description of the present application and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be taken as limiting the present application.
It should be noted that "at least one" means one or more and "a plurality of" means two or more; terms such as "greater than" and "less than" are understood as excluding the stated number, while terms such as "above" and "below" are understood as including it. Where "first" and "second" are used to distinguish technical features, they are not to be understood as indicating or implying relative importance, implicitly indicating the number of the indicated technical features, or implicitly indicating the precedence of the indicated technical features.
In the prior art, dialog systems can be broadly divided into three main categories: chat-oriented dialog systems (chitchat-bots), retrieval-based dialog systems (IR-bots), and task-oriented dialog systems (task-bots). In any of these dialog systems, it is necessary to process natural language from the user. In a dialog system or question-answering system, the natural language input by the user is recognized, analyzed, processed, and matched against text preset by the system.
Typically, a question-answering system presents a list of the standard texts most relevant to the query text (query) input by the user, where the standard texts represent user intentions or user questions. If the user clicks a certain standard text in the list, the answer corresponding to that standard text is displayed to the user.
The BERT (Bidirectional Encoder Representations from Transformers) model is a deep semantic understanding model. The method of the present application is based on the BERT model and adds further structures and a matched training method to achieve more accurate model output.
Due to the limitations of artificial intelligence, every dialog or question-answering system needs a large amount of labeled data for training in order to ensure a basic user experience; considering the diversity of user input, acquiring a large amount of labeled data requires a large cost and is sometimes even difficult to achieve.
Based on this, the application provides a text matching model training method.
In a first aspect, fig. 1 is a schematic flowchart of a text matching model training method provided in an embodiment of the present application. Referring to fig. 1, the text matching model training method provided in this embodiment at least includes:
step S110: the method comprises the steps of obtaining a text matching model, wherein the text matching model comprises a pre-trained BERT model, the pre-trained BERT model comprises a dropout layer, and the activation proportion of the dropout layer is smaller than 1.
Dropout means that, during the training of a machine learning model, neural network units are temporarily removed from the network with a certain probability. Because the removed units are chosen randomly, each batch of data is effectively trained on a different network; even if the same batch of data is input for training, the output results will still differ, so the model also changes dynamically after each batch of data is trained.
In some embodiments, as shown in fig. 2, a dropout function is added after each network layer, and the generalization capability of the model is improved by randomly discarding part of the training parameters in each iteration.
In some embodiments, the activation ratio of dropout is 0.5, i.e., half of the neural units are randomly dropped in each pass.
In some embodiments, the activation ratio of dropout is less than 1, which satisfies the requirement of the embodiments of the present application, i.e., it ensures that neural units are dropped at random, so that repeated passes over the same data produce different outputs.
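To make the effect of an activation proportion below 1 concrete, the following minimal PyTorch sketch (not part of the patent; the layer sizes and the dropout probability are illustrative assumptions) shows that two passes over the same input differ when dropout stays active:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy encoder block; p=0.5 corresponds to an activation proportion of 0.5,
# i.e. each unit is kept with probability 0.5 on every forward pass.
encoder = nn.Sequential(
    nn.Linear(8, 8),
    nn.ReLU(),
    nn.Dropout(p=0.5),
)

x = torch.randn(1, 8)   # the same input, used twice
encoder.train()          # keep dropout active, as during training

out1 = encoder(x)        # first pass
out2 = encoder(x)        # second pass: different units are dropped

print(torch.allclose(out1, out2))  # typically False
```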
Step S120: training data is obtained, and data samples in the training data do not include tags.
In some embodiments, model training using data that does not contain labels is part of unsupervised learning.
Step S130: and inputting the training data into the text matching model for multiple times to respectively obtain multiple output results of the text matching model.
The essence of this step is to input the same data into the text matching model at least twice; because the model has a dropout mechanism, the output results obtained for the same data will differ.
In some embodiments, the same data is input to the model twice and the output results are obtained separately.
In some embodiments, the same data is input into the model three or more times and output results are obtained separately.
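As an illustrative sketch of this double forward pass (the checkpoint name, the [CLS] pooling and the example sentences are assumptions, not taken from the patent), using the Hugging Face transformers library:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
model = AutoModel.from_pretrained("bert-base-chinese")
model.train()  # keep the dropout layers active so the two passes differ

sentences = ["How do I change my bound phone number?", "What is the weather today?"]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

# Feed the same batch twice; dropout randomness makes s1 differ from s2.
s1 = model(**batch).last_hidden_state[:, 0]  # [CLS] vectors, first pass
s2 = model(**batch).last_hidden_state[:, 0]  # [CLS] vectors, second pass
```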
Step S140: and acquiring similarity representation among output results.
The output result is a vector representation of the text, and the similarity between two output results can be represented in various ways.
In some embodiments, cosine similarity is used as the similarity representation.
In some embodiments, the Euclidean distance is used as the similarity representation.
In some embodiments, the similarity is represented using the Jaccard similarity, the Pearson correlation coefficient, the Mahalanobis distance, or the like.
It is noted that if the same data is input to the model more than twice in step S130, a similarity representation may be calculated between any two of the multiple output results.
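For illustration, two of the measures mentioned above (cosine similarity and Euclidean distance) can be computed on pairs of output vectors as in the following sketch (generic helpers, not the patent's code):

```python
import torch
import torch.nn.functional as F

def cosine_similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Cosine similarity between two batches of output vectors; higher means closer.
    return F.cosine_similarity(a, b, dim=-1)

def euclidean_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Euclidean distance between the vectors; lower means closer.
    return torch.norm(a - b, dim=-1)
```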
Step S150: and optimizing parameters in the text matching model based on the loss function corresponding to the similarity representation.
Each representation of similarity may correspond to a loss function, and model parameters may be optimized by minimizing the loss function.
In some embodiments, the iteration and optimization of the model is performed using the BP back propagation algorithm.
In the text matching model training method provided by this embodiment, a dropout layer with a determined activation proportion is set in the BERT model, the same data is input twice, and backward parameter optimization is performed based on the difference between the two output results, so that labeled data is not required, which reduces the cost of model training and improves its efficiency.
In some embodiments, when the model training phase is finished and the output of the model is relatively stable, the training data is input into the trained text matching model again, and the output results are stored as the standard text data. In subsequent use of the model, the output for the test data is compared with the standard text data, and the text corresponding to the closest standard text data is selected for output.
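A possible sketch of this step, building the standard text data by re-encoding the (unlabeled) training sentences with the trained model; the function and variable names are illustrative assumptions:

```python
import torch

@torch.no_grad()
def build_standard_text_data(model, tokenizer, preset_texts):
    """Encode the preset/training texts once with the trained model and keep the
    resulting vectors as the 'standard text data' used at matching time."""
    model.eval()  # dropout disabled, so the stored vectors are deterministic
    batch = tokenizer(preset_texts, padding=True, truncation=True, return_tensors="pt")
    vectors = model(**batch).last_hidden_state[:, 0]  # one [CLS] vector per text
    return {text: vec for text, vec in zip(preset_texts, vectors)}
```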
The following description of the text matching model training method is made in a specific embodiment:
suppose the training data is { x }i},i=1,2...,N,xiRepresenting an independent sentence. A pre-training model BERT is used, wherein the BERT model has a dropout structure, and in the training process, units of the dropout are randomly activated in a certain proportion. In this embodiment, assuming that the dropout layer has 12 units and the activation ratio is 50%, only 50% of the probability of each unit participates in the operation in the training process, that is, only 6 units participate in the calculation. All training data are passed through the BERT model twice, and due to the randomness of the dropout layer, two sentences of the same sentence can be obtainedThe sentences are expressed as si,1,si,21, 2, N, wherein si,1,si,2Are respectively the sentence xiThe final output results of the inputs to BERT are twice.
In this embodiment, cosine similarity is used to represent similarity, and the purpose of model training is to make the two vector representations generated for the same sentence close enough. For sentence x_i, a corresponding per-sentence loss function is therefore defined (the equation is given only as an image in the original publication).
That is, for the i-th sentence, the vector representations of the two BERT outputs should be close enough together, while remaining far enough from the vector representations of the BERT outputs of the other sentences. The loss function of the whole model is then defined over all sentences (likewise given as an equation image in the original publication).
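The loss equations appear only as images in the source document. A plausible reconstruction that matches the surrounding description (pull the two representations of the same sentence together, push the representations of the other sentences away) is the contrastive form sketched below; the temperature τ and the exact normalization are assumptions, not quoted from the patent:

```latex
% Per-sentence loss: s_{i,1} and s_{i,2} should be close, while the
% representations of the other sentences act as negatives.
\ell_i = -\log \frac{\exp\left(\mathrm{Cosine}(s_{i,1}, s_{i,2}) / \tau\right)}
                    {\sum_{j=1}^{N} \exp\left(\mathrm{Cosine}(s_{i,1}, s_{j,2}) / \tau\right)}

% Overall model loss: sum over all N training sentences.
L = \sum_{i=1}^{N} \ell_i
```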
And optimizing the model parameters by the BP back-propagation algorithm, so that the loss function approaches its minimum value as closely as possible.
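A compact training-step sketch of this optimization in plain PyTorch (the loss form, temperature and optimizer settings are assumptions; `model` and `batch` are taken from the earlier encoding sketch):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(s1: torch.Tensor, s2: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """s1[i] and s2[i] are the two dropout-perturbed representations of sentence i;
    the other rows of s2 serve as in-batch negatives."""
    s1 = F.normalize(s1, dim=-1)
    s2 = F.normalize(s2, dim=-1)
    sim = s1 @ s2.T / tau                        # pairwise cosine similarities / temperature
    labels = torch.arange(s1.size(0), device=s1.device)
    return F.cross_entropy(sim, labels)          # pulls matching pairs together

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

s1 = model(**batch).last_hidden_state[:, 0]      # first pass
s2 = model(**batch).last_hidden_state[:, 0]      # second pass
loss = contrastive_loss(s1, s2)
loss.backward()                                  # BP back-propagation
optimizer.step()
optimizer.zero_grad()
```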
Fig. 3 is a schematic flowchart of a text matching method provided in another embodiment of the present application; the text matching method provided in this embodiment at least includes:
step S210: and acquiring a text to be matched.
In some embodiments, the text to be matched is a sentence or even a keyword input by the user, and the sentence input by the user may be in multiple languages, such as Chinese or English.
In some embodiments, the speech may also be converted to text by recognizing the user's speech input before matching.
Step S220: and acquiring a text matching model.
In some embodiments, the text matching model is obtained by training through a text matching model training method provided in the embodiments of the present application, but those skilled in the art should understand that, based on the model training method provided in the embodiments of the present application, a text matching model obtained by adding other parameter fine-tuning methods is also within the scope of the embodiments of the present application.
Step S230: and inputting the text to be matched to the text matching model.
In some embodiments, the text to be matched is input in the same way as the training data. For a specific application scenario, such as a question-and-answer system, a question query is input as the text to be matched.
Step S240: and calculating the similarity between the text to be matched and the standard text data by using Cosine.
In some embodiments, the trained BERT model can output a vector representation of the text to be matched, and the distance between the vector of the text to be matched and preset standard text data is represented by using Cosine.
Step S250: and sequencing the similarity, and selecting the standard text data with the highest similarity as the text to serve as the matched text.
In some embodiments, since the standard text data is a set and includes model output vectors corresponding to all the preset texts, it is necessary to calculate the similarity between the text to be matched and each standard text data one by one, and select the standard text data with the highest similarity as the text to be matched.
In some embodiments, the target text corresponding to the matching text is output, and as shown in fig. 5, the target text and the matching text have a mapping relationship.
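A sketch of this matching-and-answer step (the function names and the question-to-answer mapping `answer_map` are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def match_by_highest_cosine(model, tokenizer, query, standard_text_data, answer_map):
    """Encode the text to be matched, rank all standard texts by cosine similarity,
    and return the target (answer) text mapped to the best-matching standard text."""
    model.eval()
    batch = tokenizer([query], padding=True, truncation=True, return_tensors="pt")
    q = model(**batch).last_hidden_state[:, 0].squeeze(0)

    best_text, best_sim = None, float("-inf")
    for text, vec in standard_text_data.items():
        sim = F.cosine_similarity(q, vec, dim=-1).item()
        if sim > best_sim:
            best_text, best_sim = text, sim
    return best_text, answer_map.get(best_text)
```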
The text matching method provided by this embodiment calculates the similarity between the text to be matched and the standard text based on the text matching model obtained by training in the first aspect, and can accurately and efficiently select the correct matching text to display the target text.
Fig. 4 is a schematic flowchart of a text matching method according to another embodiment of the present application; the text matching method provided in this embodiment at least includes:
step S310: and acquiring a text to be matched.
In some embodiments, the text to be matched is a sentence or even a keyword input by the user, and the sentence input by the user may be in multiple languages, such as Chinese or English.
In some embodiments, the speech may also be converted to text by recognizing the user's speech input before matching.
Step S320: and acquiring a text matching model.
In some embodiments, the text matching model is obtained by training through a text matching model training method provided in the embodiments of the present application, but those skilled in the art should understand that, based on the model training method provided in the embodiments of the present application, a text matching model obtained by adding other parameter fine-tuning methods is also within the scope of the embodiments of the present application.
Step S330: and inputting the text to be matched to the text matching model.
In some embodiments, the text to be matched is input in the same way as the training data. For a specific application scenario, such as a retrieval system, a retrieval keyword or phrase is input as the text to be matched.
Step S340: and calculating the similarity between the text to be matched and the standard text data by using the cosine of the Euclidean distance.
In some embodiments, the trained BERT model can output a vector representation of the text to be matched, and the euclidean distance is used to represent the distance between the vector of the text to be matched and the preset standard text data.
Step S250: and selecting the standard text data with the similarity larger than a preset threshold value as a text corresponding to the standard text data as a matched text.
In some embodiments, since the standard text data is a set and includes model output vectors corresponding to all the preset texts, it is necessary to calculate the similarity between the text to be matched and each standard text data one by one, and select the standard text data with the similarity greater than the preset threshold as the text to be matched.
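A corresponding sketch of the threshold-based rule using the Euclidean distance (the threshold value and the names are illustrative assumptions):

```python
import torch

@torch.no_grad()
def match_by_threshold(query_vec: torch.Tensor, standard_text_data: dict, max_distance: float = 5.0):
    """Return all preset texts whose Euclidean distance to the query vector is below
    the threshold, i.e. whose similarity exceeds the preset value, closest first."""
    matches = []
    for text, vec in standard_text_data.items():
        dist = torch.norm(query_vec - vec).item()
        if dist <= max_distance:
            matches.append((text, dist))
    return sorted(matches, key=lambda pair: pair[1])
```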
It should be noted that the preset rule for similarity determination is not limited to the selection of the maximum similarity or the selection greater than the threshold, and may be set according to the rule learned in the training process or the preference of the user.
The text matching method provided by this embodiment calculates the similarity between the text to be matched and the standard text based on the text matching model obtained by training in the first aspect, and can accurately and efficiently select the correct matching text to display the target text.
In a second aspect, an embodiment of the present application provides a text matching apparatus, including at least: the text matching method comprises a text obtaining module used for obtaining a text to be matched and a text matching module used for performing text matching according to the text matching model training method of the first aspect.
The text acquisition module can not only acquire text information input by a user, but also acquire voice information of the user and even image information.
The text matching module is used for executing the trained text matching method, and the text matching model is stored in the text matching module.
In a third aspect, an embodiment of the present application provides an electronic device, where the device includes a processor and a memory; the memory is used for storing a program; and the processor is used for executing the program to perform the text matching model training method of any one of the above embodiments.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, including:
the computer-readable storage medium stores a program that is executed by a processor to perform the text matching model training method according to any one of the above embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division into units is merely a logical division, and an actual implementation may use another division, e.g., a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media as known to those skilled in the art.
The mobile terminal equipment may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal device, a wearable device, an ultra-mobile personal computer, a netbook, a personal digital assistant, CPE, UFI (wireless hotspot equipment) or the like; the embodiments of the present invention are not particularly limited in this regard. The server may be an independent server, or may be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A text matching model training method comprises the following steps:
obtaining a text matching model, wherein the text matching model comprises a pre-trained BERT model, the pre-trained BERT model comprises a dropout layer, and the activation proportion of the dropout layer is less than 1;
acquiring training data, wherein data samples in the training data do not comprise labels;
inputting the training data into the text matching model multiple times, and respectively obtaining multiple output results of the text matching model;
obtaining similarity representation among the output results;
and optimizing parameters in the text matching model through a back propagation algorithm based on the loss function corresponding to the similarity representation to obtain the trained text matching model.
2. The method of claim 1, wherein obtaining the representation of the similarity between the plurality of output results comprises:
obtaining a plurality of output results {s_{i,1}, s_{i,2}, ..., s_{i,k}} of the training data input into the text matching model a plurality of times, wherein i denotes the index of each training data item, i = 1, 2, ..., N, and k denotes the number of times this training data is input to the text matching model;
calculating the distance Cosine(s_{i,m}, s_{i,n}) between any two output results by using the cosine formula, wherein m and n denote the indices of any two output results;
the distance Cosine(s_{i,m}, s_{i,n}) is the representation of the similarity between the two output results.
3. The text matching model training method of claim 2, wherein the loss function is the loss function corresponding to the Cosine-defined similarity representation (the specific formula is given as an equation image in the original publication).
4. The text matching model training method according to claim 1 or 2, wherein after the parameters in the text matching model are optimized by the back-propagation algorithm based on the loss function corresponding to the similarity representation to obtain the trained text matching model, the method further comprises:
inputting the training data into the trained text matching model, and taking the output results as standard text data.
5. The method of training a text matching model according to claim 4, further comprising:
acquiring a text to be matched;
inputting the text to be matched to the text matching model;
calculating the similarity between the text to be matched and the standard text data;
and selecting a matching text of the text to be matched according to a preset rule judged by the similarity.
6. The text matching model training method according to claim 5, wherein the selecting the matching text of the text to be matched according to the preset rule judged by the similarity comprises:
sorting the similarities, and selecting the text corresponding to the standard text data with the highest similarity as the matching text;
or selecting the text corresponding to standard text data whose similarity is greater than the preset threshold as the matching text.
7. The training method of the text matching model according to claim 5 or 6, wherein after the matching text is obtained, a target text corresponding to the matching text is output; wherein the matching text comprises a question text and the target text comprises a reply text corresponding to the question text.
8. A text matching apparatus, comprising:
the text acquisition module is used for acquiring a text to be matched;
the text matching module is used for performing text matching according to the text matching model training method of any one of claims 5-7.
9. An electronic device comprising a processor, a memory, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the text matching model training method of any of claims 1-7.
10. A computer-readable storage medium having stored thereon program instructions executable by a processor to perform the text matching model training method according to any one of claims 1-7.
CN202111134466.7A 2021-09-27 2021-09-27 Text matching model training method and device, electronic equipment and storage medium Pending CN113850383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111134466.7A CN113850383A (en) 2021-09-27 2021-09-27 Text matching model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111134466.7A CN113850383A (en) 2021-09-27 2021-09-27 Text matching model training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113850383A true CN113850383A (en) 2021-12-28

Family

ID=78979868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111134466.7A Pending CN113850383A (en) 2021-09-27 2021-09-27 Text matching model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113850383A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036869A (en) * 2023-10-08 2023-11-10 之江实验室 Model training method and device based on diversity and random strategy
CN117036869B (en) * 2023-10-08 2024-01-09 之江实验室 Model training method and device based on diversity and random strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination