WO2022188584A1 - Similar sentence generation method and apparatus based on pre-trained language model - Google Patents

Similar sentence generation method and apparatus based on pre-trained language model

Info

Publication number
WO2022188584A1
Authority
WO
WIPO (PCT)
Prior art keywords
similar
sentence
candidate
model
discriminant
Prior art date
Application number
PCT/CN2022/075657
Other languages
French (fr)
Chinese (zh)
Inventor
高臻
闫慧丽
顾松庠
Original Assignee
京东科技控股股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2021-03-12
Filing date
Publication date
Application filed by 京东科技控股股份有限公司
Publication of WO2022188584A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/194: Calculation of difference between files
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to a method and apparatus for generating similar sentences based on a pre-trained language model.
  • Typically, a customer service robot adds new FAQs (Frequently Asked Questions) from time to time, and accordingly each new FAQ needs to be expanded into a diverse set of similar questions.
  • In the related art, templates are formulated manually, and question expansion is completed simply by filling in the corresponding entities and keywords. This requires considerable manpower and time to edit the templates, a corresponding template must be customized whenever a new question type is added, and the resulting sentences follow fixed patterns that lack diversity of expression.
  • The present application aims to solve, at least to some extent, one of the technical problems in the related art.
  • To this end, the present application proposes a method and apparatus for generating similar sentences based on a pre-trained language model, so as to automatically generate similar questions with diverse forms and consistent semantics and to improve the quality and efficiency of similar sentence generation.
  • An embodiment of the first aspect of the present application proposes a method for generating similar sentences based on a pre-trained language model, including:
  • acquiring a sentence to be processed; inputting the sentence to be processed into a trained generation model to obtain a plurality of candidate similar sentences; generating a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences; and inputting the plurality of discriminative sentence pairs into a trained discriminant model, obtaining a discrimination result, and obtaining a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
  • The method for generating similar sentences based on a pre-trained language model according to the embodiments of the present application acquires a sentence to be processed; inputs the sentence into a trained generation model to obtain a plurality of candidate similar sentences; generates a plurality of discriminative sentence pairs from the sentence to be processed and the candidate similar sentences; inputs the discriminative sentence pairs into a trained discriminant model to obtain a discrimination result; and obtains a target similar sentence from the candidate similar sentences according to the discrimination result.
  • In this way, similar questions that are both diverse in form and consistent in semantics are generated automatically, improving the quality and efficiency of similar sentence generation.
  • An embodiment of the second aspect of the present application proposes an apparatus for generating similar sentences based on a pre-trained language model, including:
  • a first acquisition module, configured to acquire the sentence to be processed;
  • a first processing module, configured to input the sentence to be processed into a trained generation model to obtain a plurality of candidate similar sentences;
  • a first generation module, configured to generate a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences;
  • a second processing module, configured to input the plurality of discriminative sentence pairs into a trained discriminant model to obtain a discrimination result; and
  • a second acquisition module, configured to obtain a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
  • The apparatus for generating similar sentences based on a pre-trained language model according to the embodiments of the present application acquires a sentence to be processed; inputs the sentence into a trained generation model to obtain a plurality of candidate similar sentences; generates a plurality of discriminative sentence pairs from the sentence to be processed and the candidate similar sentences; inputs the discriminative sentence pairs into a trained discriminant model to obtain a discrimination result; and obtains a target similar sentence from the candidate similar sentences according to the discrimination result.
  • In this way, similar questions that are both diverse in form and consistent in semantics are generated automatically, improving the quality and efficiency of similar sentence generation.
  • An embodiment of the third aspect of the present application proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, the method for generating similar sentences based on a pre-trained language model proposed by the embodiment of the first aspect of the present application is implemented.
  • An embodiment of the fourth aspect of the present application proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method for generating similar sentences based on a pre-trained language model proposed by the embodiment of the first aspect of the present application is implemented.
  • An embodiment of the fifth aspect of the present application proposes a computer program product; when the instructions in the computer program product are executed by a processor, the method for generating similar sentences based on a pre-trained language model proposed by the embodiment of the first aspect of the present application is executed.
  • FIG. 1 is a schematic flowchart of a method for generating similar sentences based on a pre-trained language model provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic flowchart of a method for generating similar sentences based on a pre-trained language model provided in Embodiment 2 of the present application;
  • FIG. 3 is a schematic flowchart of a method for generating similar sentences based on a pre-trained language model provided in Embodiment 3 of the present application;
  • FIG. 4 is a schematic flowchart of generating similar sentences in an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an apparatus for generating similar sentences based on a pre-trained language model provided in Embodiment 4 of the present application;
  • FIG. 6 is a block diagram of an exemplary electronic device or server suitable for implementing embodiments of the present application.
  • To address the problems that editing templates requires considerable manpower and time, that a corresponding template must be customized whenever a new question type is added, and that the resulting sentence patterns are fixed and lack expressive diversity, the embodiments of the present application propose a method for generating similar sentences based on a pre-trained language model: a sentence to be processed is acquired; the sentence is input into a trained generation model to obtain a plurality of candidate similar sentences; a plurality of discriminative sentence pairs are generated from the sentence to be processed and the candidate similar sentences; the discriminative sentence pairs are input into a trained discriminant model to obtain a discrimination result; and a target similar sentence is obtained from the candidate similar sentences according to the discrimination result.
  • In this way, similar questions that are both diverse in form and consistent in semantics are generated automatically, improving the quality and efficiency of similar sentence generation.
  • FIG. 1 is a schematic flowchart of a method for generating similar sentences based on a pre-trained language model according to Embodiment 1 of the present application.
  • The similar sentence generation method of the embodiments of the present application can be applied to an electronic device.
  • The electronic device may be any device with computing capability, such as a PC (personal computer) or a mobile terminal.
  • The mobile terminal may be, for example, a mobile phone, a tablet computer, a personal digital assistant, or a wearable device, that is, a hardware device with an operating system, a touch screen and/or a display screen.
  • As shown in FIG. 1, the method for generating similar sentences based on a pre-trained language model may include the following steps 101 to 104.
  • Step 101: acquire the sentence to be processed.
  • In the embodiments of the present application, the sentence to be processed is a sentence for which a plurality of corresponding similar sentences need to be generated; it can be selected and acquired according to the actual application scenario.
  • For example, the sentence to be processed may be "How is product A developing?" or "Could you introduce product A?".
  • Step 102: input the sentence to be processed into the trained generation model to obtain a plurality of candidate similar sentences.
  • In the embodiments of the present application, the generation model is a pre-trained language model that has already been trained; for the specific training process, refer to the description below.
  • In the embodiments of the present application, the sentence to be processed is encoded to obtain an encoding vector, and candidate similar sentences are generated word by word in an autoregressive manner: at each step, the probability distribution over candidate similar words is obtained, one candidate similar word is randomly sampled from the top N highest-probability candidates as the target candidate similar word (where N is a positive integer), and the candidate similar sentence is generated from the target candidate similar words.
  • For example, given an input sentence X to be processed, X is encoded and candidate similar sentences are then generated with a random sampling strategy. Generation proceeds word by word from left to right, and each word is randomly selected from the N (for example, 5) highest-probability candidates; as a result, the same input sentence yields a different candidate similar sentence each time it is fed into the generation model, and repeating the process many times produces multiple candidate similar sentences.
  • In this random sampling scheme, each generated word is conditioned on the standard question and the content generated so far, and is randomly selected from the highest-probability candidate similar words of the current conditional distribution, which increases the diversity of the generated expressions. A minimal sketch of this sampling loop is given below.
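  • The following is a minimal illustrative sketch (not the patent's own code) of the top-N autoregressive sampling described above. `model` and `tokenizer` are hypothetical stand-ins for a fine-tuned UniLM-style generator, where `model` is assumed to map a token-id tensor to next-token logits of shape [batch, seq_len, vocab_size]. Because each word is drawn at random from the top N, repeated calls with the same input naturally yield different candidate similar sentences.

      import torch

      def sample_candidates(model, tokenizer, sentence, n_top=5, num_candidates=5, max_len=32):
          candidates = []
          for _ in range(num_candidates):
              src_ids = tokenizer.encode(sentence)            # encode the sentence to be processed
              generated = []
              for _ in range(max_len):
                  ids = torch.tensor([src_ids + generated])
                  probs = torch.softmax(model(ids)[0, -1], dim=-1)
                  top_p, top_i = torch.topk(probs, n_top)     # N highest-probability next words
                  pick = top_i[torch.multinomial(top_p / top_p.sum(), 1)].item()
                  if pick == tokenizer.eos_token_id:          # stop at end of sentence
                      break
                  generated.append(pick)
              candidates.append(tokenizer.decode(generated))  # a different candidate on each pass
          return candidates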
  • Step 103: generate a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences.
  • Step 104: input the plurality of discriminative sentence pairs into the trained discriminant model, obtain a discrimination result, and obtain a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
  • In the embodiments of the present application, the sentence to be processed is paired with each candidate similar sentence to form a discriminative sentence pair. For example, if the sentence to be processed is X and the candidate similar sentences are Y1 to Y5, the pairs (X, Y1), (X, Y2), ..., (X, Y5) are formed, yielding five discriminative sentence pairs.
  • In the embodiments of the present application, the discriminant model is a trained BERT (Bidirectional Encoder Representations from Transformers) module; for the specific training process, refer to the description below.
  • In the embodiments of the present application, each discriminative sentence pair is encoded to obtain a discriminant vector, and a prediction is made on each discriminant vector to obtain the similarity between the sentence to be processed and each candidate similar sentence.
  • Specifically, the input of the discriminant model is a sentence pair (sentence to be processed, candidate similar sentence); the discriminant model encodes the pair and predicts, as a classification, whether the two sentences are similar. If they are, the corresponding candidate similar sentence is taken as a target similar sentence, as in the sketch below.
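  • As a rough illustration (assuming a Hugging Face-style BERT pair classifier; the checkpoint path and the label order, with class 1 meaning "similar", are hypothetical), the filtering step might look like:

      import torch
      from transformers import AutoTokenizer, AutoModelForSequenceClassification

      tok = AutoTokenizer.from_pretrained("bert-base-chinese")
      clf = AutoModelForSequenceClassification.from_pretrained("path/to/finetuned-discriminator")

      def filter_candidates(query, candidates, threshold=0.5):
          kept = []
          for cand in candidates:
              inputs = tok(query, cand, return_tensors="pt", truncation=True)  # encode the pair
              probs = torch.softmax(clf(**inputs).logits, dim=-1)
              if probs[0, 1].item() >= threshold:   # class 1 assumed to mean "similar"
                  kept.append(cand)                 # keep as a target similar sentence
          return kept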
  • The method for generating similar sentences based on a pre-trained language model according to the embodiments of the present application acquires a sentence to be processed; inputs the sentence into a trained generation model to obtain a plurality of candidate similar sentences; generates a plurality of discriminative sentence pairs from the sentence to be processed and the candidate similar sentences; inputs the discriminative sentence pairs into a trained discriminant model to obtain a discrimination result; and obtains a target similar sentence from the candidate similar sentences according to the discrimination result.
  • In this way, similar questions that are both diverse in form and consistent in semantics are generated automatically, improving the quality and efficiency of similar sentence generation.
  • In a possible implementation of the embodiments of the present application, the pre-trained language model UniLM (UNIfied pre-trained Language Model) is used as the generation model to generate high-quality text, and BERT (Bidirectional Encoder Representations from Transformers) is used as the discriminant model to filter out unqualified generated text; the training processes are described in detail below with reference to FIG. 2 and FIG. 3.
  • FIG. 2 is a schematic flowchart of a method for generating similar sentences based on a pre-trained language model provided by Embodiment 2 of the present application.
  • As shown in FIG. 2, the method for generating similar sentences based on a pre-trained language model may include the following steps 201 to 204.
  • Step 201: obtain a general-domain similar question dataset.
  • Step 202: input the general-domain similar question dataset into the pre-trained language model for training to obtain a first training similar sentence, compute the first error between the first training sentence and the first standard sentence through a loss function, and adjust the parameters of the pre-trained language model until the first error is smaller than a preset threshold, generating a candidate generation model.
  • In the embodiments of the present application, the encoder (UniLM) performs similar question generation task transfer on the general-domain similar question dataset. This dataset can be obtained in many ways, for example by crawling and collecting similar questions recommended on related forums and question-answering websites. Transfer learning of the similar question generation task is carried out with maximum likelihood estimation until the pre-trained language model converges. Because the training data is crawled from the Internet, no manual annotation is required, which improves training efficiency.
  • In the embodiments of the present application, the UniLM model, open-sourced by Microsoft, is a pre-trained language model based on the Transformer architecture that integrates natural language understanding and generation capabilities. UniLM pre-training adopts multi-task learning that combines autoencoding and autoregression; its two tasks are the masked language model (MLM) and sequence-to-sequence (seq2seq) tasks, so the model can serve both natural language understanding and natural language generation downstream tasks. In other words, encoding and decoding training can be performed after randomly masking the words of a training sentence, which improves the quality of subsequent generation.
  • In the embodiments of the present application, UniLM is a pre-trained model whose original pre-training tasks do not include similar question generation. The present application therefore uses the parameters of the UniLM model as initialization and trains the similar question generation task on top of them, that is, transfer learning. The training objective is to maximize the likelihood of the generated target sequence; when the value of the objective function no longer changes, or changes by less than a given threshold, the pre-trained language model is considered to have converged, training can stop, and the candidate generation model is produced. A sketch of such a training loop follows.
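  • The following is a minimal sketch of maximum likelihood fine-tuning with the convergence criterion described above; the seq2seq `model` signature (source ids and shifted target ids to token logits) and the data loader of (question, similar question) pairs are assumptions for illustration.

      import torch
      from torch.nn.functional import cross_entropy

      def finetune(model, loader, epochs=10, tol=1e-4, lr=3e-5):
          opt = torch.optim.AdamW(model.parameters(), lr=lr)
          prev = float("inf")
          for _ in range(epochs):
              total = 0.0
              for src_ids, tgt_ids in loader:                       # (question, similar question)
                  logits = model(src_ids, tgt_ids[:, :-1])          # teacher forcing
                  loss = cross_entropy(logits.reshape(-1, logits.size(-1)),
                                       tgt_ids[:, 1:].reshape(-1))  # negative log-likelihood
                  opt.zero_grad(); loss.backward(); opt.step()
                  total += loss.item()
              if abs(prev - total) < tol:     # objective no longer changing: converged
                  break
              prev = total
          return model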
  • Step 203: obtain a target-domain similar question dataset.
  • Step 204: input the target-domain similar question dataset into the candidate generation model for training to obtain a second training similar sentence, compute the second error between the second training sentence and the second standard sentence through the loss function, and adjust the parameters of the candidate generation model until the second error is smaller than the preset threshold, generating the trained generation model.
  • In this way, the trained pre-trained language model becomes better suited to the target domain. The target domain can be selected and set according to the application scenario, for example the customer service business domain, in which case the encoder (UniLM) is fine-tuned for the similar question generation task on the FAQ similar question database of the customer service business.
  • In the embodiments of the present application, a relatively small target-domain similar question dataset can be input into the candidate generation model for maximum likelihood estimation; when the negative log-likelihood of the generated target sequence falls below a preset likelihood threshold, the trained generation model is produced.
  • Thus, the present application first uses a large amount of easily obtained supervised data to transfer the similar question tasks, and then uses a small amount of existing business data, together with the small amount of labeled data obtained while filtering available data, to transfer the domain, achieving ideal business indicators at the lowest labeling cost and improving processing efficiency.
  • FIG. 3 is a schematic flowchart of a method for generating similar sentences based on a pre-trained language model provided by Embodiment 3 of the present application.
  • As shown in FIG. 3, the method for generating similar sentences based on a pre-trained language model may include the following steps 301 to 304.
  • Step 301: obtain a similar sentence pair dataset.
  • Step 302: input the similar sentence pair dataset into the BERT-based bidirectional encoding representation module for training, and generate a candidate discriminant model.
  • In the embodiments of the present application, the discriminant model (BERT) first performs similar question discrimination task transfer on a financial semantic similarity dataset, and the discriminator (BERT) is then fine-tuned for the similar question discrimination task on the FAQs and similar questions in use by the customer service business.
  • Specifically, the discriminant model BERT is constructed, and a publicly available similar question corpus is used for similar question discrimination training.
  • In the embodiments of the present application, the BERT model, open-sourced by Google, is a pre-trained language model based on the Transformer architecture and is mainly used for natural language understanding tasks.
  • BERT pre-training adopts autoencoding multi-task learning; its two tasks are the masked language model (MLM) and next sentence prediction (NSP). BERT can serve as the initialization parameters for downstream task models.
  • In this way, transfer learning of the similar question discrimination task is carried out on easily obtained public datasets, such as the financial semantic similarity dataset, which requires no manual data labeling and improves training efficiency.
  • Further, maximum likelihood estimation is used to transfer the domain so that the discriminant model learns the data distribution of the customer service business; the amount of data used is much smaller than the training data scale used to generate the candidate discriminant model, further improving training efficiency.
  • Step 303: acquire positive samples and negative samples of similar sentence pairs in the target domain.
  • Step 304: input the positive samples and negative samples of the similar sentence pairs into the candidate discriminant model for training, and generate the trained discriminant model.
  • In the embodiments of the present application, the similar question discrimination task is fine-tuned on the FAQ similar questions accumulated by the customer service business.
  • In addition, training the discriminant model requires marking the data found unusable when screening available similar questions and using it as negative examples for domain transfer. This lets the discriminant model learn the operators' discrimination criteria, and the amount of data required is relatively small, which further improves training efficiency. A sketch of this fine-tuning step follows.
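  • A minimal sketch of Step 304 under the same Hugging Face-style assumptions as above; the checkpoint path, the (query, candidate, label) pair format, and the label convention (1 = similar, 0 = not similar) are illustrative assumptions.

      import torch
      from transformers import AutoTokenizer, AutoModelForSequenceClassification

      tok = AutoTokenizer.from_pretrained("bert-base-chinese")
      model = AutoModelForSequenceClassification.from_pretrained(
          "path/to/candidate-discriminator", num_labels=2)
      opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

      def finetune_discriminator(pairs, epochs=3):
          model.train()
          for _ in range(epochs):
              for query, cand, label in pairs:    # label: 1 = similar, 0 = not similar
                  inputs = tok(query, cand, return_tensors="pt", truncation=True)
                  loss = model(**inputs, labels=torch.tensor([label])).loss
                  opt.zero_grad(); loss.backward(); opt.step()
          return model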
  • To sum up, the present application uses the pre-trained language model UniLM as the generation model to generate high-quality text, and uses BERT as the discriminant model to filter out unqualified generated text, as shown in FIG. 4: the candidate similar sentences produced by the generation model are filtered by the discriminant model to obtain the target similar sentences that meet the standard, thereby automatically generating similar questions that are both diverse in form and consistent in semantics and improving the quality and efficiency of similar sentence generation. The overall flow is sketched below.
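  • Putting the hypothetical helpers above together, the end-to-end flow of FIG. 4 might read as follows; `gen_model` and `gen_tokenizer` stand for the fine-tuned generator assumed earlier.

      def generate_similar(query, num_candidates=10):
          # generate diverse candidates, then keep only those the discriminator accepts
          candidates = sample_candidates(gen_model, gen_tokenizer, query,
                                         n_top=5, num_candidates=num_candidates)
          return filter_candidates(query, candidates)   # the target similar sentences

      # e.g. generate_similar("How is product A developing?")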
  • Corresponding to the methods provided by the above embodiments, the present application also provides an apparatus for generating similar sentences based on a pre-trained language model.
  • Since the apparatus corresponds to the method for generating similar sentences based on a pre-trained language model provided in the embodiments of FIG. 1 to FIG. 4, the implementation of the method also applies to the apparatus, and it will not be described in detail in the embodiments of the present application.
  • FIG. 5 is a schematic structural diagram of an apparatus for generating similar sentences based on a pre-trained language model according to Embodiment 4 of the present application.
  • As shown in FIG. 5, the apparatus 500 for generating similar sentences based on a pre-trained language model is applied to an electronic device and includes a first acquisition module 501, a first processing module 502, a first generation module 503, a second processing module 504, and a second acquisition module 505.
  • The first acquisition module 501 is configured to acquire the sentence to be processed.
  • The first processing module 502 is configured to input the sentence to be processed into the trained generation model and obtain a plurality of candidate similar sentences.
  • The first generation module 503 is configured to generate a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences.
  • The second processing module 504 is configured to input the plurality of discriminative sentence pairs into the trained discriminant model to obtain a discrimination result.
  • The second acquisition module 505 is configured to obtain a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
  • In the embodiments of the present application, the first processing module 502 is specifically configured to: encode the sentence to be processed to obtain an encoding vector; and decode the encoding vector, generating candidate similar sentences in an autoregressive manner, where the probability distribution over candidate similar words is obtained at each step, one candidate similar word is randomly sampled from the top N highest-probability candidates as the target candidate similar word (N being a positive integer), and the candidate similar sentences are generated from the target candidate similar words.
  • In the embodiments of the present application, the second processing module 504 is specifically configured to: encode each discriminative sentence pair to obtain a plurality of discriminant vectors; and make a prediction on each discriminant vector to obtain the similarity between the sentence to be processed and each candidate similar sentence.
  • In the embodiments of the present application, the apparatus 500 for generating similar sentences based on a pre-trained language model may further include:
  • a third acquisition module, configured to acquire the general-domain similar question dataset; a second generation module, configured to input the general-domain similar question dataset into the pre-trained language model for training, obtain the first training similar sentence, compute the first error between the first training sentence and the first standard sentence through the loss function, and adjust the parameters of the pre-trained language model until the first error is less than the preset threshold, generating the candidate generation model; a fourth acquisition module, configured to acquire the target-domain similar question dataset; and a third generation module, configured to input the target-domain similar question dataset into the candidate generation model for training, obtain the second training similar sentence, compute the second error between the second training sentence and the second standard sentence through the loss function, and adjust the parameters of the candidate generation model until the second error is smaller than the preset threshold, generating the trained generation model.
  • In the embodiments of the present application, the apparatus 500 for generating similar sentences based on a pre-trained language model may further include:
  • a fifth acquisition module, configured to acquire the similar sentence pair dataset; a fourth generation module, configured to input the similar sentence pair dataset into the BERT-based bidirectional encoding representation module for training and generate the candidate discriminant model; a sixth acquisition module, configured to acquire the positive samples and negative samples of similar sentence pairs in the target domain; and a fifth generation module, configured to input the positive samples and negative samples of the similar sentence pairs into the candidate discriminant model for training and, when the obtained similarity of the target sequence is less than the preset similarity threshold, generate the trained discriminant model.
  • The apparatus for generating similar sentences based on a pre-trained language model according to the embodiments of the present application acquires a sentence to be processed; inputs the sentence into the trained generation model to obtain a plurality of candidate similar sentences; generates a plurality of discriminative sentence pairs from the sentence to be processed and the candidate similar sentences; inputs the discriminative sentence pairs into the trained discriminant model to obtain a discrimination result; and obtains a target similar sentence from the candidate similar sentences according to the discrimination result.
  • In this way, similar questions that are both diverse in form and consistent in semantics are generated automatically, improving the quality and efficiency of similar sentence generation.
  • In order to implement the above embodiments, the present application also proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, the method for generating similar sentences based on a pre-trained language model proposed by any of the embodiments of FIG. 1 to FIG. 4 is implemented.
  • In order to implement the above embodiments, the present application also proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method for generating similar sentences based on a pre-trained language model proposed in any of the foregoing embodiments of the present application is implemented.
  • In order to implement the above embodiments, the present application also proposes a computer program product; when the instructions in the computer program product are executed by a processor, the method for generating similar sentences based on a pre-trained language model proposed in any of the foregoing embodiments of the present application is executed.
  • FIG. 6 shows a block diagram of an exemplary electronic device or server suitable for use in implementing embodiments of the present application.
  • The electronic device or server 12 shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • The electronic device or server 12 takes the form of a general-purpose computing device.
  • Components of the electronic device or server 12 may include, but are not limited to, one or more processors or processing units 16 , system memory 28 , and a bus 18 connecting various system components including system memory 28 and processing unit 16 .
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • The electronic device or server 12 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by the electronic device or server 12, including volatile and non-volatile media and removable and non-removable media.
  • The memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32.
  • The electronic device or server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard drive").
  • A magnetic disk drive for reading from and writing to removable non-volatile magnetic disks (e.g., "floppy disks"), and an optical disc drive for reading from and writing to removable non-volatile optical discs (e.g., Compact Disc Read-Only Memory (CD-ROM), Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media), may also be provided.
  • In these cases, each drive may be connected to the bus 18 through one or more data media interfaces.
  • The memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the various embodiments of the present application.
  • A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
  • The electronic device or server 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the electronic device or server 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device or server 12 to communicate with one or more other computing devices. Such communication may take place through the input/output (I/O) interface 22.
  • Moreover, the electronic device or server 12 can also communicate through the network adapter 20 with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network (e.g., the Internet).
  • As shown, the network adapter 20 communicates with the other modules of the electronic device or server 12 via the bus 18.
  • Other hardware and/or software modules may be used in conjunction with the electronic device or server 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example implementing the methods mentioned in the foregoing embodiments.
  • The terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implying the number of technical features indicated. Thus, a feature defined with "first" or "second" may expressly or implicitly include at least one such feature.
  • "Plurality" means at least two, for example two or three, unless expressly and specifically defined otherwise.
  • a "computer-readable medium” can be any device that can contain, store, communicate, propagate, or transport the program for use by or in connection with an instruction execution system, apparatus, or apparatus.
  • computer readable media include the following: electrical connections with one or more wiring (electronic devices), portable computer disk cartridges (magnetic devices), random access memory (RAM), Read Only Memory (ROM), Erasable Editable Read Only Memory (EPROM or Flash Memory), Fiber Optic Devices, and Portable Compact Disc Read Only Memory (CDROM).
  • the computer readable medium may even be paper or other suitable medium on which the program may be printed, as the paper or other medium may be optically scanned, for example, followed by editing, interpretation, or other suitable medium as necessary process to obtain the program electronically and then store it in computer memory.
  • The functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module.
  • The above integrated modules may be implemented in the form of hardware or in the form of software functional modules. If an integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present application provides a similar sentence generation method and apparatus based on a pre-trained language model. The method comprises: acquiring a sentence to be processed; inputting, into a trained generative model, the sentence to be processed, so as to acquire a plurality of candidate similar sentences; generating a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences; and inputting the plurality of discriminative sentence pairs into a trained discriminative model, so as to acquire a discrimination result, and acquiring a target similar sentence from among the plurality of candidate similar sentences according to the discrimination result.

Description

Similar sentence generation method and apparatus based on pre-trained language model
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on, and claims priority to, the Chinese patent application No. 202110270871.5 filed on March 12, 2021, the entire content of which is incorporated herein by reference.
技术领域technical field
本申请涉及人工智能技术领域,尤其涉及一种基于预训练语言模型的相似语句生成方法和装置。The present application relates to the technical field of artificial intelligence, and in particular, to a method and device for generating similar sentences based on a pre-trained language model.
背景技术Background technique
通常,客服机器人会不定期新增FAQ(Frequently Asked Questions,经常问到的问题),相应就需要做相似问题多样性扩写。Usually, the customer service robot will add FAQs (Frequently Asked Questions, frequently asked questions) from time to time, and accordingly, it is necessary to expand the diversity of similar questions.
相关技术中,由人工制定模版,只需填入相应的实体和关键词完成问题扩写,需要投入大量人力和时间来编辑模版,每有新的问题类型加入就需要订制相应的模版,产生的句式固定,缺乏表达的多样性。In the related art, the template is manually formulated, and only the corresponding entities and keywords need to be filled in to complete the problem expansion, which requires a lot of manpower and time to edit the template. The sentence pattern is fixed and lacks the diversity of expression.
发明内容SUMMARY OF THE INVENTION
本申请旨在至少在一定程度上解决相关技术中的技术问题之一。The present application aims to solve one of the technical problems in the related art at least to a certain extent.
本申请提出一种基于预训练语言模型的相似语句生成方法和装置,以实现自动生成兼具形式多样且语义一致的相似问题,提高相似语句生成质量和效率。The present application proposes a method and device for generating similar sentences based on a pre-trained language model, so as to automatically generate similar questions with diverse forms and consistent semantics, and improve the quality and efficiency of similar sentence generation.
本申请第一方面实施例提出了一种基于预训练语言模型的相似语句生成方法,包括:The embodiment of the first aspect of the present application proposes a method for generating similar sentences based on a pre-trained language model, including:
获取待处理语句;Get the pending statement;
将所述待处理语句输入已训练的生成模型,获取多个候选相似语句;Inputting the to-be-processed statement into a trained generative model to obtain multiple candidate similar statements;
根据所述待处理语句和所述多个候选相似语句,生成多个判别语句对;generating a plurality of discriminative sentence pairs according to the to-be-processed sentence and the plurality of candidate similar sentences;
将所述多个判别语句对输入已训练的判别模型,获取判别结果,以及根据所述判别结果从所述多个候选相似语句中获取目标相似语句。The plurality of discriminative sentence pairs are input into a trained discriminant model, a discriminant result is obtained, and a target similar sentence is obtained from the plurality of candidate similar sentences according to the discriminant result.
本申请实施例的基于预训练语言模型的相似语句生成方法,通过获取待处理语句;将待处理语句输入已训练的生成模型,获取多个候选相似语句;根据待处理语句和多个候选相似语句,生成多个判别语句对;将多个判别语句对输入已训练的判别模型,获取判别结果,以及根据判别结果从多个候选相似语句中获取目标相似语句。由此,自动生成兼具形式多样且语义一致的相似问题,提高相似语句生成质量和效率。The method for generating similar sentences based on a pre-trained language model according to the embodiment of the present application obtains a statement to be processed; inputs the sentence to be processed into a trained generation model to obtain a plurality of candidate similar sentences; , generate multiple discriminative sentence pairs; input the multiple discriminative sentence pairs into the trained discriminant model, obtain the discriminant result, and obtain the target similar sentence from the multiple candidate similar sentences according to the discriminant result. As a result, similar questions with diverse forms and consistent semantics are automatically generated, and the quality and efficiency of similar sentences are improved.
本申请第二方面实施例提出了一种基于预训练语言模型的相似语句生成装置,包括:The embodiment of the second aspect of the present application proposes an apparatus for generating similar sentences based on a pre-trained language model, including:
第一获取模块,用于获取待处理语句;The first obtaining module is used to obtain the to-be-processed statement;
第一处理模块,用于将所述待处理语句输入已训练的生成模型,获取多个候选相似语句;a first processing module, configured to input the to-be-processed statement into a trained generative model to obtain a plurality of candidate similar statements;
第一生成模块,用于根据所述待处理语句和所述多个候选相似语句,生成多个判别语句对;a first generation module, configured to generate a plurality of discriminative sentence pairs according to the to-be-processed sentence and the plurality of candidate similar sentences;
第二处理模块,用于将所述多个判别语句对输入已训练的判别模型,获取判别结果;The second processing module is used for inputting the plurality of discriminative sentence pairs into a discriminant model that has been trained to obtain a discriminant result;
第二获取模块,用于根据所述判别结果从所述多个候选相似语句中获取目标相似语句。The second obtaining module is configured to obtain a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
本申请实施例的基于预训练语言模型的相似语句生成装置,通过获取待处理语句;将待处理语句输入已训练的生成模型,获取多个候选相似语句;根据待处理语句和多个候选相似语句,生成多个判别语句对;将多个判别语句对输入已训练的判别模型,获取判别结果,以及根据判别结果从多个候选相似语句中获取目标相似语句。由此,自动生成兼具形式多样且语义一致的相似问题,提高相似语句生成质量和效率。The apparatus for generating similar sentences based on a pre-trained language model according to the embodiment of the present application obtains the sentences to be processed; inputs the sentences to be processed into the trained generation model to obtain a plurality of candidate similar sentences; , generate multiple discriminative sentence pairs; input the multiple discriminative sentence pairs into the trained discriminant model, obtain the discriminant result, and obtain the target similar sentence from the multiple candidate similar sentences according to the discriminant result. As a result, similar questions with diverse forms and consistent semantics are automatically generated, and the quality and efficiency of similar sentences are improved.
本申请第三方面实施例提出了一种电子设备,包括:存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时,实现如本申请第一方面实施例提出的基于预训练语言模型的相似语句生成方法。An embodiment of the third aspect of the present application proposes an electronic device, including: a memory, a processor, and a computer program stored in the memory and running on the processor. When the processor executes the program, the computer program as described in the present application A method for generating similar sentences based on a pre-trained language model proposed by the embodiments of the first aspect.
本申请第四方面实施例提出了一种非临时性计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现如本申请第一方面实施例提出的基于预训练语言模型的相似语句生成方法。The embodiment of the fourth aspect of the present application provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, implements the pre-training language-based language proposed by the embodiment of the first aspect of the present application Similar sentence generation method for the model.
本申请第五方面实施例提出了一种计算机程序产品,当所述计算机程序产品中的指令由处理器执行时,执行如本申请第一方面实施例提出的基于预训练语言模型的相似语句生成方法。The embodiment of the fifth aspect of the present application provides a computer program product. When the instructions in the computer program product are executed by the processor, the similar sentence generation based on the pre-trained language model proposed in the embodiment of the first aspect of the present application is executed. method.
本申请附加的方面和优点将在下面的描述中部分给出,部分将从下面的描述中变得明显,或通过本申请的实践了解到。Additional aspects and advantages of the present application will be set forth, in part, in the following description, and in part will be apparent from the following description, or learned by practice of the present application.
附图说明Description of drawings
本申请上述的和/或附加的方面和优点从下面结合附图对实施例的描述中将变得明显和容易理解,其中:The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, wherein:
图1为本申请实施例一所提供的基于预训练语言模型的相似语句生成方法的流程示意图;1 is a schematic flowchart of a method for generating similar sentences based on a pre-trained language model provided in Embodiment 1 of the present application;
图2为本申请实施例二所提供的基于预训练语言模型的相似语句生成方法的流程示意图;2 is a schematic flowchart of a method for generating similar sentences based on a pre-trained language model provided in Embodiment 2 of the present application;
图3为本申请实施例三所提供的基于预训练语言模型的相似语句生成方法的流程示意图;3 is a schematic flowchart of a method for generating similar sentences based on a pre-trained language model provided in Embodiment 3 of the present application;
图4为本申请实施例中相似语句生成流程示意图;FIG. 4 is a schematic flow chart of generating a similar sentence in an embodiment of the present application;
图5为本申请实施例四所提供的基于预训练语言模型的相似语句生成装置的结构示意图;5 is a schematic structural diagram of an apparatus for generating similar sentences based on a pre-trained language model provided in Embodiment 4 of the present application;
图6示出了适于用来实现本申请实施方式的示例性电子设备或服务器的框图。Figure 6 shows a block diagram of an exemplary electronic device or server suitable for use in implementing embodiments of the present application.
具体实施方式Detailed ways
下面详细描述本申请的实施例,所述实施例的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施例是示例性的,旨在用于解释本申请,而不能理解为对本申请的限制。The following describes in detail the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary, and are intended to be used to explain the present application, but should not be construed as a limitation to the present application.
针对需要投入大量人力和时间来编辑模版,每有新的问题类型加入就需要订制相应的模版,产生的句式固定,缺乏表达的多样性的问题,本申请实施例提出一种基于预训练语言模型的相似语句生成方法,通过获取待处理语句;将待处理语句输入已训练的生成模型,获取多个候选相似语句;根据待处理语句和多个候选相似语句,生成多个判别语句对;将多个判别语句对输入已训练的判别模型,获取判别结果,以及根据判别结果从多个候选相似语句中获取目标相似语句。由此,自动生成兼具形式多样且语义一致的相似问题,提高相似语句生成质量和效率。In view of the need to invest a lot of manpower and time to edit the template, the corresponding template needs to be customized every time a new question type is added, the generated sentence pattern is fixed, and the variety of expressions is lacking, the embodiment of the present application proposes a pre-training based method A method for generating similar sentences for language models, by obtaining the sentences to be processed; inputting the sentences to be processed into a trained generation model to obtain multiple candidate similar sentences; and generating multiple discriminative sentence pairs according to the sentences to be processed and the multiple candidate similar sentences; A plurality of discriminative sentence pairs are input into the trained discriminant model, a discriminant result is obtained, and a target similar sentence is obtained from a plurality of candidate similar sentences according to the discriminant result. As a result, similar questions with diverse forms and consistent semantics are automatically generated, and the quality and efficiency of similar sentences are improved.
下面参考附图描述本申请实施例的基于预训练语言模型的相似语句生成方法和装置。The method and apparatus for generating similar sentences based on a pretrained language model according to the embodiments of the present application will be described below with reference to the accompanying drawings.
图1为本申请实施例一所提供的基于预训练语言模型的相似语句生成方法的流程示意图。FIG. 1 is a schematic flowchart of a method for generating similar sentences based on a pre-trained language model according to Embodiment 1 of the present application.
本申请实施例的对话识别方法,可以应用于电子设备。其中,电子设备可以为任一具有计算能力的设备,例如可以为PC(Personal Computer,个人电脑)、移动终端等,移动终端例如可以为手机、平板电脑、个人数字助理、穿戴式设备等具有各种操作系统、触摸屏和/或显示屏的硬件设备。The dialog recognition method of the embodiment of the present application can be applied to an electronic device. The electronic device can be any device with computing capabilities, such as a PC (Personal Computer), a mobile terminal, etc., and the mobile terminal can be, for example, a mobile phone, a tablet computer, a personal digital assistant, a wearable device, etc. An operating system, touch screen and/or display hardware device.
如图1所示,该基于预训练语言模型的相似语句生成方法可以包括以下步骤101至步骤104。As shown in FIG. 1 , the method for generating similar sentences based on a pretrained language model may include the following steps 101 to 104 .
步骤101,获取待处理语句。 Step 101 , acquiring the to-be-processed statement.
本申请实施例中,待处理语句可以理解为需要生成与其对应的多个相似语句,可以根据实际应用场景选择获取。In the embodiment of the present application, the to-be-processed statement can be understood as needing to generate a plurality of similar statements corresponding to it, which can be selected and acquired according to the actual application scenario.
举例而言,待处理语句可以为“产品A发展怎么样”、“可以介绍一下产品A不”等等。For example, the to-be-processed sentences can be "how is the development of product A", "can you introduce the product A is not" and so on.
步骤102,将待处理语句输入已训练的生成模型,获取多个候选相似语句。Step 102: Input the sentence to be processed into the trained generation model, and obtain a plurality of candidate similar sentences.
在本申请实施例中,生成模型为已训练的预训练语言模型,具体训练过程详见后续描述,此处不再详述。In the embodiment of the present application, the generation model is a pre-trained language model that has been trained. For details of the specific training process, please refer to the subsequent description, which will not be described in detail here.
在本申请实施例中,对待处理语句进行编码,获取编码向量;采用自回归方式,逐字生成候选相似语句;其中,获取每个候选相似字的概率分布,并从概率最高的前N个候选相似字中随机采样一个候选相似字字作为目标候选相似字,其中,N为正整数;根据每个待处理字的目标候选相似字生成候选相似语句。In the embodiment of the present application, the to-be-processed sentence is encoded to obtain an encoding vector; an autoregressive method is used to generate candidate similar sentences word by word; wherein, the probability distribution of each candidate similar word is obtained, and the top N candidates with the highest probability are obtained A candidate similar word is randomly sampled from the similar words as a target candidate similar word, where N is a positive integer; candidate similar sentences are generated according to the target candidate similar words of each word to be processed.
举例而言,输入待处理语句X,对X进行编码,然后采用随机采样策略生成候选相似语句,具体地,生成候选相似语句的过程是自左向右逐字生成的,由于生成每个字的过程都是从概率最高的N比如5个字中随机选择得到的,由此,同样的待处理语句输入 到生成模型每次输出的到的候选相似语句都不一样,多次重复这个过程就得到了多个候选相似语句。For example, input the sentence X to be processed, encode X, and then use a random sampling strategy to generate candidate similar sentences. Specifically, the process of generating candidate similar sentences is generated word by word from left to right. The process is randomly selected from the N with the highest probability, such as 5 words. Therefore, the same to-be-processed sentence is input to the generation model and the output of the candidate similar sentences is different each time. Repeat this process many times to get multiple candidate similar sentences.
由此,采用随机采样的方式,即每生成一个字都是以标准问题和已生成内容为条件,从当前条件分布中概率最高的多个候选相似字中随机选择,获取多个候选相似语句,提高生成表达形式的多样性。Therefore, the random sampling method is adopted, that is, each word generated is based on the standard question and the generated content, and is randomly selected from multiple candidate similar words with the highest probability in the current conditional distribution to obtain multiple candidate similar sentences. Increase the variety of generated expressions.
步骤103,根据待处理语句和多个候选相似语句,生成多个判别语句对。Step 103: Generate a plurality of discriminative sentence pairs according to the sentence to be processed and a plurality of candidate similar sentences.
步骤104,将多个判别语句对输入已训练的判别模型,获取判别结果,以及根据判别结果从多个候选相似语句中获取目标相似语句。 Step 104 , inputting a plurality of discriminative sentence pairs into the trained discriminant model, obtaining a discriminant result, and obtaining a target similar sentence from a plurality of candidate similar sentences according to the discriminant result.
在本申请实施例中,待处理语句与每个候选相似语句分别组成判别语句对,比如待处理语句X,多个候选相似语句为Y1-Y5,组成的判别语句对为(X Y1)、(X Y2)到(X Y5),获取5个判别语句对。In the embodiment of the present application, the to-be-processed statement and each candidate similar statement respectively form a discriminative statement pair, such as the to-be-processed statement X, the multiple candidate similar statements are Y1-Y5, and the formed discriminative statement pairs are (X Y1), ( X Y2) to (X Y5), get 5 discriminative sentence pairs.
在本申请实施例中,判别模型为已训练的基于机器翻译的双向编码表示BERT模块,具体训练过程详见后续描述,此处不再详述。In the embodiment of the present application, the discriminant model is a trained BERT module based on machine translation bidirectional coding. For details of the training process, please refer to the subsequent description, which will not be described in detail here.
在本申请实施例中,对每个判别语句对进行编码,获取多个判别向量,对每个判别向量进行预测,获取待处理语句和每个候选相似语句之间的相似度。In the embodiment of the present application, each discriminant sentence pair is encoded, a plurality of discriminant vectors are obtained, each discriminant vector is predicted, and the similarity between the to-be-processed sentence and each candidate similar sentence is obtained.
具体地,判别模型的输入是(待处理语句,候选相似句)组成的语句对,判别模型对语句对进行编码,分类预测语句对是否是相似句,在是相似句的情况下,获取对应的候选相似句为目标相似语句。Specifically, the input of the discriminant model is a sentence pair composed of (to-be-processed sentence, candidate similar sentence), the discriminant model encodes the sentence pair, and classifies and predicts whether the sentence pair is a similar sentence, and in the case of a similar sentence, obtain the corresponding Candidate similar sentences are target similar sentences.
本申请实施例的基于预训练语言模型的相似语句生成方法,通过获取待处理语句;将待处理语句输入已训练的生成模型,获取多个候选相似语句;根据待处理语句和多个候选相似语句,生成多个判别语句对;将多个判别语句对输入已训练的判别模型,获取判别结果,以及根据判别结果从多个候选相似语句中获取目标相似语句。由此,自动生成兼具形式多样且语义一致的相似问题,提高相似语句生成质量和效率。The method for generating similar sentences based on a pre-trained language model according to the embodiment of the present application obtains a statement to be processed; inputs the sentence to be processed into a trained generation model to obtain a plurality of candidate similar sentences; , generate multiple discriminative sentence pairs; input the multiple discriminative sentence pairs into the trained discriminant model, obtain the discriminant result, and obtain the target similar sentence from the multiple candidate similar sentences according to the discriminant result. As a result, similar questions with diverse forms and consistent semantics are automatically generated, and the quality and efficiency of similar sentences are improved.
在本申请实施例的一种可能的实现方式中,采用预训练语言模UniLM(UNIfied pre-trained Language Model,统一预训练语言模型)作为生成模型,生成高质量文本,利用BERT(Bidirectional Encoder Representation from Transformers即基于机器翻译的双向编码表示)作为判别模型,过滤不合格的生成文本,具体结合图2和图3进行详细描述训练过程。In a possible implementation of the embodiment of the present application, a pre-trained language model UniLM (UNIfied pre-trained Language Model, unified pre-trained language model) is used as a generation model to generate high-quality text, using BERT (Bidirectional Encoder Representation from Transformers (two-way encoding representation based on machine translation) is used as a discriminant model to filter unqualified generated texts. The training process is described in detail with reference to Figures 2 and 3.
FIG. 2 is a schematic flowchart of a similar sentence generation method based on a pre-trained language model provided in Embodiment 2 of the present application.
As shown in FIG. 2, the method may include the following steps 201 to 204.
Step 201: obtain a general-domain similar question dataset.
Step 202: input the general-domain similar question dataset into the pre-trained language model for training to obtain a first training similar sentence; calculate a first error between the first training sentence and a first standard sentence through a loss function; and adjust the parameters of the pre-trained language model until the first error is smaller than a preset threshold, generating a candidate generation model.
In an embodiment of the present application, the encoder (UniLM) performs task transfer for similar question generation on the general-domain similar question dataset. The dataset can be obtained in many ways, for example by crawling similar questions recommended on forums and question-answering websites. Maximum likelihood estimation is used for the transfer learning of the similar question generation task until the pre-trained language model converges. Because the training data is crawled from the web, no manual annotation is needed, which improves training efficiency.
In an embodiment of the present application, the UniLM model, open-sourced by Microsoft, is a pre-trained language model based on the Transformer (deep self-attention transformation network) architecture that combines natural language understanding and generation capabilities. UniLM pre-training uses multi-task learning that combines autoencoding and autoregression, with two tasks: masked language modeling (MLM) and sequence-to-sequence (seq2seq) prediction. It can therefore handle both understanding-type and generation-type downstream tasks; that is, each word in a training sentence can be randomly masked before encoding-decoding training, improving the quality of subsequent generation.
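To make the seq2seq variant concrete, the sketch below reconstructs the style of self-attention mask UniLM uses for generation: source tokens attend bidirectionally among themselves, while each target token attends to the full source and only to earlier target tokens. This is an illustrative reconstruction under assumed dimensions, not code from the UniLM release.

```python
# Illustrative construction of a UniLM-style seq2seq self-attention mask:
# a (src_len + tgt_len) x (src_len + tgt_len) matrix where 1 = "may attend".
import torch

def unilm_seq2seq_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.long)
    mask[:, :src_len] = 1                       # every position sees the full source
    mask[src_len:, src_len:] = torch.tril(      # target positions attend causally
        torch.ones(tgt_len, tgt_len, dtype=torch.long))
    return mask                                 # source never peeks at the target

print(unilm_seq2seq_mask(3, 2))
```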
In an embodiment of the present application, UniLM is a pre-trained model whose original pre-training tasks do not include similar question generation. The present application uses the UniLM parameters as initialization and trains the similar question generation task on top of them, i.e., transfer learning. The training objective is to maximize the likelihood of the generated target sequence; when the value of the objective function no longer changes, or changes by less than a certain threshold, the pre-trained language model is considered to have converged, training can be stopped, and the candidate generation model is generated.
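A schematic maximum-likelihood training loop consistent with this convergence test might look as follows; `model`, `data_loader`, and `optimizer` are assumed stand-ins for the UniLM generator, the crawled dataset, and any standard optimizer, and the loop stops once the objective changes by less than a small threshold.

```python
# Schematic MLE transfer-learning loop (assumed names, not the actual
# training script): maximize target-sequence likelihood, i.e. minimize NLL,
# and stop once the objective changes by less than a small threshold.
import math

def train_until_converged(model, data_loader, optimizer, epsilon=1e-4, max_epochs=50):
    prev_loss = math.inf
    for epoch in range(max_epochs):
        total, steps = 0.0, 0
        for batch in data_loader:
            optimizer.zero_grad()
            loss = model(**batch).loss        # cross-entropy NLL over target tokens
            loss.backward()
            optimizer.step()
            total, steps = total + loss.item(), steps + 1
        avg = total / max(steps, 1)
        if abs(prev_loss - avg) < epsilon:    # objective no longer changing
            break
        prev_loss = avg
    return model
```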
Step 203: obtain a target-domain similar question dataset.
Step 204: input the target-domain similar question dataset into the candidate generation model for training to obtain a second training similar sentence; calculate a second error between the second training sentence and a second standard sentence through the loss function; and adjust the parameters of the candidate generation model until the second error is smaller than a preset threshold, generating the trained generation model.
In an embodiment of the present application, to make the trained pre-trained language model better fit the target domain, where the target domain can be selected according to the application scenario, for example the customer service domain, the encoder (UniLM) can be fine-tuned for the similar question generation task on the FAQ similar question database currently used by the customer service business.
In an embodiment of the present application, a relatively small target-domain similar question dataset can be input into the candidate generation model for maximum likelihood estimation; when the negative log-likelihood of the target sequence falls below a preset likelihood threshold, the trained generation model is generated.
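The domain-transfer step differs mainly in its stopping rule; a hedged sketch, reusing the same assumed names as above, stops once the average negative log-likelihood falls below the preset likelihood threshold.

```python
# Schematic domain fine-tuning: start from the candidate generation model and
# train on a small in-domain set until the average NLL drops below a preset
# likelihood threshold. All names are illustrative assumptions.
def finetune_to_threshold(model, domain_loader, optimizer,
                          nll_threshold=2.0, max_epochs=20):
    for epoch in range(max_epochs):
        total, steps = 0.0, 0
        for batch in domain_loader:
            optimizer.zero_grad()
            loss = model(**batch).loss        # negative log-likelihood
            loss.backward()
            optimizer.step()
            total, steps = total + loss.item(), steps + 1
        if total / max(steps, 1) < nll_threshold:
            break                             # in-domain likelihood is good enough
    return model
```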
Therefore, the present application first uses a large amount of easily obtained supervised data for transfer of the similar task, and then uses a small amount of existing business data, together with the small amount of labeled data obtained while screening usable data, for domain transfer. Ideal business metrics are thus achieved with minimal annotation cost, and processing efficiency is improved.
FIG. 3 is a schematic flowchart of a similar sentence generation method based on a pre-trained language model provided in Embodiment 3 of the present application.
As shown in FIG. 3, the method may include the following steps 301 to 304.
Step 301: obtain a similar sentence pair dataset.
Step 302: input the similar sentence pair dataset into a BERT-based bidirectional encoder representation module for training, generating a candidate discriminant model.
In an embodiment of the present application, the discriminant model (BERT) performs task transfer for similar question discrimination on a financial semantic similarity dataset, and the discriminator (BERT) is fine-tuned for the similar question discrimination task on the FAQs and similar questions currently used by the customer service business.
Specifically, the discriminant model BERT is constructed and trained for similar question discrimination on publicly available similar question corpora. The BERT model, open-sourced by Google, is a pre-trained language model based on the Transformer architecture, applied mainly to natural language understanding tasks. BERT pre-training uses autoencoding multi-task learning with two tasks: masked language modeling (MLM) and next sentence prediction (NSP). BERT can serve as the initialization of a downstream task model; the effect of the present application can be achieved simply by adding a simple task-specific output layer structure and fine-tuning on a small amount of labeled data.
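As a hedged sketch of the "simple output layer" idea, the following adds a two-way classification head over BERT's pooled [CLS] representation and runs one fine-tuning step on a labeled sentence pair; the `bert-base-chinese` checkpoint and the label convention (1 = similar) are assumptions for illustration only.

```python
# Sketch of fine-tuning BERT for similar-question discrimination: a
# 2-way classification head on top of the pooled [CLS] vector.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)        # adds the task-specific output layer
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One illustrative step on a labeled sentence pair (1 = similar, 0 = not).
batch = tokenizer(["怎么还款"], ["如何进行还款"],
                  padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1])
loss = model(**batch, labels=labels).loss     # cross-entropy on the pair label
loss.backward()
optimizer.step()
```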
Transfer learning of the similar question discrimination task is performed on an easily obtained public dataset, such as a financial semantic similarity dataset, so no manual data annotation is required, which improves training efficiency.
Specifically, the FAQ similar question data accumulated by the customer service business is used for domain transfer with the maximum likelihood estimation method, so that the discriminant model learns the data distribution of the customer service business. The amount of data used is far smaller than the training data scale used in generating the candidate discriminant model, further improving training efficiency.
Step 303: obtain positive samples and negative samples of similar sentence pairs in the target domain.
Step 304: input the positive samples and negative samples of similar sentence pairs into the candidate discriminant model for training, generating the trained discriminant model.
Specifically, the similar question discrimination task is fine-tuned on the FAQ similar questions accumulated by the customer service business. In addition to these FAQ similar questions, training the discriminant model also requires, as negative examples for domain transfer, the unusable data labeled while screening usable similar questions, so that the discriminant model learns the operators' discrimination criteria. The amount of data required is also small, further improving training efficiency.
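One plausible way to assemble this fine-tuning set is sketched below under assumed inputs: operator-approved FAQ rewrites become positive pairs, and the candidates rejected during screening become negatives.

```python
# Hedged sketch: build (sentence_a, sentence_b, label) triples for the
# discriminant model. `faq_similar` and `rejected` are assumed inputs:
# lists of (standard_question, rewrite) pairs kept or discarded by operators.
def build_pairs(faq_similar, rejected):
    examples = []
    for std_q, rewrite in faq_similar:
        examples.append((std_q, rewrite, 1))   # positive: approved similar question
    for std_q, rewrite in rejected:
        examples.append((std_q, rewrite, 0))   # negative: rejected during screening
    return examples
```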
Based on the above embodiments, the present application uses the pre-trained language model UniLM as the generation model to generate high-quality text, and uses BERT as the discriminant model to filter out unqualified generated text. As shown in FIG. 4, the sentence to be processed is input into the generation model to obtain candidate similar sentences, which are then filtered by the discriminant model to obtain target similar sentences that meet the standard. Similar questions that are both diverse in form and consistent in semantics are thus generated automatically, improving the quality and efficiency of similar sentence generation.
Corresponding to the similar sentence generation methods based on a pre-trained language model provided in the embodiments of FIG. 1 to FIG. 4 above, the present application further provides a similar sentence generation apparatus based on a pre-trained language model. Since the apparatus provided in the embodiments of the present application corresponds to the methods provided in the embodiments of FIG. 1 to FIG. 4, the implementations of the method also apply to the apparatus, and are not described in detail again in the embodiments of the present application.
FIG. 5 is a schematic structural diagram of a similar sentence generation apparatus based on a pre-trained language model provided in Embodiment 4 of the present application.
As shown in FIG. 5, the similar sentence generation apparatus 500 based on a pre-trained language model is applied to an electronic device and includes: a first acquisition module 501, a first processing module 502, a first generation module 503, a second processing module 504, and a second acquisition module 505.
The first acquisition module 501 is configured to obtain a sentence to be processed.
The first processing module 502 is configured to input the sentence to be processed into a trained generation model to obtain a plurality of candidate similar sentences.
The first generation module 503 is configured to generate a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences.
The second processing module 504 is configured to input the plurality of discriminative sentence pairs into a trained discriminant model to obtain a discrimination result.
The second acquisition module 505 is configured to obtain a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
Further, in a possible implementation of the embodiments of the present application, the first processing module 502 is specifically configured to: encode the sentence to be processed to obtain an encoding vector; and decode the encoding vector, generating candidate similar sentences autoregressively. At each step, the probability distribution over candidate similar words is obtained, and one candidate similar word is randomly sampled from the top N candidate similar words with the highest probability as the target candidate similar word, where N is a positive integer; the candidate similar sentences are generated according to the target candidate similar words.
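The top-N sampling rule of the decoder can be written compactly: keep the N highest-probability next-word candidates, renormalize, and draw one at random. The sketch below assumes `logits` is the generation model's output for the current decoding position.

```python
# Top-N (top-k) sampling over one decoding step, as described above:
# restrict to the N most probable candidate words, then sample one at random
# in proportion to their probabilities.
import torch

def sample_top_n(logits: torch.Tensor, n: int = 5) -> int:
    probs = torch.softmax(logits, dim=-1)
    top_probs, top_ids = torch.topk(probs, n)   # N best candidate words
    top_probs = top_probs / top_probs.sum()     # renormalize over the top N
    choice = torch.multinomial(top_probs, 1)    # random draw among them
    return int(top_ids[choice])

# Usage with a dummy logits vector (21128 is e.g. the BERT-Chinese vocab size).
next_token_id = sample_top_n(torch.randn(21128), n=5)
```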
Further, in a possible implementation of the embodiments of the present application, the second processing module 504 is specifically configured to: encode each discriminative sentence pair to obtain a plurality of discriminant vectors; and make a prediction on each discriminant vector to obtain the similarity between the sentence to be processed and each candidate similar sentence.
Further, in a possible implementation of the embodiments of the present application, the similar sentence generation apparatus 500 based on a pre-trained language model may further include:
a third acquisition module, configured to obtain a general-domain similar question dataset; a second generation module, configured to input the general-domain similar question dataset into a pre-trained language model for training to obtain a first training similar sentence, calculate a first error between the first training sentence and a first standard sentence through a loss function, and adjust the parameters of the pre-trained language model until the first error is smaller than a preset threshold, generating a candidate generation model; a fourth acquisition module, configured to obtain a target-domain similar question dataset; and a third generation module, configured to input the target-domain similar question dataset into the candidate generation model for training to obtain a second training similar sentence, calculate a second error between the second training sentence and a second standard sentence through the loss function, and adjust the parameters of the candidate generation model until the second error is smaller than the preset threshold, generating the trained generation model.
Further, in a possible implementation of the embodiments of the present application, the similar sentence generation apparatus 500 based on a pre-trained language model may further include:
a fifth acquisition module, configured to obtain a similar sentence pair dataset; a fourth generation module, configured to input the similar sentence pair dataset into a BERT-based bidirectional encoder representation module for training, generating a candidate discriminant model; a sixth acquisition module, configured to obtain positive samples and negative samples of similar sentence pairs in the target domain; and a fifth generation module, configured to input the positive samples and negative samples of similar sentence pairs into the candidate discriminant model for training until the similarity of the target sequence is smaller than a preset similarity threshold, generating the trained discriminant model.
With the similar sentence generation apparatus based on a pre-trained language model of the embodiments of the present application, a sentence to be processed is obtained; the sentence is input into a trained generation model to obtain a plurality of candidate similar sentences; a plurality of discriminative sentence pairs are generated according to the sentence to be processed and the candidates; the pairs are input into a trained discriminant model to obtain a discrimination result; and target similar sentences are obtained from the candidates according to the discrimination result. Similar questions that are both diverse in form and consistent in semantics are thus generated automatically, improving the quality and efficiency of similar sentence generation.
To implement the above embodiments, the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the similar sentence generation method based on a pre-trained language model proposed in any of the foregoing embodiments of FIG. 1 to FIG. 4 of the present application is implemented.
To implement the above embodiments, the present application further provides a non-transitory computer-readable storage medium having a computer program stored thereon; when the program is executed by a processor, the similar sentence generation method based on a pre-trained language model proposed in any of the foregoing embodiments of the present application is implemented.
To implement the above embodiments, the present application further provides a computer program product; when instructions in the computer program product are executed by a processor, the similar sentence generation method based on a pre-trained language model proposed in any of the foregoing embodiments of the present application is performed.
FIG. 6 shows a block diagram of an exemplary electronic device or server suitable for implementing the embodiments of the present application. The electronic device or server 12 shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 6, the electronic device or server 12 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the various system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The electronic device or server 12 typically includes a variety of computer-system-readable media. These may be any available media accessible by the electronic device or server 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The electronic device or server 12 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard drive"). Although not shown in FIG. 6, a disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The electronic device or server 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the electronic device or server 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device or server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the electronic device or server 12 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the electronic device or server 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device or server 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example implementing the methods mentioned in the foregoing embodiments.
In the description of this specification, references to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic statements of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine different embodiments or examples described in this specification, and features thereof.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, such as two or three, unless otherwise expressly and specifically defined.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code including one or more executable instructions for implementing steps of a custom logic function or process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection portion (electronic device) having one or more wirings, a portable computer disk cartridge (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner where necessary, and then stored in a computer memory.
It should be understood that parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, a plurality of steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing relevant hardware through a program; the program may be stored in a computer-readable storage medium, and when executed, the program includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (13)

  1. A similar sentence generation method based on a pre-trained language model, comprising:
    obtaining a sentence to be processed;
    inputting the sentence to be processed into a trained generation model to obtain a plurality of candidate similar sentences;
    generating a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences; and
    inputting the plurality of discriminative sentence pairs into a trained discriminant model to obtain a discrimination result, and obtaining a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
  2. The method of claim 1, wherein inputting the sentence to be processed into the trained generation model to obtain the plurality of candidate similar sentences comprises:
    encoding the sentence to be processed to obtain an encoding vector; and
    decoding the encoding vector and generating the candidate similar sentences autoregressively, wherein a probability distribution over candidate similar words is obtained at each step, one candidate similar word is randomly sampled from the top N candidate similar words with the highest probability as a target candidate similar word, N being a positive integer, and the candidate similar sentences are generated according to the target candidate similar words.
  3. The method of claim 1, wherein inputting the plurality of discriminative sentence pairs into the trained discriminant model to obtain the discrimination result comprises:
    encoding each discriminative sentence pair to obtain a plurality of discriminant vectors; and
    making a prediction on each discriminant vector to obtain a similarity between the sentence to be processed and each candidate similar sentence.
  4. The method of claim 1, further comprising, before inputting the sentence to be processed into the trained generation model:
    obtaining a general-domain similar question dataset;
    inputting the general-domain similar question dataset into a pre-trained language model for training to obtain a first training similar sentence, calculating a first error between the first training sentence and a first standard sentence through a loss function, and adjusting parameters of the pre-trained language model until the first error is smaller than a preset threshold, generating a candidate generation model;
    obtaining a target-domain similar question dataset; and
    inputting the target-domain similar question dataset into the candidate generation model for training to obtain a second training similar sentence, calculating a second error between the second training sentence and a second standard sentence through the loss function, and adjusting parameters of the candidate generation model until the second error is smaller than the preset threshold, generating the trained generation model.
  5. The method of claim 1, further comprising, before inputting the plurality of discriminative sentence pairs into the trained discriminant model:
    obtaining a similar sentence pair dataset;
    inputting the similar sentence pair dataset into a BERT-based bidirectional encoder representation module for training, generating a candidate discriminant model;
    obtaining positive samples and negative samples of similar sentence pairs in a target domain; and
    inputting the positive samples and negative samples of similar sentence pairs into the candidate discriminant model for training, generating the trained discriminant model.
  6. A similar sentence generation apparatus based on a pre-trained language model, comprising:
    a first acquisition module, configured to obtain a sentence to be processed;
    a first processing module, configured to input the sentence to be processed into a trained generation model to obtain a plurality of candidate similar sentences;
    a first generation module, configured to generate a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences;
    a second processing module, configured to input the plurality of discriminative sentence pairs into a trained discriminant model to obtain a discrimination result; and
    a second acquisition module, configured to obtain a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
  7. The apparatus of claim 6, wherein the first processing module is further configured to:
    encode the sentence to be processed to obtain an encoding vector; and
    decode the encoding vector and generate the candidate similar sentences autoregressively, wherein a probability distribution over candidate similar words is obtained at each step, one candidate similar word is randomly sampled from the top N candidate similar words with the highest probability as a target candidate similar word, N being a positive integer, and the candidate similar sentences are generated according to the target candidate similar words.
  8. The apparatus of claim 6, wherein the second processing module is further configured to:
    encode each discriminative sentence pair to obtain a plurality of discriminant vectors; and
    make a prediction on each discriminant vector to obtain a similarity between the sentence to be processed and each candidate similar sentence.
  9. The apparatus of claim 6, further comprising:
    a third acquisition module, configured to obtain a general-domain similar question dataset;
    a second generation module, configured to input the general-domain similar question dataset into a pre-trained language model for training to obtain a first training similar sentence, calculate a first error between the first training sentence and a first standard sentence through a loss function, and adjust parameters of the pre-trained language model until the first error is smaller than a preset threshold, generating a candidate generation model;
    a fourth acquisition module, configured to obtain a target-domain similar question dataset; and
    a third generation module, configured to input the target-domain similar question dataset into the candidate generation model for training to obtain a second training similar sentence, calculate a second error between the second training sentence and a second standard sentence through the loss function, and adjust parameters of the candidate generation model until the second error is smaller than the preset threshold, generating the trained generation model.
  10. The apparatus of claim 6, further comprising:
    a fifth acquisition module, configured to obtain a similar sentence pair dataset;
    a fourth generation module, configured to input the similar sentence pair dataset into a BERT-based bidirectional encoder representation module for training, generating a candidate discriminant model;
    a sixth acquisition module, configured to obtain positive samples and negative samples of similar sentence pairs in a target domain; and
    a fifth generation module, configured to input the positive samples and negative samples of similar sentence pairs into the candidate discriminant model for training, generating the trained discriminant model.
  11. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the following steps are implemented:
    obtaining a sentence to be processed;
    inputting the sentence to be processed into a trained generation model to obtain a plurality of candidate similar sentences;
    generating a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences; and
    inputting the plurality of discriminative sentence pairs into a trained discriminant model to obtain a discrimination result, and obtaining a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
  12. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the following steps are implemented:
    obtaining a sentence to be processed;
    inputting the sentence to be processed into a trained generation model to obtain a plurality of candidate similar sentences;
    generating a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences; and
    inputting the plurality of discriminative sentence pairs into a trained discriminant model to obtain a discrimination result, and obtaining a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
  13. A computer program product, wherein when instructions in the computer program product are executed by a processor, the following steps are performed:
    obtaining a sentence to be processed;
    inputting the sentence to be processed into a trained generation model to obtain a plurality of candidate similar sentences;
    generating a plurality of discriminative sentence pairs according to the sentence to be processed and the plurality of candidate similar sentences; and
    inputting the plurality of discriminative sentence pairs into a trained discriminant model to obtain a discrimination result, and obtaining a target similar sentence from the plurality of candidate similar sentences according to the discrimination result.
PCT/CN2022/075657 2021-03-12 2022-02-09 Similar sentence generation method and apparatus based on pre-trained language model WO2022188584A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110270871.5A CN113807074A (en) 2021-03-12 2021-03-12 Similar statement generation method and device based on pre-training language model
CN202110270871.5 2021-03-12

Publications (1)

Publication Number Publication Date
WO2022188584A1 true WO2022188584A1 (en) 2022-09-15

Family

ID=78892914

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/075657 WO2022188584A1 (en) 2021-03-12 2022-02-09 Similar sentence generation method and apparatus based on pre-trained language model

Country Status (2)

Country Link
CN (1) CN113807074A (en)
WO (1) WO2022188584A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807074A (en) * 2021-03-12 2021-12-17 京东科技控股股份有限公司 Similar statement generation method and device based on pre-training language model
CN114357974B (en) * 2021-12-28 2022-09-23 北京海泰方圆科技股份有限公司 Similar sample corpus generation method and device, electronic equipment and storage medium
CN114817517B (en) * 2022-05-30 2022-12-20 北京海天瑞声科技股份有限公司 Corpus acquisition method and device, electronic equipment and storage medium
CN117291181A (en) * 2022-06-17 2023-12-26 华为云计算技术有限公司 Statement generation method, device and storage medium
CN116955590B (en) * 2023-09-20 2023-12-08 成都明途科技有限公司 Training data screening method, model training method and text generation method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180018577A1 (en) * 2016-07-12 2018-01-18 International Business Machines Corporation Generating training data for machine learning
CN110245219A (en) * 2019-04-25 2019-09-17 义语智能科技(广州)有限公司 A kind of answering method and equipment based on automatic extension Q & A database
CN110765758A (en) * 2019-11-04 2020-02-07 北京小米智能科技有限公司 Method, device and medium for generating synonym sentence generation model
CN111400470A (en) * 2020-03-13 2020-07-10 深圳市腾讯计算机系统有限公司 Question processing method and device, computer equipment and storage medium
CN111695356A (en) * 2020-05-28 2020-09-22 平安科技(深圳)有限公司 Synonym corpus generation method, synonym corpus generation device, computer system and readable storage medium
CN113807074A (en) * 2021-03-12 2021-12-17 京东科技控股股份有限公司 Similar statement generation method and device based on pre-training language model

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9135237B2 (en) * 2011-07-13 2015-09-15 Nuance Communications, Inc. System and a method for generating semantically similar sentences for building a robust SLM
CN109710915B (en) * 2017-10-26 2021-02-23 华为技术有限公司 Method and device for generating repeated statement
CN109033390B (en) * 2018-07-27 2020-02-18 深圳追一科技有限公司 Method and device for automatically generating similar question sentences
CN111046147A (en) * 2018-10-11 2020-04-21 马上消费金融股份有限公司 Question answering method and device and terminal equipment
CN110162604B (en) * 2019-01-24 2023-09-12 腾讯科技(深圳)有限公司 Statement generation method, device, equipment and storage medium
CN111368024A (en) * 2020-02-14 2020-07-03 深圳壹账通智能科技有限公司 Text semantic similarity analysis method and device and computer equipment


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115497633A (en) * 2022-10-19 2022-12-20 联仁健康医疗大数据科技股份有限公司 Data processing method, device, equipment and storage medium
CN115497633B (en) * 2022-10-19 2024-01-30 联仁健康医疗大数据科技股份有限公司 Data processing method, device, equipment and storage medium
CN117332180A (en) * 2023-12-01 2024-01-02 浙商期货有限公司 Method, equipment and storage medium for intelligent writing of research report based on large language model
CN117332180B (en) * 2023-12-01 2024-03-12 浙商期货有限公司 Method, equipment and storage medium for intelligent writing of research report based on large language model

Also Published As

Publication number Publication date
CN113807074A (en) 2021-12-17


Legal Events

121 Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 22766111; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: DE)
32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established (free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.01.2024))