WO2023241226A1 - Statement generation method and device and storage medium - Google Patents


Info

Publication number
WO2023241226A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
candidate similar
statement
generation model
user
Prior art date
Application number
PCT/CN2023/090386
Other languages
French (fr)
Chinese (zh)
Inventor
蒋炜 (Jiang Wei)
段新宇 (Duan Xinyu)
王喆锋 (Wang Zhefeng)
怀宝兴 (Huai Baoxing)
Original Assignee
华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为云计算技术有限公司 (Huawei Cloud Computing Technologies Co., Ltd.)
Publication of WO2023241226A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation

Definitions

  • This application relates to the field of artificial intelligence (AI), and in particular to a sentence generation method, device and storage medium.
  • AI: artificial intelligence.
  • The training and use of AI models often rely on corpora containing a large number of similar sentences.
  • For example, a large number of similar sentences in a corpus can be used to train a question matching model.
  • The trained question matching model can then match the questions in the corpus against a target question, and the answer corresponding to the matched question can be used as the answer to the target question.
  • The more questions in the corpus are similar to the target question, the easier it is to find a highly accurate match during the matching process.
  • Embodiments of the present application provide a sentence generation method, device and storage medium that can automatically generate high-quality similar sentences, thereby effectively expanding the corpus.
  • The technical solutions are as follows:
  • In a first aspect, a sentence generation method includes: generating, based on a reference sentence, a plurality of first candidate similar sentences through a first sentence generation model; generating at least one second candidate similar sentence based on a user's modification operation on at least one first candidate similar sentence, where the plurality of first candidate similar sentences include the at least one first candidate similar sentence; and updating the first sentence generation model using the at least one second candidate similar sentence to obtain a second sentence generation model.
  • In this method, at least one second candidate similar sentence is obtained from the user's modification operations on at least one first candidate similar sentence, and the first sentence generation model is then updated with the at least one second candidate similar sentence. In other words, the embodiments of the present application can optimize the sentence generation model in real time using candidate similar sentences modified by the user. More high-quality similar sentences can then be generated from the optimized model and the reference sentence, which both expands the corpus effectively and continuously improves the accuracy of the sentence generation model.
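The three steps of the first aspect can be sketched as a single refinement round. This is a minimal sketch under assumed interfaces: `generate`, `update`, and `user_edit` are hypothetical callables standing in for the sentence generation model, the model-update procedure, and the user's modification operations, none of which the application defines as code.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins; the application does not define these interfaces.
Generate = Callable[[str, int], List[str]]            # (reference, n) -> candidates
UserEdit = Callable[[str], Optional[str]]             # candidate -> modified text or None
Update = Callable[[List[Tuple[str, str]]], Generate]  # (first, second) pairs -> new model


def refinement_round(generate: Generate, update: Update, reference: str,
                     user_edit: UserEdit, n: int = 10
                     ) -> Tuple[Generate, List[Tuple[str, str]]]:
    """Run the three steps once and return the (possibly) updated model."""
    # Step 1: first candidate similar sentences from the current model.
    first_candidates = generate(reference, n)

    # Step 2: the user's modifications yield second candidate similar sentences.
    pairs: List[Tuple[str, str]] = []
    for candidate in first_candidates:
        modified = user_edit(candidate)
        if modified is not None and modified != candidate:
            pairs.append((candidate, modified))

    # Step 3: update the model with the modified sentences; without any
    # modification, the current model is kept unchanged.
    new_generate = update(pairs) if pairs else generate
    return new_generate, pairs
```

Calling `refinement_round` again with the returned `generate` mirrors the loop in which the second model produces the third candidate similar sentences. The default `n=10` is only an illustrative value.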
  • The sentence generation model is an artificial intelligence model.
  • For example, the sentence generation model can be a neural network model or a decision tree model.
  • For example, the sentence generation model can be UniLM, a unified pre-trained language model for natural language understanding and generation.
  • The method further includes: displaying the plurality of first candidate similar sentences to the user.
  • The user can browse the plurality of first candidate similar sentences and modify, confirm, or delete each of them.
  • The method further includes: generating a plurality of third candidate similar sentences based on the reference sentence and the second sentence generation model; and displaying the plurality of third candidate similar sentences to the user.
  • Because the sentence generation model has been optimized, the third candidate similar sentences it generates are more accurate.
  • The third candidate similar sentences are likewise displayed to the user, who can continue to modify them, thereby further optimizing the sentence generation model.
  • Repeating this process makes the output of the sentence generation model increasingly accurate while yielding more similar sentences with which to expand the corpus.
  • The modification operation includes one or more of the following: a character adding operation, a character deleting operation, a character exchanging operation, a character replacing operation, and a sentence rewriting operation.
  • The sentence rewriting operation can be used when a sentence has semantic problems, in which case the whole sentence is rewritten.
  • Updating the first sentence generation model using the at least one second candidate similar sentence to obtain the second sentence generation model may include: obtaining difference information between each of the at least one first candidate similar sentence and the corresponding modified second candidate similar sentence; and updating parameters of the first sentence generation model based on the difference information to obtain the second sentence generation model.
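The application does not specify how the difference information is represented. One plausible representation, assumed here, is the list of character-level edit spans between each first candidate similar sentence and its modified version, which Python's standard `difflib.SequenceMatcher` can compute. How these spans are turned into parameter updates (for example, fine-tuning on the modified sentences) is left open by the source and is not shown.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def difference_info(first: str, second: str) -> List[Tuple[str, str, str]]:
    """Character-level edit spans that turn `first` into `second`.

    Each item is (tag, old_text, new_text), where tag is 'replace',
    'delete', or 'insert' as reported by difflib.
    """
    matcher = SequenceMatcher(a=first, b=second, autojunk=False)
    return [
        (tag, first[i1:i2], second[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # unchanged spans carry no difference information
    ]


def collect_differences(pairs: List[Tuple[str, str]]) -> List[List[Tuple[str, str, str]]]:
    """Difference information for every (first, modified second) pair."""
    return [difference_info(first, second) for first, second in pairs]
```

For example, `collect_differences([("abcd", "abXd")])` yields a single `replace` span from `"c"` to `"X"`.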
  • In a second aspect, a sentence generation device is provided, which is configured to implement the sentence generation method described in the first aspect.
  • The sentence generation device may include at least one module.
  • For example, the at least one module may include a generation module, a modification module, and an update module.
  • The generation module is configured to generate, based on the reference sentence, a plurality of first candidate similar sentences through the first sentence generation model.
  • The modification module is configured to generate at least one second candidate similar sentence based on the user's modification operation on at least one first candidate similar sentence, where the plurality of first candidate similar sentences include the at least one first candidate similar sentence.
  • The update module is configured to update the first sentence generation model using the at least one second candidate similar sentence to obtain the second sentence generation model.
  • The sentence generation model is an artificial intelligence model.
  • The device further includes a display module.
  • The display module is configured to display the plurality of first candidate similar sentences to the user; the generation module is further configured to generate a plurality of third candidate similar sentences based on the reference sentence and the second sentence generation model; and the display module is further configured to display the plurality of third candidate similar sentences to the user.
  • The modification operation includes one or more of a character adding operation, a character deleting operation, a character exchanging operation, a character replacing operation, and a sentence rewriting operation.
  • The update module is mainly configured to: obtain difference information between each of the at least one first candidate similar sentence and the corresponding modified second candidate similar sentence; and update parameters of the first sentence generation model based on the difference information to obtain the second sentence generation model.
  • In a third aspect, a computer device is provided, which includes a processor and a memory.
  • The memory is configured to store at least one piece of program instructions or code that supports the computer device in executing the sentence generation method provided in the first aspect, as well as data involved in implementing that method.
  • The processor is configured to execute the program instructions or code stored in the memory.
  • In a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer device, cause the computer device to execute the sentence generation method described in the first aspect.
  • In a fifth aspect, a computer program product containing instructions is provided; when run on a computer, the instructions cause the computer to execute the sentence generation method described in the first aspect.
  • In summary, the embodiments of the present application obtain second candidate similar sentences from the user's modifications to first candidate similar sentences and use them to update the sentence generation model in real time, so that more high-quality similar sentences can be generated from the optimized model and the reference sentence, effectively expanding the corpus.
  • Figure 1 is a schematic diagram of the implementation environment of a sentence generation method provided by an embodiment of the present application.
  • Figure 2 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Figure 3 is a flowchart of a sentence generation method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a display page and a modification page of first candidate similar sentences provided by an embodiment of the present application.
  • Figure 5 is a schematic flowchart of interaction between a user and a terminal device provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of a display page of a second candidate similar sentence provided by an embodiment of the present application.
  • Figure 7 is an exemplary flowchart of a sentence generation process provided by an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of a sentence generation device provided by an embodiment of the present application.
  • The sentence generation method provided by the embodiments of the present application can be used to generate similar sentences of a reference sentence, thereby effectively expanding the corpus and providing data support for the processing of text information.
  • Automatic question answering is a research hotspot in the field of natural language processing.
  • It is widely used in various Internet products, such as question-answering robots, chatbots, and intelligent customer service robots.
  • Retrieval-based question answering is an important branch of automatic question answering.
  • In a retrieval-based question answering system, a large number of similar sentences in the corpus can be used to train a question matching model. Later, after a target question input by the user is received, the trained question matching model can match the questions in the corpus against the target question, and the answer corresponding to the matched question can be used as the answer to the target question.
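To illustrate the matching step, the sketch below replaces the trained question matching model with a simple bag-of-words cosine similarity, an assumption made only to keep the example self-contained. It shows why similar sentences help: each paraphrase stored in the corpus maps to the same answer, so more paraphrases give a target question more chances to find a close match.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts (naive tokenization, for illustration only)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer_question(target: str, corpus: List[Tuple[str, str]]) -> str:
    """Return the answer of the corpus question most similar to `target`.

    `corpus` holds (question, answer) pairs; similar sentences generated
    for a reference question are stored with that question's answer.
    """
    target_vec = bag_of_words(target)
    _, best_answer = max(corpus, key=lambda qa: cosine(target_vec, bag_of_words(qa[0])))
    return best_answer
```

A production system would use the trained matching model instead of `cosine`; only the corpus-lookup structure carries over.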
  • The sentence generation method provided by the embodiments of the present application can automatically generate a large number of high-quality similar sentences from reference sentences to expand the corpus, thereby improving the accuracy of automatic question answering.
  • As another example, when a document retrieval system is built and the user uploads a reference document, the document retrieval system can extract the key content of the document, such as its title and table of contents, and build a corpus from the extracted key content. On this basis, the document retrieval system can also use the extracted key content as reference sentences and apply the sentence generation method provided by the embodiments of the present application to generate similar sentences, thereby expanding the corpus. When the document retrieval system later receives a retrieval statement, it can use a retrieval model to search the extracted key content and the generated similar sentences for sentences matching the retrieval statement, which improves retrieval accuracy. In addition, these similar sentences in the corpus can also be used to train the retrieval model to improve its accuracy.
  • The sentence generation method provided by the embodiments of this application can be implemented by a computer device in any environment.
  • For example, the sentence generation method can be implemented by a terminal device such as a desktop computer, a notebook computer, a smartphone, or a tablet computer.
  • The sentence generation method can also be implemented by multiple computer devices.
  • For example, it can be implemented by the client device 101 and the server 102 shown in FIG. 1.
  • The client device 101 is configured to provide an interactive interface, interact with the user through the interactive interface to obtain the reference sentence, and then send the reference sentence to the server 102.
  • A sentence generation model is deployed on the server 102. After receiving the reference sentence sent by the client device 101, the server 102 generates a plurality of candidate similar sentences through the sentence generation model and then sends them to the client device 101.
  • The client device 101 is further configured to present the plurality of candidate similar sentences to the user, generate at least one modified candidate similar sentence based on the user's modification operation on at least one displayed candidate similar sentence, and send the at least one modified candidate similar sentence to the server 102. In addition, based on the user's confirmation operation on any displayed candidate similar sentence, the client device 101 may send that candidate similar sentence to the server 102 as a similar sentence of the reference sentence.
  • The server 102 is further configured to add a similar sentence of the reference sentence to the database upon receiving it as confirmed by the user.
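The application does not define a wire format between the client device 101 and the server 102. Purely as an assumption, the exchange could be carried as JSON messages like the hypothetical ones below: the reference sentence going to the server, the candidate similar sentences coming back, and the user's confirmed and modified sentences returned to the server.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ReferenceRequest:           # hypothetical: client 101 -> server 102
    reference: str


@dataclass
class CandidateList:              # hypothetical: server 102 -> client 101
    reference: str
    candidates: List[str] = field(default_factory=list)


@dataclass
class UserFeedback:               # hypothetical: client 101 -> server 102
    reference: str
    confirmed: List[str] = field(default_factory=list)       # to be added to the database
    modified: List[List[str]] = field(default_factory=list)  # [original, modified] pairs for the update


MESSAGE_TYPES = {cls.__name__: cls for cls in (ReferenceRequest, CandidateList, UserFeedback)}


def encode(message) -> str:
    """Serialize a message with a type tag so the receiver can decode it."""
    return json.dumps({"type": type(message).__name__, "body": asdict(message)},
                      ensure_ascii=False)


def decode(text: str):
    """Rebuild the message object from its JSON form."""
    data = json.loads(text)
    return MESSAGE_TYPES[data["type"]](**data["body"])
```

Lists rather than tuples are used for the modified pairs so that a JSON round trip returns an equal object.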
  • The server 102 also updates the sentence generation model using the at least one modified candidate similar sentence, thereby obtaining an updated sentence generation model.
  • The server 102 can then use the updated sentence generation model to generate a plurality of candidate similar sentences from the reference sentence again and send them to the client device 101 for display, repeating the above steps until a termination condition is met.
  • For example, the termination condition may be that the number of similar sentences of the reference sentence added to the database by the server 102 reaches a reference threshold, or that the number of update rounds of the sentence generation model performed by the server 102 reaches a reference number of rounds.
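The two example termination conditions can be checked with a small helper. In this sketch the database is an assumed in-memory mapping from each reference sentence to its confirmed similar sentences; the duplicate check and the parameter names `reference_threshold` and `max_rounds` are hypothetical.

```python
from typing import Dict, List


class SimilarSentenceStore:
    """Hypothetical in-memory stand-in for the database on server 102."""

    def __init__(self) -> None:
        self._similar: Dict[str, List[str]] = {}

    def add_confirmed(self, reference: str, sentence: str) -> None:
        sentences = self._similar.setdefault(reference, [])
        if sentence not in sentences:  # assumption: duplicates are not stored twice
            sentences.append(sentence)

    def count(self, reference: str) -> int:
        return len(self._similar.get(reference, []))


def should_terminate(store: SimilarSentenceStore, reference: str, rounds_done: int,
                     reference_threshold: int, max_rounds: int) -> bool:
    """Stop when enough similar sentences are stored or enough update rounds ran."""
    return store.count(reference) >= reference_threshold or rounds_done >= max_rounds
```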
  • The server 102 can be deployed in a cloud environment.
  • For example, the server 102 may be a server in a cloud data center.
  • Alternatively, the server 102 may be an edge computing device in an edge environment.
  • Alternatively, the server 102 may be a computing device in an organization's data center.
  • The server 102 may also be a computing device in other scenarios or of other types, which is not limited in the embodiments of this application.
  • The client device 101 can be a device such as a laptop computer, a smartphone, or a tablet computer.
  • Figure 2 is a schematic structural diagram of a computer device provided by an embodiment of the present application. Whether the sentence generation method provided by the embodiments of the present application is implemented by one computer device or by multiple computer devices, each computer device can be the computer device 200 shown in FIG. 2.
  • The computer device 200 includes a processor 201, a communication bus 202, a memory 203, and at least one communication interface 204.
  • The processor 201 can be a general-purpose central processing unit (CPU), an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or any combination thereof.
  • The processor 201 may include one or more chips, and may include an AI accelerator such as a neural network processor (neural processing unit, NPU).
  • NPU: neural processing unit.
  • The communication bus 202 may include a path that carries information between the components of the computer device 200 (for example, the processor 201, the memory 203, and the communication interface 204).
  • The memory 203 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions. It can also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • ROM: read-only memory.
  • RAM: random access memory.
  • EEPROM: electrically erasable programmable read-only memory.
  • CD-ROM: compact disc read-only memory.
  • The memory 203 may exist independently and be connected to the processor 201 through the communication bus 202.
  • Alternatively, the memory 203 may be integrated with the processor 201.
  • The memory 203 can store computer instructions; when these instructions are executed by the processor 201, the sentence generation method provided by the embodiments of the present application is implemented.
  • The memory 203 may also store data required by the processor during execution of the above method, as well as intermediate data and/or result data generated during execution.
  • The communication interface 204 uses any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • The processor 201 may include one or more CPUs.
  • The computer device 200 may include multiple processors, each of which may be a single-CPU processor or a multi-CPU processor.
  • A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • The computer device 200 may further include an output device 205 and an input device 206.
  • The output device 205 communicates with the processor 201 and can display information in a variety of ways.
  • For example, the output device 205 may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector.
  • The input device 206 communicates with the processor 201 and can receive user input in a variety of ways.
  • For example, the input device 206 may be a mouse, a keyboard, a touchscreen device, or a sensing device.
  • Figure 3 is a flowchart of a sentence generation method provided by an embodiment of the present application. The method can be applied to one or more computer devices; the following description takes its application to a single terminal device as an example. Referring to Figure 3, the method includes the following steps:
  • Step 301: Generate a plurality of first candidate similar sentences through the first sentence generation model based on the reference sentence.
  • An interactive interface is displayed on the terminal device.
  • The interactive interface may include a text input box.
  • The user may enter one or more reference sentences in the text input box.
  • The terminal device can then obtain the reference sentences entered by the user.
  • Alternatively, the interactive interface may include a document upload option. After detecting that the user has selected the document upload option, the terminal device receives the document uploaded by the user and extracts one or more reference sentences from it.
  • Alternatively, the interactive interface includes a voice option. After detecting that the user has selected the voice option, the terminal device turns on an audio collection device to collect the user's voice information and processes the voice information through a speech recognition model to obtain one or more reference sentences.
  • The terminal device can also obtain the reference sentence in other ways, for example, from a database that stores a large number of sentences.
  • The database can be deployed on the terminal device or on another device, for example, a background server.
  • The terminal device can perform text-domain processing operations such as word segmentation and stop-word removal on the reference sentence. The terminal device can then use the processed reference sentence as the input of the first sentence generation model, which processes it and outputs a plurality of first candidate similar sentences.
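A minimal sketch of the text-domain preprocessing, assuming English-like text delimited by spaces and punctuation. Chinese text would need a dedicated word segmenter in place of the regular expression used here, and the stop-word list is a tiny hypothetical example rather than a real one.

```python
import re
from typing import Iterable, List

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = frozenset({"a", "an", "the", "is", "of", "to", "please"})


def preprocess(reference: str, stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    """Word segmentation followed by stop-word removal."""
    tokens = re.findall(r"\w+", reference.lower())  # naive segmentation on word characters
    stops = set(stop_words)
    return [token for token in tokens if token not in stops]
```

For example, `preprocess("Please tell me the way to reset a password!")` returns `["tell", "me", "way", "reset", "password"]`, which would then be passed to the sentence generation model.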
  • The number of first candidate similar sentences output may be set by the user or may be a default number, for example, 10.
  • The first sentence generation model may be an AI model.
  • For example, the first sentence generation model may be a neural network model or a decision tree model.
  • The neural network model can be a sequence-to-sequence (seq2seq) model.
  • The first sentence generation model may be a sentence generation model pre-trained on a large amount of unlabeled data and open-source similar-question sets.
  • Alternatively, the first sentence generation model may be a sentence generation model that has already been updated using candidate similar sentences modified by the user.
  • Step 302: Generate at least one second candidate similar sentence based on the user's modification operation on at least one first candidate similar sentence.
  • The terminal device may display the plurality of first candidate similar sentences.
  • The user may perform a modification operation on at least one of the displayed first candidate similar sentences.
  • The terminal device generates at least one second candidate similar sentence based on the user's modification operation on the at least one first candidate similar sentence.
  • The modification operation includes one or more of a character adding operation, a character deleting operation, a character exchanging operation, a character replacing operation, and a sentence rewriting operation.
  • For example, the terminal device may display the plurality of first candidate similar sentences in the interactive interface, together with the operation options corresponding to each first candidate similar sentence.
  • The operation options may include a modification option, a deletion option, and a confirmation option.
  • The user can select the modification option corresponding to a first candidate similar sentence.
  • The terminal device can then detect the user's modification operation on that first candidate similar sentence and generate a second candidate similar sentence based on the modification operation.
  • For example, in response to the user selecting the modification option corresponding to a first candidate similar sentence, the terminal device displays the modification sub-options corresponding to that sentence.
  • The modification sub-options may include an adding sub-option, a deleting sub-option, an exchanging sub-option, a replacing sub-option, and a rewriting sub-option.
  • Based on the displayed modification sub-options, the user can perform one or more of a character adding operation, a character deleting operation, a character exchanging operation, a character replacing operation, and a sentence rewriting operation.
  • The terminal device can generate the modified sentence, that is, the second candidate similar sentence, based on the modification operations performed by the user on the first candidate similar sentence.
  • For example, the user can select the adding sub-option corresponding to the first candidate similar sentence.
  • The terminal device can then receive characters inserted by the user at a position in the first candidate similar sentence.
  • The first candidate similar sentence is updated based on the inserted characters.
  • The user can select the deleting sub-option corresponding to the first candidate similar sentence.
  • The terminal device can then detect the characters that the user selects and deletes, and update the first candidate similar sentence accordingly.
  • The user can select the exchanging sub-option corresponding to the first candidate similar sentence.
  • The terminal device can then detect that the user has selected and dragged one or more characters in the first candidate similar sentence and, based on the drag operation, move the selected characters to the position specified by the user.
  • The user can select the replacing sub-option corresponding to the first candidate similar sentence.
  • The terminal device can then detect the characters selected by the user in the first candidate similar sentence and display a text input box. It obtains the characters entered by the user in the text input box and replaces the selected characters in the first candidate similar sentence with them, thereby obtaining an updated first candidate similar sentence.
  • The user can select the rewriting sub-option corresponding to the first candidate similar sentence.
  • The terminal device can then display a text input box, obtain the rewritten sentence entered by the user in the text input box, and use the rewritten sentence as the updated sentence.
  • The user can perform at least one modification operation on the first candidate similar sentence in sequence through at least one modification sub-option.
  • The interactive interface of the terminal device may also include an end modification option, which the user can select after completing the modification.
  • When the terminal device detects the selection of the end modification option, it uses the most recently updated first candidate similar sentence as the second candidate similar sentence.
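The sub-options above can be modeled as edits applied one after another to a working copy of the first candidate similar sentence, with the end modification option returning the last version as the second candidate similar sentence. In this hypothetical sketch, positions are character indices (an assumption, since the interface works through selection and dragging), and the exchanging sub-option is modeled as moving the selected characters, as described for the drag operation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EditSession:
    """Hypothetical editing session for one first candidate similar sentence."""
    original: str
    current: str = ""
    history: List[str] = field(default_factory=list)  # earlier versions, oldest first

    def __post_init__(self) -> None:
        self.current = self.original

    def _apply(self, new_text: str) -> None:
        self.history.append(self.current)
        self.current = new_text

    def add(self, pos: int, chars: str) -> None:
        """Adding sub-option: insert `chars` before index `pos`."""
        self._apply(self.current[:pos] + chars + self.current[pos:])

    def delete(self, start: int, end: int) -> None:
        """Deleting sub-option: remove the selected span [start, end)."""
        self._apply(self.current[:start] + self.current[end:])

    def exchange(self, start: int, end: int, target: int) -> None:
        """Exchanging sub-option: move the selected span to index `target`
        of the text that remains after the span is lifted out."""
        piece = self.current[start:end]
        rest = self.current[:start] + self.current[end:]
        self._apply(rest[:target] + piece + rest[target:])

    def replace(self, start: int, end: int, chars: str) -> None:
        """Replacing sub-option: substitute the selected span with `chars`."""
        self._apply(self.current[:start] + chars + self.current[end:])

    def rewrite(self, sentence: str) -> None:
        """Rewriting sub-option: replace the whole sentence."""
        self._apply(sentence)

    def end_modification(self) -> str:
        """Return the last updated version as the second candidate similar sentence."""
        return self.current
```

For instance, starting from `"how reset password"`, calling `add(4, "to ")` and then `replace(7, 12, "change")` ends with `"how to change password"`, while `original` still holds the first candidate for later comparison.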
  • For example, the first sentence generation model generates n first candidate similar sentences based on the reference sentence: sentence 1-1, sentence 1-2, ..., sentence 1-n.
  • The terminal device displays a display page showing the n first candidate similar sentences, with the corresponding modification, confirmation, and deletion options displayed after each sentence. The user can browse each sentence. If the user is partially satisfied with a sentence, for example, if the user believes that sentence 1-3 can be modified into a similar sentence that meets the requirements, the user can select the modification option displayed after sentence 1-3. In response to this selection, the terminal device may display a modification interface for sentence 1-3, which shows sentence 1-3, multiple modification sub-options, and an end modification option.
  • The multiple modification sub-options include an adding sub-option, a deleting sub-option, an exchanging sub-option, a replacing sub-option, and a rewriting sub-option.
  • The user can modify sentence 1-3 through these modification sub-options in the manner described above.
  • After finishing the modification, the user can select the end modification option.
  • The terminal device can then display the modification result, sentence 1-3', in the display page.
  • The terminal device can also display a continue option and an end option in the display page. After finishing modifying the sentences displayed in the display page, the user can select the continue option.
  • In response, the terminal device may perform step 303. If the user no longer needs to generate similar sentences for the reference sentence, the user can instead click the end option to instruct the terminal device to stop updating the sentence generation model and stop generating candidate similar sentences for the reference sentence.
  • Alternatively, in response to the user selecting the modification option corresponding to a first candidate similar sentence, the terminal device can switch that sentence from an uneditable state to an editable state.
  • The user can then insert characters into the first candidate similar sentence, or delete, replace, or exchange characters in it.
  • The terminal device can update the first candidate similar sentence according to the user's modification operations, thereby obtaining the modified sentence, that is, the second candidate similar sentence.
  • The user can also select the confirmation option of a first candidate similar sentence.
  • The terminal device can then store that first candidate similar sentence in the database as a similar sentence of the reference sentence.
  • The database may be a corpus storing a large number of sentences.
  • The user can also select the deletion option of a first candidate similar sentence.
  • The terminal device can then directly delete that first candidate similar sentence.
  • FIG. 5 is a schematic flowchart of the interaction between a user and a terminal device according to an embodiment of the present application.
  • Take a candidate similar sentence displayed on the terminal device, such as sentence n, as an example.
  • The user determines whether sentence n is satisfactory. If it is, the user clicks the confirmation option, and the terminal device adds sentence n to the database. If it is not, the user clicks the deletion option, and the terminal device deletes sentence n. If the user is partially satisfied with sentence n, the user clicks the modification option and then identifies the problem with sentence n. If the problem is semantic, the user can click the rewrite sub-option and enter a rewritten sentence to obtain a modified similar sentence.
  • If the problem is lexical, the user can click one or more of the add, delete, replace, and exchange sub-options to modify sentence n. When the modification is complete, the user clicks the end-modification option, and the terminal device generates a modified similar sentence based on the user's modification operations.
  • In this way, the user can modify at least one first candidate similar sentence displayed on the terminal device to obtain at least one second candidate similar sentence.
  • In some cases, after the terminal device displays a plurality of first candidate similar sentences, the user may be satisfied with all of them and perform the confirmation operation on each one. The terminal device then adds all of them to the database as similar sentences of the reference sentence and does not update the sentence generation model.
  • If the user is satisfied with some of the displayed first candidate similar sentences but not with others, the user can confirm the satisfactory ones and perform no modification on the unsatisfactory ones.
  • Based on these operations, the terminal device adds the confirmed first candidate similar sentences to the database as similar sentences of the reference sentence and discards those the user did not modify. In this case, the terminal device again does not update the sentence generation model.
  • Step 303: Update the first sentence generation model by using the at least one second candidate similar sentence to obtain a second sentence generation model.
  • The terminal device can obtain difference information between each first candidate similar sentence in the at least one first candidate similar sentence and the corresponding modified second candidate similar sentence, and update the parameters of the first sentence generation model based on the difference information to obtain the second sentence generation model.
  • The at least one second candidate similar sentence is obtained by modifying the at least one first candidate similar sentence.
  • The at least one first candidate similar sentence is the actual output of the first sentence generation model, and the corresponding modified second candidate similar sentences can serve as its expected output.
  • The terminal device may therefore calculate an error value between each first candidate similar sentence and its corresponding modified second candidate similar sentence, that is, between each actual output value and the corresponding expected output value. It uses these error values as the difference information and then updates the parameters of the first sentence generation model based on them to obtain the second sentence generation model.
  • For example, the terminal device can use an autoregressive cross-entropy loss function to determine a total error value from the calculated error values. If the total error value is greater than an error threshold, the terminal device can, based on the total error value, perform backpropagation starting from the output layer of the neural network model, updating the parameters of each layer layer by layer to obtain the second sentence generation model.
  • The terminal device can also update the parameters of the first sentence generation model by other methods, for example Newton's method, to obtain the second sentence generation model; these methods are not described further in this embodiment of the application.
  • The terminal device can then use the reference sentence as the input of the second sentence generation model, which processes it and outputs a plurality of third candidate similar sentences. The terminal device can display these third candidate similar sentences to the user as described in step 302. Based on the user's operations on each one, it decides whether to add the sentence to the database as a similar sentence of the reference sentence, delete it, or modify it.
  • The second sentence generation model can in turn be updated, and so on, until the number of update rounds of the sentence generation model reaches a specified number, or until the number of similar sentences of the reference sentence stored in the database reaches a specified number. The terminal device can then stop updating the sentence generation model and stop using it to generate candidate similar sentences for the reference sentence.
  • For example, after the sentence generation model is updated based on the at least one second candidate similar sentence, the resulting second sentence generation model generates n third candidate similar sentences, namely sentence 2-1, sentence 2-2, sentence 2-3, ..., sentence 2-n.
  • The terminal device can display the n third candidate similar sentences, with the corresponding operation options after each sentence, so that the user can perform the corresponding operations.
  • The terminal device can also store the at least one second candidate similar sentence in the database.
  • FIG. 7 is an exemplary flowchart of the sentence generation process provided in an embodiment of this application.
  • As shown there, the sentence generation model first generates similar sentences 1 to n based on the reference sentence and displays them on an interactive interface. The user can give feedback on the displayed sentences through the operation options provided in the interface. Sentences that the user confirms as satisfactory are stored directly in the database, and sentences the user is not satisfied with are deleted directly.
  • Sentences modified by the user are used to update the parameters of the sentence generation model and can also be stored in the database.
  • In this way, at least one second candidate similar sentence can be obtained based on the user's modification operation on at least one first candidate similar sentence, and the at least one second candidate similar sentence is then used to update the first sentence generation model. That is, the embodiments of the present application can optimize the sentence generation model in real time by using candidate similar sentences modified by the user. The optimized model can then generate new, higher-quality similar sentences from the reference sentence. This not only effectively expands the corpus but also continuously improves the accuracy of the sentence generation model.
  • The sentence generation model deployed on the terminal device can be pre-trained on a large amount of unlabeled data and open-source labeled similar-question sets, which can improve the generalization and cold-start capabilities of the model.
  • The above describes the implementation of the method using the example of a terminal device executing the sentence generation method.
  • If the sentence generation method is executed by multiple computer devices, for example by the client device and the server shown in FIG. 1, the steps in the above embodiment of displaying candidate similar sentences and interacting with the user can be implemented by the client device.
  • The step of obtaining the modified candidate similar sentences based on the user's modification operation can be implemented by either the client device or the server.
  • The steps of generating candidate similar sentences based on the reference sentence and updating the parameters of the sentence generation model based on the modified candidate similar sentences can be implemented by the server.
  • As shown in FIG. 8, the device 800 includes a generation module 801, a modification module 802, and an update module 803.
  • The update module 803 is configured to perform step 303 in the above embodiment.
  • The generation module 801, the modification module 802, and the update module 803 can be deployed on one computer device and executed by a processor of that computer device.
  • Alternatively, the generation module 801, the modification module 802, and the update module 803 can be distributed across multiple computer devices and executed by the processors of those devices to jointly implement the above sentence generation method.
  • Optionally, the sentence generation model is an artificial intelligence model.
  • Optionally, the device 800 further includes a display module 804, configured to display the plurality of first candidate similar sentences to the user.
  • The generation module 801 is further configured to generate a plurality of third candidate similar sentences based on the reference sentence and the second sentence generation model.
  • The display module 804 is further configured to display the plurality of third candidate similar sentences to the user.
  • Optionally, the modification operation includes one or more of the following: adding a character, deleting a character, exchanging characters, replacing a character, and rewriting the sentence.
  • Optionally, the update module 803 is mainly configured to obtain difference information between each first candidate similar sentence in the at least one first candidate similar sentence and the corresponding modified second candidate similar sentence, and, based on the difference information, update the parameters of the first sentence generation model to obtain the second sentence generation model.
  • In the embodiments of this application, after the first sentence generation model generates multiple first candidate similar sentences of the reference sentence, at least one second candidate similar sentence can be obtained based on the user's modification operation on at least one first candidate similar sentence. The at least one second candidate similar sentence can then be used to update the first sentence generation model. That is, the embodiments of the present application can optimize the sentence generation model in real time by using candidate similar sentences modified by the user, and the optimized model can then generate new, higher-quality similar sentences from the reference sentence. This not only effectively expands the corpus but also continuously improves the accuracy of the sentence generation model.
  • When the sentence generation device provided in the above embodiment generates similar sentences of the reference sentence, the division into the above functional modules is merely an example.
  • In practice, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or some of the functions described above.
  • The sentence generation device provided in the above embodiment and the sentence generation method embodiments are based on the same concept. For the specific implementation process, refer to the method embodiments; details are not repeated here.
  • The computer program product includes one or more computer instructions.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • The available medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a digital versatile disc (DVD)), or a semiconductor medium (such as a solid-state drive (SSD)).
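Step 303 mentions two update options: backpropagation-based gradient updates and Newton's method. The loss and both update rules can be written in standard form as follows. Here x is the reference sentence, y^(i) is the i-th modified second candidate similar sentence with T_i tokens, N is the number of modified sentences, and η is a learning rate. The symbol η and the per-token and per-sentence averaging are assumptions; the source does not give them.

```latex
\begin{aligned}
\mathcal{L}(\theta) &= \frac{1}{N}\sum_{i=1}^{N}\frac{1}{T_i}\sum_{t=1}^{T_i}
  -\log p_\theta\!\left(y^{(i)}_t \,\middle|\, y^{(i)}_{<t},\, x\right) \\
\theta &\leftarrow \theta - \eta\,\nabla_\theta \mathcal{L}(\theta)
  && \text{(gradient step; } \nabla_\theta \mathcal{L} \text{ obtained by backpropagation)} \\
\theta &\leftarrow \theta - \left[\nabla^2_\theta \mathcal{L}(\theta)\right]^{-1}\nabla_\theta \mathcal{L}(\theta)
  && \text{(Newton step)}
\end{aligned}
```

Per step 303, an update is performed only when the total error exceeds the error threshold. The Newton step needs the Hessian, whose size grows with the square of the number of parameters. For large neural networks it is therefore usually approximated rather than computed exactly.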

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A statement generation method, a device, and a storage medium, relating to the technical field of artificial intelligence. In the method, a first statement generation model generates a plurality of first candidate similar statements of a reference statement. At least one second candidate similar statement can then be obtained on the basis of a user's modification operation on at least one first candidate similar statement, and the first statement generation model is updated by using the at least one second candidate similar statement. That is, the statement generation model can be optimized in real time by using candidate similar statements modified by the user. On this basis, the optimized statement generation model can generate more high-quality similar statements from the reference statement, thereby effectively expanding a corpus.

Description

Statement generation method, device and storage medium
This application claims priority to Chinese patent application No. 202210693429.8, entitled "Statement generation method, device and storage medium" and filed on June 17, 2022, the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence (AI), and in particular to a sentence generation method, device and storage medium.
Background
Currently, in the field of natural language processing, training and using AI models often rely on corpora containing a large number of similar sentences. For example, an automatic question answering system based on a retrieval-based question answering algorithm can use a large number of similar sentences in the corpus to train a question matching model. After a target question input by a user is received, the trained question matching model matches questions in the corpus against the target question. The answer corresponding to the matched question is then used as the answer to the target question. The more questions in the corpus that are similar to the target question, the more likely the matching process is to find an accurate match.
In summary, the training and use of AI models in natural language processing are affected by the quantity and quality of the similar sentences in the corpus. There is therefore a need for a method that can automatically generate high-quality similar sentences to expand the corpus and thus provide data support for training and using AI models.
Summary
Embodiments of this application provide a sentence generation method, device and storage medium that can automatically generate high-quality similar sentences to effectively expand a corpus. The technical solutions are as follows:
According to a first aspect, a sentence generation method is provided. The method includes: generating, based on a reference sentence, a plurality of first candidate similar sentences by using a first sentence generation model; generating at least one second candidate similar sentence based on a modification operation performed by a user on at least one first candidate similar sentence, where the plurality of first candidate similar sentences include the at least one first candidate similar sentence; and updating the first sentence generation model by using the at least one second candidate similar sentence, to obtain a second sentence generation model.
In the embodiments of this application, after the first sentence generation model generates a plurality of first candidate similar sentences of the reference sentence, at least one second candidate similar sentence can be obtained based on the user's modification operation on at least one first candidate similar sentence. The at least one second candidate similar sentence is then used to update the first sentence generation model. That is, the embodiments of this application can optimize the sentence generation model in real time by using candidate similar sentences modified by the user, and the optimized sentence generation model can then generate more high-quality similar sentences from the reference sentence. This not only effectively expands the corpus but also continuously improves the accuracy of the sentence generation model.
Optionally, the sentence generation model is an artificial intelligence model, such as a neural network model or a decision tree model. For example, the sentence generation model may be UniLM, a unified pre-trained language model for natural language understanding and generation.
Optionally, after the plurality of first candidate similar sentences are generated based on the reference sentence, the method further includes: displaying the plurality of first candidate similar sentences to the user. In this way, the user can browse the first candidate similar sentences and modify, confirm, or delete each of them.
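The user-interaction bullets earlier on this page describe what happens after each of the three responses to a displayed candidate (confirm, modify, delete). That handling can be sketched as a single partitioning step. This is a minimal sketch under stated assumptions. `Action`, `Feedback`, `RoundResult`, and `apply_feedback` are hypothetical names. Candidates the user never acts on are simply absent from the feedback list and are therefore discarded. `store_modified` models the optional storing of modified sentences in the database.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class Action(Enum):
    CONFIRM = "confirm"  # keep the candidate as a similar sentence
    MODIFY = "modify"    # the user edited the candidate
    DELETE = "delete"    # discard the candidate


@dataclass
class Feedback:
    candidate: str                # a first candidate similar sentence
    action: Action
    edited: Optional[str] = None  # the second candidate similar sentence, for MODIFY


@dataclass
class RoundResult:
    stored: List[str] = field(default_factory=list)  # sentences to add to the database
    training_pairs: List[Tuple[str, str]] = field(default_factory=list)  # (first, second)

    @property
    def needs_update(self) -> bool:
        # The model is updated only if at least one candidate was modified.
        return bool(self.training_pairs)


def apply_feedback(feedback: Sequence[Feedback], store_modified: bool = True) -> RoundResult:
    """Partition one round of user feedback into stored sentences and training pairs."""
    result = RoundResult()
    for fb in feedback:
        if fb.action is Action.CONFIRM:
            result.stored.append(fb.candidate)
        elif fb.action is Action.MODIFY:
            if fb.edited is None:
                raise ValueError("MODIFY feedback requires the edited sentence")
            result.training_pairs.append((fb.candidate, fb.edited))
            if store_modified:
                result.stored.append(fb.edited)
        # DELETE (and candidates with no feedback at all) are simply dropped.
    return result
```

If every candidate is only confirmed or deleted, `needs_update` is `False`. This matches the cases above in which the terminal device does not update the sentence generation model.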
Optionally, after the first sentence generation model is updated to obtain the second sentence generation model, the method further includes: generating a plurality of third candidate similar sentences based on the reference sentence and the second sentence generation model; and displaying the plurality of third candidate similar sentences to the user.
Because the first sentence generation model has been optimized using the first candidate similar sentences as modified by the user, the third candidate similar sentences generated by the optimized model will be more accurate. Displaying them to the user lets the user keep modifying candidate similar sentences and thus keep optimizing the sentence generation model. On the one hand, this makes the output of the sentence generation model increasingly accurate; on the other hand, it yields more similar sentences, expanding the corpus.
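The repeated generate, review, and update cycle can be sketched as a loop. It assumes the two stopping conditions given earlier on this page for the terminal device: a specified number of update rounds, or a specified number of stored similar sentences. `expand_corpus`, `generate`, `review`, and `update` are hypothetical stand-ins for the model and the user interface. The sketch also stops when a round contains no modifications, because the model would not change. That rule is an assumption; the text does not state it.

```python
from typing import Any, Callable, List, Tuple

Pair = Tuple[str, str]  # (first candidate, user-modified second candidate)


def expand_corpus(
    reference: str,
    model: Any,
    generate: Callable[[Any, str], List[str]],
    review: Callable[[List[str]], Tuple[List[str], List[Pair]]],
    update: Callable[[Any, List[Pair]], Any],
    max_rounds: int,
    target_count: int,
) -> Tuple[List[str], int]:
    """Repeat generate -> review -> update for one reference sentence.

    `review` returns the sentences to store (confirmed or modified) and the
    (first, second) pairs used to update the model. Returns the stored
    sentences and the number of update rounds performed.
    """
    stored: List[str] = []
    rounds = 0
    while rounds < max_rounds and len(stored) < target_count:
        candidates = generate(model, reference)
        kept, pairs = review(candidates)
        stored.extend(kept)
        if not pairs:
            break  # no modifications, so the model would not change
        model = update(model, pairs)
        rounds += 1
    return stored, rounds
```

All sentences kept in a round are stored together, so the stored count can slightly overshoot `target_count`.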
Optionally, the modification operation includes one or more of the following: adding a character, deleting a character, exchanging characters, replacing a character, and rewriting the sentence. Adding, deleting, exchanging, and replacing characters can be used to correct words in a sentence that has lexical problems, and rewriting can be used to rewrite a sentence that has semantic problems.
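At the character level, the four lexical operations and the rewrite operation can be sketched as pure string functions. The function names and the index-based interface are assumptions for illustration. Python strings are indexed by code point, so the same functions also work on Chinese sentences, where each character occupies one position.

```python
def _check(s: str, pos: int, allow_end: bool = False) -> None:
    """Raise IndexError unless 0 <= pos < len(s) (or <= len(s) if allow_end)."""
    limit = len(s) + 1 if allow_end else len(s)
    if not 0 <= pos < limit:
        raise IndexError(f"position {pos} out of range for {s!r}")


def add_char(s: str, pos: int, ch: str) -> str:
    """Add: insert `ch` before index `pos` (pos == len(s) appends)."""
    _check(s, pos, allow_end=True)
    return s[:pos] + ch + s[pos:]


def delete_char(s: str, pos: int) -> str:
    """Delete: remove the character at index `pos`."""
    _check(s, pos)
    return s[:pos] + s[pos + 1:]


def exchange_chars(s: str, i: int, j: int) -> str:
    """Exchange: swap the characters at indices `i` and `j`."""
    _check(s, i)
    _check(s, j)
    chars = list(s)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def replace_char(s: str, pos: int, ch: str) -> str:
    """Replace: put `ch` in place of the character at index `pos`."""
    _check(s, pos)
    return s[:pos] + ch + s[pos + 1:]


def rewrite_sentence(_s: str, new_sentence: str) -> str:
    """Rewrite: discard the sentence and use the user's rewritten version."""
    return new_sentence
```

For example, `add_char("how do I reset pasword", 18, "s")` fixes the misspelling and returns `"how do I reset password"`.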
Optionally, updating the first sentence generation model by using the at least one second candidate similar sentence to obtain the second sentence generation model may include: obtaining difference information between each first candidate similar sentence in the at least one first candidate similar sentence and the corresponding modified second candidate similar sentence; and updating parameters of the first sentence generation model based on the difference information, to obtain the second sentence generation model.
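The step-303 bullets earlier on this page say how the difference information is computed. It is an error value from an autoregressive cross-entropy loss, with the first candidate treated as the model's actual output and the user-modified second candidate as the expected output. Parameters are updated only if the total error exceeds a threshold. The following is a minimal standard-library sketch. It assumes the model's per-step next-token distributions, teacher-forced on the modified sentence, are already available as dictionaries. The names, the averaging choices, and this data layout are assumptions.

```python
import math
from typing import Dict, List, Sequence


def sentence_cross_entropy(step_probs: Sequence[Dict[str, float]],
                           target: Sequence[str]) -> float:
    """Error value for one modified candidate (the expected output).

    step_probs[t] is the first model's next-token distribution at step t,
    conditioned on the reference sentence and on target[:t] (teacher forcing);
    target is the tokenized second candidate. Returns the mean negative
    log-likelihood per token.
    """
    if not target or len(step_probs) != len(target):
        raise ValueError("need one distribution per target token")
    eps = 1e-12  # avoids log(0) when the model gives a token zero probability
    nll = [-math.log(max(dist.get(tok, 0.0), eps))
           for dist, tok in zip(step_probs, target)]
    return sum(nll) / len(nll)


def should_update(errors: List[float], threshold: float) -> bool:
    """Combine per-sentence errors into a total (their mean) and compare it
    with the error threshold; only a larger total triggers backpropagation."""
    if not errors:
        return False
    total = sum(errors) / len(errors)
    return total > threshold
```

In a real neural model, the same quantity would be computed on tensors so that its gradient can drive backpropagation. The sketch only shows how the error values and the threshold check fit together.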
According to a second aspect, a sentence generation device is provided, configured to implement the sentence generation method according to the first aspect. The sentence generation device may include at least one module, for example a generation module, a modification module, and an update module.
The generation module is configured to generate, based on a reference sentence, a plurality of first candidate similar sentences by using a first sentence generation model. The modification module is configured to generate at least one second candidate similar sentence based on a modification operation performed by a user on at least one first candidate similar sentence, where the plurality of first candidate similar sentences include the at least one first candidate similar sentence. The update module is configured to update the first sentence generation model by using the at least one second candidate similar sentence, to obtain a second sentence generation model.
Optionally, the sentence generation model is an artificial intelligence model.
Optionally, the device further includes a display module. The display module is configured to display the plurality of first candidate similar sentences to the user; the generation module is further configured to generate a plurality of third candidate similar sentences based on the reference sentence and the second sentence generation model; and the display module is further configured to display the plurality of third candidate similar sentences to the user.
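The module decomposition of the second aspect can be sketched as a set of interfaces wired together for one round. The class and method names are hypothetical. The reference numerals in the comments follow the device 800 of FIG. 8 as described earlier on this page. Putting each module behind an interface is one way to let the modules run on a single computer device or be distributed across several.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (first candidate, modified second candidate)


class GenerationModule(ABC):  # cf. generation module 801
    @abstractmethod
    def generate(self, model: Any, reference: str) -> List[str]:
        """Generate candidate similar sentences of `reference` with `model`."""


class ModificationModule(ABC):  # cf. modification module 802
    @abstractmethod
    def collect_edits(self, candidates: Sequence[str]) -> List[Pair]:
        """Return (first, second) pairs for the candidates the user modified."""


class UpdateModule(ABC):  # cf. update module 803
    @abstractmethod
    def update(self, model: Any, pairs: Sequence[Pair]) -> Any:
        """Return the updated (second) sentence generation model."""


class DisplayModule(ABC):  # cf. display module 804 (optional)
    @abstractmethod
    def show(self, candidates: Sequence[str]) -> None:
        """Present candidates to the user."""


class SentenceGenerationDevice:
    """Wires the modules together for one generate/display/modify/update round."""

    def __init__(self, gen: GenerationModule, mod: ModificationModule,
                 upd: UpdateModule, disp: Optional[DisplayModule] = None) -> None:
        self.gen, self.mod, self.upd, self.disp = gen, mod, upd, disp

    def run_round(self, model: Any, reference: str) -> Tuple[Any, List[Pair]]:
        candidates = self.gen.generate(model, reference)
        if self.disp is not None:
            self.disp.show(candidates)
        pairs = self.mod.collect_edits(candidates)
        if pairs:  # no edits -> the model is left unchanged
            model = self.upd.update(model, pairs)
        return model, pairs
```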
Optionally, the modification operation includes one or more of adding a character, deleting a character, exchanging characters, replacing a character, and rewriting the sentence.
Optionally, the update module is mainly configured to: obtain difference information between each first candidate similar sentence in the at least one first candidate similar sentence and the corresponding modified second candidate similar sentence; and update parameters of the first sentence generation model based on the difference information, to obtain the second sentence generation model.
According to a third aspect, a computer device is provided. The computer device includes a processor and a memory. The memory is configured to store at least one program instruction or code that enables the computer device to perform the sentence generation method according to the first aspect, as well as data involved in implementing that method. The processor is configured to execute the program instructions or code stored in the memory.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer device, cause the computer device to perform the sentence generation method according to the first aspect.
According to a fifth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the sentence generation method according to the first aspect.
The technical effects of the second, third, fourth, and fifth aspects are similar to those of the corresponding technical means in the first aspect, and are not described again here.
The technical solutions provided in the embodiments of this application have at least the following beneficial effects:
In the embodiments of this application, after the first sentence generation model generates a plurality of first candidate similar sentences of the reference sentence, at least one second candidate similar sentence can be obtained based on the user's modification operation on at least one first candidate similar sentence. The at least one second candidate similar sentence is then used to update the first sentence generation model. That is, the embodiments of this application can optimize the sentence generation model in real time by using candidate similar sentences modified by the user. The optimized sentence generation model can then generate more high-quality similar sentences from the reference sentence, thereby effectively expanding the corpus.
Brief description of drawings
FIG. 1 is a schematic diagram of an implementation environment of a sentence generation method according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a computer device according to an embodiment of this application;
FIG. 3 is a flowchart of a sentence generation method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a display page and a modification page for first candidate similar sentences according to an embodiment of this application;
FIG. 5 is a schematic flowchart of interaction between a user and a terminal device according to an embodiment of this application;
FIG. 6 is a schematic diagram of a display page for second candidate similar sentences according to an embodiment of this application;
FIG. 7 is an exemplary flowchart of a sentence generation process according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a sentence generation device according to an embodiment of this application.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings.
Before the embodiments of this application are explained in detail, their application scenarios are introduced.
Currently, in the field of natural language processing, processing text information often relies on corpora containing a large number of similar sentences. The sentence generation method provided in the embodiments of this application can be used to generate similar sentences of a reference sentence. This effectively expands the corpus and provides data support for text information processing.
For example, automatic question answering is a research hotspot in natural language processing and is widely used in Internet products such as question answering bots, chatbots, and intelligent customer service bots. Retrieval-based question answering is an important branch of automatic question answering. In a retrieval-based question answering system, a large number of similar sentences in the corpus can be used to train a question matching model. After a target question input by a user is received, the trained question matching model matches questions in the corpus against the target question. The answer corresponding to the matched question is then used as the answer to the target question. The more questions in the corpus that are similar to the target question, the more likely the matching process is to find an accurate match. In this scenario, the sentence generation method provided in the embodiments of this application can automatically generate a large number of high-quality similar sentences from reference sentences to expand the corpus, thereby improving the accuracy of automatic question answering.
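The retrieval-based matching step can be sketched with each generated similar question stored alongside its reference question and sharing that question's answer. `difflib.SequenceMatcher.ratio()` is used here only as a stand-in for the trained question matching model. `build_corpus`, `answer`, and the `min_score` cutoff are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

Corpus = List[Tuple[str, str]]  # (question, answer) entries


def build_corpus(qa: Dict[str, str], similar: Dict[str, List[str]]) -> Corpus:
    """Flatten reference questions and their generated similar questions.

    Every similar question inherits the answer of its reference question.
    """
    corpus: Corpus = []
    for question, ans in qa.items():
        corpus.append((question, ans))
        for variant in similar.get(question, []):
            corpus.append((variant, ans))
    return corpus


def answer(target: str, corpus: Corpus, min_score: float = 0.6) -> Optional[str]:
    """Return the answer of the most similar corpus question, or None if no
    question scores at least `min_score`."""
    best_score, best_answer = 0.0, None
    for question, ans in corpus:
        # Character-level similarity in [0, 1]; a stand-in for a trained matcher.
        score = SequenceMatcher(None, target.lower(), question.lower()).ratio()
        if score > best_score:
            best_score, best_answer = score, ans
    return best_answer if best_score >= min_score else None
```

Once a generated variant is stored, a query worded like that variant can match it directly, instead of having to match the differently worded reference question.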
As another example, consider a document retrieval system. After a user uploads a reference document, the system can extract key content from it, such as the title and table of contents, and build a corpus from the extracted content. The system can also use the extracted key content as reference sentences and apply the sentence generation method provided in the embodiments of this application to generate similar sentences, which expands the corpus. Later, when the system receives a retrieval query, it can use a retrieval model to search both the extracted key content and the generated similar sentences for sentences that match the query. This improves retrieval accuracy. The similar sentences in the corpus can also be used to train the retrieval model and improve its accuracy.
These are two exemplary application scenarios of the sentence generation method provided in the embodiments of this application. The method can also be applied in other scenarios, such as synonymous sentence generation and sentence rewriting. The embodiments of this application impose no limitation on this.
The following describes the implementation environment of the sentence generation method provided in the embodiments of this application.
The sentence generation method provided in the embodiments of this application can be implemented by a single computer device in any environment. For example, it can be implemented by a terminal device such as a desktop computer, notebook computer, smartphone, or tablet computer.
Alternatively, the sentence generation method can be implemented by multiple computer devices, for example, by the client device 101 and the server 102 shown in FIG. 1.
The client device 101 provides an interactive interface through which it interacts with the user to obtain a reference sentence. It then sends the reference sentence to the server 102.
A sentence generation model is deployed on the server 102. After the server 102 receives the reference sentence from the client device 101, it uses the sentence generation model to generate a plurality of candidate similar sentences and sends them to the client device 101.
The client device 101 also presents the candidate similar sentences to the user. When the user modifies at least one of the displayed candidate similar sentences, the client device generates at least one modified candidate similar sentence from those modifications and sends it to the server 102. In addition, when the user confirms any displayed candidate similar sentence, the client device 101 can send that candidate to the server 102 as a similar sentence of the reference sentence.
When the server 102 receives a user-confirmed similar sentence of the reference sentence, it adds that sentence to a database. When it receives at least one modified candidate similar sentence, it uses the modified candidates to update the sentence generation model, which yields an updated sentence generation model. The server 102 can then use the updated model to generate a new set of candidate similar sentences from the reference sentence and send them to the client device 101 for display. These steps repeat until a termination condition is met. The termination condition may be, for example, that the number of similar sentences of the reference sentence that the server 102 has added to the database reaches a reference threshold. It may also be that the number of update rounds applied to the sentence generation model reaches a reference number of rounds.
It should be noted that the server 102 may be deployed in a cloud environment, for example, as a server in a cloud data center. Alternatively, the server 102 may be an edge computing device in an edge environment, for example, a computing device in an organization's data center. The server 102 may also be a computing device of another type or in another scenario. The embodiments of this application impose no limitation on this.
In addition, the client device 101 may be a device such as a notebook computer, smartphone, or tablet computer.
FIG. 2 is a schematic structural diagram of a computer device according to an embodiment of this application. The sentence generation method may be implemented by one computer device or by several; in either case, each computer device may be the computer device 200 shown in FIG. 2. Exemplarily, the computer device 200 includes a processor 201, a communication bus 202, a memory 203, and at least one communication interface 204.
The processor 201 may be a general-purpose central processing unit (CPU), an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or any combination thereof. The processor 201 may include one or more chips and may include an AI accelerator, such as a neural processing unit (NPU).
The communication bus 202 may include a path that transfers information between the components of the computer device 200, for example, the processor 201, the memory 203, and the communication interface 204.
The memory 203 may be any of the following, but is not limited to them:

- a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions;
- a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions;
- an electrically erasable programmable read-only memory (EEPROM);
- a compact disc read-only memory (CD-ROM) or other optical disc storage, including compact discs, laser discs, digital versatile discs, and Blu-ray discs;
- a magnetic disk storage medium or other magnetic storage device;
- any other computer-accessible medium that can carry or store desired program code in the form of instructions or data structures.

The memory 203 may exist independently and be connected to the processor 201 through the communication bus 202, or it may be integrated with the processor 201. The memory 203 can store computer instructions. When the processor 201 executes these instructions, the sentence generation method provided in the embodiments of this application is implemented. The memory 203 may also store data that the processor needs while executing the method, as well as intermediate data and/or result data generated in the process.
The communication interface 204 uses any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
In one embodiment, the processor 201 may include one or more CPUs.
In one embodiment, the computer device 200 may include multiple processors. Each of these processors may be a single-core (single-CPU) or multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data, for example, computer program instructions.
In one embodiment, the computer device 200 may further include an output device 205 and an input device 206. The output device 205 communicates with the processor 201 and can display information in various ways. For example, it may be a liquid crystal display (LCD), a light-emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device 206 communicates with the processor 201 and can receive user input in various ways. For example, it may be a mouse, a keyboard, a touchscreen device, or a sensing device.
The following describes the sentence generation method provided in the embodiments of this application.
FIG. 3 is a flowchart of a sentence generation method according to an embodiment of this application. The method can be applied to one or more computer devices. The following description uses a single terminal device as an example. Referring to FIG. 3, the method includes the following steps:
Step 301: Based on a reference sentence, generate a plurality of first candidate similar sentences using a first sentence generation model.
In this embodiment of this application, the terminal device displays an interactive interface that offers several ways to obtain reference sentences:

- The interface may include a text input box. The user enters one or more reference sentences in the box, and the terminal device obtains them.
- The interface may include a document upload option. After the terminal device detects that the user has selected this option, it receives the document the user uploads and extracts one or more reference sentences from it.
- The interface may include a voice option. After the terminal device detects that the user has selected this option, it turns on an audio capture device to record the user's speech. It then processes the speech with a speech recognition model to obtain one or more reference sentences.
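The text does not specify how reference sentences are extracted from an uploaded document. Here is a minimal sketch under two assumptions: the document has already been converted to plain text, and splitting on sentence-ending punctuation (full-width Chinese and ASCII) is acceptable. `extract_reference_sentences` and the `min_chars` filter are hypothetical names.

```python
import re

# Assumption: a sentence is a run of non-terminator characters followed by an
# optional terminator. Full-width Chinese and ASCII end punctuation are covered;
# abbreviations such as "e.g." will be split naively.
_SENTENCE = re.compile(r"[^。！？!?.\n]+[。！？!?.]?")


def extract_reference_sentences(document_text: str, min_chars: int = 2) -> list[str]:
    """Split plain document text into candidate reference sentences, in order, without duplicates."""
    sentences = (m.group().strip() for m in _SENTENCE.finditer(document_text))
    kept = [s for s in sentences if len(s) >= min_chars]
    return list(dict.fromkeys(kept))  # drop exact duplicates, keep first occurrence
```

For example, `extract_reference_sentences("如何重置密码？请联系客服。")` returns `["如何重置密码？", "请联系客服。"]`. A production system would more likely use a proper sentence segmenter and the document's own structure, such as the titles and tables of contents mentioned earlier.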
Optionally, in this embodiment of this application, the terminal device may also obtain reference sentences in other ways, for example, from a database that stores a large number of sentences. The database may be deployed on the terminal device itself or on another device, such as a back-end server.
For each reference sentence obtained, the terminal device can apply text preprocessing operations such as word segmentation and stop-word removal. It then feeds the processed reference sentence into the first sentence generation model, which outputs a plurality of first candidate similar sentences. The number of first candidate similar sentences output may be set by the user or may be a default value, for example, 10.
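The preprocessing and generation call in this step can be sketched as follows. Only the default of 10 candidates comes from the text; everything else is an assumption for illustration:

- Whitespace tokenization and lowercasing stand in for real word segmentation. Chinese text would need a dedicated segmenter.
- The stop-word set is a toy example.
- `model` is a hypothetical callable that returns `num_candidates` sentences.

```python
from typing import Callable, Iterable, Sequence

# Toy stop-word list; a real system would load a language-specific list.
DEFAULT_STOP_WORDS = frozenset({"the", "a", "an", "please"})

# Hypothetical model interface: (preprocessed tokens, number of candidates) -> sentences.
GenerationModel = Callable[[Sequence[str], int], list[str]]


def preprocess(reference: str, stop_words: Iterable[str] = DEFAULT_STOP_WORDS) -> list[str]:
    """Segment the reference sentence into words and drop stop words."""
    stop = {w.lower() for w in stop_words}
    tokens = reference.lower().split()  # stand-in for a real word segmenter
    return [t for t in tokens if t not in stop]


def generate_first_candidates(
    model: GenerationModel, reference: str, num_candidates: int = 10
) -> list[str]:
    """Run the preprocessed reference sentence through the model for num_candidates outputs."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    return model(preprocess(reference), num_candidates)
```

Injecting the model as a callable keeps the sketch independent of any particular seq2seq framework. For example, `preprocess("Please reset the password")` returns `["reset", "password"]`.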
In this embodiment of this application, the first sentence generation model may be an AI model, for example, a neural network model or a decision tree model. If it is a neural network model, it may be a sequence-to-sequence (seq2seq) model built on a recurrent neural network (RNN), a convolutional neural network (CNN), or a similar network structure. The first sentence generation model may be pre-trained on a large amount of unlabeled data and open-source similar-question datasets. Alternatively, it may be a model that has already been updated with candidate similar sentences modified by the user.
Step 302: Generate at least one second candidate similar sentence based on the user's modification operations on at least one first candidate similar sentence.
After generating the first candidate similar sentences with the first sentence generation model, the terminal device can display them. The user may modify at least one of the displayed first candidate similar sentences, and the terminal device generates at least one second candidate similar sentence from these modifications. A modification operation includes one or more of the following: adding characters, deleting characters, swapping characters, replacing characters, and rewriting the sentence.
For example, the terminal device may display the first candidate similar sentences in the interactive interface, together with operation options for each one. These may include a modify option, a delete option, and a confirm option. When the user selects the modify option of a first candidate similar sentence, the terminal device detects the user's modification operations on that sentence and generates a second candidate similar sentence from them.
In one possible implementation, when the user selects the modify option of a first candidate similar sentence, the terminal device displays modification sub-options for that sentence: add, delete, swap, replace, and rewrite. Using these sub-options, the user can add, delete, swap, or replace characters, or rewrite the sentence, in any combination. The terminal device then generates the modified sentence, that is, the second candidate similar sentence, from the operations the user performed on the first candidate similar sentence.
For example, the user may select the add sub-option of the first candidate similar sentence. In response, the terminal device receives the characters the user inserts at a given position in the sentence and updates the first candidate similar sentence accordingly.
Alternatively, the user may select the delete sub-option. In response, the terminal device detects the characters the user selects and deletes, and updates the first candidate similar sentence accordingly.
Alternatively, the user may select the swap sub-option. In response, the terminal device detects the user selecting and dragging one or more characters in the first candidate similar sentence, and moves the selected characters to the position where the user drops them.
Alternatively, the user may select the replace sub-option. In response, the terminal device detects the characters the user selects in the first candidate similar sentence and displays a text input box. It then replaces the selected characters with the characters the user enters in the box, which yields an updated first candidate similar sentence.
Alternatively, the user may select the rewrite sub-option. In response, the terminal device displays a text input box, obtains the rewritten sentence the user enters, and uses it as the updated sentence.
It should be noted that, with the above method, the user can apply several modification operations to a first candidate similar sentence in sequence, using one or more modification sub-options. Each time the terminal device receives a modification operation, it applies that operation to the most recently updated version of the first candidate similar sentence. The interactive interface may also include a finish-modification option, which the user selects when done. When the terminal device detects that this option has been selected, it takes the most recently updated version of the first candidate similar sentence as the second candidate similar sentence.
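The five modification sub-options, together with the rule that each operation applies to the most recently updated version, can be modelled as a small edit log replayed over the first candidate. This is a hypothetical, character-indexed sketch; `Edit`, `apply_edit`, and `finish_modification` are illustrative names. "Swap" is modelled as moving a selected span, which matches the drag-and-drop description above. The terminal device in the text is not necessarily implemented this way.

```python
from dataclasses import dataclass
from typing import Literal

EditKind = Literal["add", "delete", "swap", "replace", "rewrite"]


@dataclass(frozen=True)
class Edit:
    kind: EditKind
    start: int = 0  # span start, or insertion point for "add"
    end: int = 0  # span end (exclusive) for "delete", "swap", "replace"
    text: str = ""  # text to insert, substitute, or use as the rewrite
    target: int = 0  # for "swap": index in the sentence after the span is removed


def apply_edit(sentence: str, e: Edit) -> str:
    """Apply one modification sub-option to the current version of the sentence."""
    if e.kind == "rewrite":
        return e.text
    if e.kind == "add":
        if not 0 <= e.start <= len(sentence):
            raise IndexError("insertion point out of range")
        return sentence[: e.start] + e.text + sentence[e.start :]
    if not 0 <= e.start <= e.end <= len(sentence):
        raise IndexError("span out of range")
    if e.kind == "delete":
        return sentence[: e.start] + sentence[e.end :]
    if e.kind == "replace":
        return sentence[: e.start] + e.text + sentence[e.end :]
    if e.kind == "swap":
        span = sentence[e.start : e.end]
        rest = sentence[: e.start] + sentence[e.end :]
        if not 0 <= e.target <= len(rest):
            raise IndexError("swap target out of range")
        return rest[: e.target] + span + rest[e.target :]
    raise ValueError(f"unknown edit kind: {e.kind!r}")


def finish_modification(first_candidate: str, edits: list[Edit]) -> str:
    """Replay edits in order; the final version is the second candidate similar sentence."""
    current = first_candidate
    for e in edits:
        current = apply_edit(current, e)  # each edit acts on the latest version
    return current
```

For example, applying `Edit("add", start=3, text=" do I")` and then `Edit("replace", start=0, end=3, text="How")` to `"how reset password"` produces `"How do I reset password"`.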
For example, suppose the first sentence generation model generates n first candidate similar sentences from the reference sentence: sentence 1-1, sentence 1-2, ..., sentence 1-n. Referring to FIG. 4, the terminal device displays a page listing the n first candidate similar sentences, each followed by its modify, confirm, and delete options. The user can review each sentence. The user may be partially satisfied with a sentence, for example, believing that sentence 1-3 can be turned into an acceptable similar sentence by editing. In that case, the user selects the modify option after sentence 1-3. In response, the terminal device displays a modification interface for sentence 1-3. This interface shows the sentence, the modification sub-options (add, delete, swap, replace, and rewrite), and a finish-modification option. The user modifies sentence 1-3 with these sub-options as described above. When finished, the user selects the finish-modification option, and the terminal device shows the modified sentence 1-3' on the display page.

The terminal device may also show a continue option and an end option on the page. Once the user has finished modifying the sentences on the page, the user can select the continue option, and the terminal device then performs step 303. If the user no longer needs similar sentences for this reference sentence, the user can instead select the end option. This instructs the terminal device to stop updating the sentence generation model and to stop generating candidate similar sentences for the reference sentence.
In another possible implementation, when the user selects the modify option of a first candidate similar sentence, the terminal device switches that sentence from a non-editable state to an editable state. While the sentence is editable, the user can insert characters into it and can delete, replace, or swap its characters. The terminal device updates the first candidate similar sentence according to these modifications to obtain the modified sentence, that is, the second candidate similar sentence.
Optionally, the user may select the confirm option of a first candidate similar sentence. In response, the terminal device stores that sentence in a database as a similar sentence of the reference sentence. The database may be a corpus that stores a large number of sentences.
Optionally, the user may select the delete option of a first candidate similar sentence, in which case the terminal device simply deletes that sentence.
FIG. 5 is a schematic flowchart of the interaction between a user and the terminal device according to an embodiment of this application. As shown in FIG. 5, for each candidate similar sentence displayed on the terminal device, such as sentence n, the user decides whether it is satisfactory:

- If it is, the user taps the confirm option, and the terminal device adds sentence n to the database.
- If it is not, the user taps the delete option, and the terminal device deletes sentence n.
- If the user is partially satisfied, the user taps the modify option and then identifies what is wrong with the sentence. If the problem is semantic, the user can tap the rewrite sub-option and enter rewritten text to obtain a modified similar sentence. If the problem is lexical, the user can tap one or more of the add, delete, replace, and swap sub-options to modify sentence n. When the modification is complete, the user taps the finish-modification option, and the terminal device generates the modified similar sentence from the user's modification operations.
With the method described above, the user can modify at least one first candidate similar sentence displayed on the terminal device to obtain at least one second candidate similar sentence.
Optionally, the user may not modify any of the displayed first candidate similar sentences. In that case, the terminal device does not update the sentence generation model. Two such cases are possible:

- The user may be satisfied with all of the displayed sentences and confirm each one. The terminal device then adds all of them to the database as similar sentences of the reference sentence.
- The user may be satisfied with some of the displayed sentences but not others. The user confirms the satisfactory ones and leaves the others unmodified. The terminal device adds the confirmed first candidate similar sentences to the database as similar sentences of the reference sentence and discards the remaining, unmodified ones.
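The handling of the confirm, delete, and modify actions described above can be summarized in a short sketch. This includes the rule that the model is not updated when no sentence is modified. The function name, the index-keyed `decisions` and `modified` mappings, and the use of (first candidate, second candidate) pairs as the training signal are assumptions for illustration.

```python
from typing import Literal, Mapping, Sequence

Decision = Literal["confirm", "delete", "modify"]


def triage(
    candidates: Sequence[str],
    decisions: Mapping[int, Decision],
    modified: Mapping[int, str],
) -> tuple[list[str], list[tuple[str, str]], bool]:
    """Sort displayed candidates by the user's action.

    Returns (sentences_for_database, training_pairs, should_update_model).
    Candidates that are deleted or left untouched are discarded.
    """
    to_database: list[str] = []
    pairs: list[tuple[str, str]] = []
    for i, sentence in enumerate(candidates):
        action = decisions.get(i)
        if action == "confirm":
            to_database.append(sentence)
        elif action == "modify":
            pairs.append((sentence, modified[i]))  # (first candidate, second candidate)
        # "delete" or no action: the candidate is dropped
    return to_database, pairs, bool(pairs)
```

The model update is driven only by the pairs. If every candidate is confirmed, deleted, or ignored, `should_update_model` is `False`, which matches the two cases above.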
Step 303: Update the first sentence generation model using the at least one second candidate similar sentence to obtain a second sentence generation model.
After obtaining the at least one second candidate similar sentence, the terminal device can obtain difference information between each first candidate similar sentence and its corresponding modified second candidate similar sentence. It then updates the parameters of the first sentence generation model based on this difference information to obtain the second sentence generation model.
As described above, the at least one second candidate similar sentence is obtained by modifying the at least one first candidate similar sentence. The first candidate similar sentences are the actual outputs of the first sentence generation model, while the modified second candidate similar sentences can serve as its expected outputs. For each first candidate similar sentence, the terminal device can therefore compute the error between it and its modified second candidate similar sentence, that is, the error between each actual output and the corresponding expected output. It uses these errors as the difference information and updates the parameters of the first sentence generation model accordingly to obtain the second sentence generation model.
It should be noted that, in this embodiment of the present application, once the error value between each first candidate similar sentence and its corresponding second candidate similar sentence has been obtained, and taking a neural network as an example of the first sentence generation model, the terminal device can use an autoregressive cross-entropy loss function to determine a total error value from the at least one calculated error value. If the total error value is greater than an error threshold, the terminal device can backpropagate the total error starting from the output layer of the neural network model, updating the parameters of each layer in turn, and thereby obtain the second sentence generation model.
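A minimal sketch of this threshold-gated update follows, under stated assumptions: a toy bigram table of logits (`BigramModel`, hypothetical) stands in for the neural network, the user-corrected second candidate similar sentences serve as teacher-forced targets, and the autoregressive cross-entropy is the sum of `-log p(y_t | y_{t-1})` over each target. For a single softmax layer, backpropagation reduces to the closed-form gradient `p - onehot`, so no autodiff library is needed. The threshold, learning rate and step cap are illustrative values, not figures from the application.

```python
import math


def softmax(row):
    """Numerically stable softmax over a {token: logit} dict."""
    m = max(row.values())
    exps = {tok: math.exp(v - m) for tok, v in row.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


class BigramModel:
    """Toy autoregressive model (hypothetical): next-token logits depend only on the previous token."""
    BOS = "<s>"

    def __init__(self, vocab):
        self.vocab = list(vocab)
        contexts = [self.BOS] + self.vocab
        self.logits = {c: {tok: 0.0 for tok in self.vocab} for c in contexts}

    def next_probs(self, prev):
        return softmax(self.logits[prev])


def sentence_loss(model, tokens):
    """Autoregressive cross-entropy with teacher forcing: sum of -log p(y_t | y_{t-1})."""
    loss, prev = 0.0, BigramModel.BOS
    for tok in tokens:
        loss -= math.log(model.next_probs(prev)[tok])
        prev = tok
    return loss


def total_loss(model, targets):
    """Mean loss over the user-corrected (expected-output) sentences."""
    return sum(sentence_loss(model, t) for t in targets) / len(targets)


def gradient_step(model, targets, lr=0.5):
    """One gradient-descent step; for softmax + cross-entropy, dL/dlogit = p - onehot."""
    grads = {c: {tok: 0.0 for tok in model.vocab} for c in model.logits}
    for tokens in targets:
        prev = BigramModel.BOS
        for tok in tokens:
            for k, p in model.next_probs(prev).items():
                grads[prev][k] += p - (1.0 if k == tok else 0.0)
            prev = tok
    # Apply the averaged gradients only after all of them were computed.
    for c, row in grads.items():
        for k, g in row.items():
            model.logits[c][k] -= lr * g / len(targets)


def update_if_needed(model, targets, threshold=1.0, max_steps=200):
    """Update only while the total error exceeds the threshold; returns the number of steps taken."""
    steps = 0
    while total_loss(model, targets) > threshold and steps < max_steps:
        gradient_step(model, targets)
        steps += 1
    return steps
```

In a real neural network the same loss would be backpropagated through every layer by an autodiff framework; only the gradient computation changes, not the threshold logic.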
Of course, if the first sentence generation model is another type of AI model, the terminal device can update its parameters by other methods to obtain the second sentence generation model; for example, Newton's method can be used. Details are not repeated in this embodiment of the present application.
After obtaining the second sentence generation model, the terminal device can input the aforementioned reference sentence into the second sentence generation model, which processes it and outputs a plurality of third candidate similar sentences. The terminal device can then display these third candidate similar sentences to the user as described in step 302 and, again based on the user's operations on the displayed candidates, decide whether to add each displayed candidate to the database as a similar sentence of the reference sentence, or to delete or modify it. If the terminal device obtains at least one modified candidate similar sentence from the user's modification of at least one third candidate similar sentence, it can continue updating the second sentence generation model, and so on. Once the number of update rounds reaches a specified number, or the number of similar sentences of the reference sentence stored in the database reaches a specified quantity, the terminal device can stop updating the sentence generation model and stop generating candidate similar sentences for the reference sentence.
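The generate, review and update cycle with its two stopping criteria can be sketched as a loop. `generate`, `review` and `update` are hypothetical callables supplied by the caller, and `Feedback` is an assumed container for one review round. Stopping when a round yields no modifications is an added assumption, made so that the loop cannot spin on an unchanged model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    """Result of one user review round (hypothetical structure)."""
    accepted: List[str] = field(default_factory=list)
    modified: List[Tuple[str, str]] = field(default_factory=list)  # (generated, user-edited)


def refine_loop(model, reference: str,
                generate: Callable, review: Callable, update: Callable,
                max_rounds: int = 5, target_count: int = 20):
    """Repeat generate -> review -> update until a stopping criterion is met.

    Stops when the database holds at least target_count similar sentences,
    when max_rounds updates have been made, or (assumption) when a round
    produces no user modifications to learn from.
    """
    database: List[str] = []
    rounds = 0
    while True:
        feedback = review(generate(model, reference))
        database.extend(feedback.accepted)
        # User-modified sentences are also stored, as they meet the user's requirements.
        database.extend(edited for _, edited in feedback.modified)
        if len(database) >= target_count or rounds >= max_rounds or not feedback.modified:
            break
        model = update(model, feedback.modified)
        rounds += 1
    return model, database, rounds
```

Keeping the three callables outside the loop lets the same control flow run whether generation and updating happen locally or on a server.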
For example, suppose the second sentence generation model, obtained through an update based on the at least one second candidate similar sentence, generates n third candidate similar sentences: sentence 2-1, sentence 2-2, sentence 2-3, ..., sentence 2-n. Referring to Figure 6, the terminal device can display these n third candidate similar sentences with the corresponding operation options after each sentence, so that the user can perform the corresponding operation.
Optionally, in this embodiment of the present application, after the first sentence generation model has been updated using the at least one second candidate similar sentence, the terminal device can also store the at least one second candidate similar sentence in the database, because these sentences have been modified by the user into similar sentences that meet the requirements.
Based on the sentence generation method provided in the above embodiments, an embodiment of the present application provides an exemplary flowchart of the sentence generation process. Referring to Figure 7, the sentence generation model first generates similar sentences 1 to n from the reference sentence and then displays them on an interactive interface. The user can give feedback on the displayed sentences through the operation options provided in the interface. Sentences the user confirms as satisfactory can be stored directly in the database; sentences the user is not satisfied with are deleted; and sentences the user modifies are used to update the parameters of the sentence generation model and are also stored in the database.
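The three feedback branches of Figure 7 can be sketched as a routing function. The `Action` enum and the `edits` mapping are hypothetical names introduced for illustration; the application does not specify how the interface reports the user's choices.

```python
from enum import Enum
from typing import Dict, List, Sequence, Tuple


class Action(Enum):
    ACCEPT = "accept"  # user confirms the sentence is satisfactory: store it
    DELETE = "delete"  # user is not satisfied: discard it
    MODIFY = "modify"  # user edits it: train on the edit and store the edited text


def route_feedback(candidates: Sequence[str], actions: Sequence[Action],
                   edits: Dict[int, str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split reviewed candidates into sentences to store and (generated, edited) training pairs."""
    if len(candidates) != len(actions):
        raise ValueError("expected exactly one action per candidate")
    to_store: List[str] = []
    training_pairs: List[Tuple[str, str]] = []
    for i, (sentence, action) in enumerate(zip(candidates, actions)):
        if action is Action.ACCEPT:
            to_store.append(sentence)
        elif action is Action.MODIFY:
            edited = edits[i]
            training_pairs.append((sentence, edited))
            to_store.append(edited)
        # Action.DELETE: the sentence is dropped and never reaches the database.
    return to_store, training_pairs
```

The training pairs feed the parameter update of step 303, while `to_store` goes to the database.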
In this embodiment of the present application, after a plurality of first candidate similar sentences of the reference sentence are generated by the first sentence generation model, at least one second candidate similar sentence can be obtained from the user's modification of at least one first candidate similar sentence, and that second candidate similar sentence is then used to update the first sentence generation model. That is, the embodiments of the present application can optimize the sentence generation model in real time using candidate similar sentences modified by the user, and the optimized model can then generate new, higher-quality similar sentences from the reference sentence. This not only expands the corpus effectively but also continuously improves the accuracy of the sentence generation model.
In addition, in this embodiment of the present application, the sentence generation model deployed on the terminal device can be pre-trained in advance on a large amount of unlabeled data and on open-source labeled sets of similar questions, which improves the model's generalization and cold-start capabilities.
It should be noted that the above description takes a terminal device executing the sentence generation method as an example. When the method is executed by multiple computer devices, for example by the client device and the server shown in Figure 1, the steps of displaying candidate similar sentences, interacting with the user, and obtaining modified candidate similar sentences based on the user's modification operations can be implemented by the client device; alternatively, the step of obtaining the modified candidate similar sentences based on the user's modification operations can be implemented by the server. In addition, the steps of generating candidate similar sentences from the reference sentence and updating the parameters of the sentence generation model based on the modified candidate similar sentences can both be implemented by the server.
The sentence generation apparatus provided by the embodiments of the present application is introduced next.
Referring to Figure 8, an embodiment of the present application provides a sentence generation apparatus 800, which includes a generation module 801, a modification module 802 and an update module 803.
The generation module 801 is configured to perform step 301 in the above embodiment;
the modification module 802 is configured to perform step 302 in the above embodiment;
the update module 803 is configured to perform step 303 in the above embodiment.
The generation module 801, modification module 802 and update module 803 can be deployed on one computer device and executed by its processor. Alternatively, they can be distributed across multiple computer devices and executed by the processors of those devices to jointly implement the above sentence generation method.
Optionally, the sentence generation model is an artificial intelligence model.
Optionally, referring to Figure 8, the apparatus 800 further includes:
a display module 804, configured to display the plurality of first candidate similar sentences to the user;
the generation module 801 is further configured to generate a plurality of third candidate similar sentences based on the reference sentence and the second sentence generation model;
the display module 804 is further configured to display the plurality of third candidate similar sentences to the user.
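One way to express this module split in Python is with `typing.Protocol` interfaces composed by an apparatus object. All class and method names are hypothetical, and the sketch assumes the generation and update modules share one model object, so an update made in step 303 is visible in the next generation round. Because the apparatus depends only on the interfaces, each module could equally be backed by a service on a different device.

```python
from typing import List, Protocol, Tuple

Pair = Tuple[str, str]  # (generated candidate, user-modified candidate)


class GenerationModule(Protocol):    # 801: step 301; also produces third candidates
    def generate(self, reference: str) -> List[str]: ...


class DisplayModule(Protocol):       # 804: shows candidates to the user
    def show(self, candidates: List[str]) -> None: ...


class ModificationModule(Protocol):  # 802: step 302, collects the user's edits
    def collect_edits(self, candidates: List[str]) -> List[Pair]: ...


class UpdateModule(Protocol):        # 803: step 303, updates the shared model
    def update(self, pairs: List[Pair]) -> None: ...


class SentenceGenerationApparatus:
    """Wires modules 801 to 804 together (illustrative composition, not the patented apparatus)."""

    def __init__(self, generation: GenerationModule, display: DisplayModule,
                 modification: ModificationModule, update: UpdateModule):
        self.generation = generation
        self.display = display
        self.modification = modification
        self.updater = update

    def run_round(self, reference: str) -> List[Pair]:
        candidates = self.generation.generate(reference)     # step 301
        self.display.show(candidates)
        pairs = self.modification.collect_edits(candidates)  # step 302
        if pairs:
            self.updater.update(pairs)                       # step 303
        return pairs
```

Protocols are checked structurally by type checkers only, so any object exposing the right methods can be plugged in at runtime.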
Optionally, the modification operation includes one or more of the following:
an add-character operation, a delete-character operation, a swap-character operation, a replace-character operation, and a rewrite-sentence operation.
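The first four operations correspond to the edit operations of the optimal string alignment (restricted Damerau-Levenshtein) distance: insertion, deletion, adjacent transposition and substitution. The sketch below uses that correspondence to label which operations a user's edit contains. Treating an edit as a rewrite when its distance exceeds half the longer sentence length (`rewrite_ratio`) is a hypothetical heuristic, not something the application defines.

```python
def classify_edit(original: str, modified: str, rewrite_ratio: float = 0.5):
    """Return (distance, operations) for turning original into modified.

    Operations are labelled "add", "delete", "swap" (adjacent transposition) and
    "replace"; if the distance exceeds rewrite_ratio * the longer length, the edit
    is reported as {"rewrite"} (hypothetical threshold).
    """
    n, m = len(original), len(modified)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if original[i - 1] == modified[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if (i > 1 and j > 1 and original[i - 1] == modified[j - 2]
                    and original[i - 2] == modified[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    distance = d[n][m]
    if distance > rewrite_ratio * max(n, m, 1):
        return distance, {"rewrite"}

    # Walk back through the table to recover which operations were used.
    ops, i, j = set(), n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and original[i - 1] == modified[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1  # characters match, no operation
        elif (i > 1 and j > 1 and original[i - 1] == modified[j - 2]
              and original[i - 2] == modified[j - 1] and d[i][j] == d[i - 2][j - 2] + 1):
            ops.add("swap")
            i, j = i - 2, j - 2
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            ops.add("replace")
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.add("delete")
            i -= 1
        else:
            ops.add("add")
            j -= 1
    return distance, ops
```

For example, `classify_edit("form", "from")` returns `(1, {"swap"})`, while `classify_edit("pasword", "password")` returns `(1, {"add"})`.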
Optionally, the update module 803 is mainly configured to:
obtain difference information between each of the at least one first candidate similar sentence and its corresponding modified second candidate similar sentence;
update the parameters of the first sentence generation model based on the difference information to obtain the second sentence generation model.
In summary, in the embodiments of the present application, after a plurality of first candidate similar sentences of the reference sentence are generated by the first sentence generation model, at least one second candidate similar sentence can be obtained from the user's modification of at least one first candidate similar sentence, and that second candidate similar sentence is then used to update the first sentence generation model. That is, the embodiments of the present application can optimize the sentence generation model in real time using candidate similar sentences modified by the user, and the optimized model can then generate new, higher-quality similar sentences from the reference sentence. This not only expands the corpus effectively but also continuously improves the accuracy of the sentence generation model.
It should be noted that, when the sentence generation apparatus provided in the above embodiment generates similar sentences of a reference sentence, the division into the above functional modules is only an example. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above. In addition, the sentence generation apparatus and the sentence generation method provided in the above embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a digital versatile disc (DVD)), or a semiconductor medium (such as a solid state disk (SSD)), etc.
A person of ordinary skill in the art can understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
The above description is not intended to limit the embodiments of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the embodiments of the present application shall fall within their scope of protection.

Claims (12)

  1. A sentence generation method, characterized in that the method comprises:
    generating, based on a reference sentence, a plurality of first candidate similar sentences through a first sentence generation model;
    generating at least one second candidate similar sentence based on a user's modification operation on at least one first candidate similar sentence, wherein the plurality of first candidate similar sentences comprise the at least one first candidate similar sentence; and
    updating the first sentence generation model using the at least one second candidate similar sentence to obtain a second sentence generation model.
  2. The method according to claim 1, characterized in that the sentence generation model is an artificial intelligence model.
  3. The method according to claim 1 or 2, characterized in that, after the plurality of first candidate similar sentences are generated, the method further comprises:
    displaying the plurality of first candidate similar sentences to the user;
    the method further comprises:
    generating a plurality of third candidate similar sentences based on the reference sentence and the second sentence generation model; and
    displaying the plurality of third candidate similar sentences to the user.
  4. The method according to any one of claims 1 to 3, characterized in that the modification operation comprises one or more of the following:
    an add-character operation, a delete-character operation, a swap-character operation, a replace-character operation, and a rewrite-sentence operation.
  5. The method according to any one of claims 1 to 4, characterized in that the updating the first sentence generation model using the at least one second candidate similar sentence to obtain a second sentence generation model comprises:
    obtaining difference information between each of the at least one first candidate similar sentence and the corresponding modified second candidate similar sentence; and
    updating parameters of the first sentence generation model based on the difference information to obtain the second sentence generation model.
  6. A sentence generation apparatus, characterized in that the apparatus comprises:
    a generation module, configured to generate, based on a reference sentence, a plurality of first candidate similar sentences through a first sentence generation model;
    a modification module, configured to generate at least one second candidate similar sentence based on a user's modification operation on at least one first candidate similar sentence, wherein the plurality of first candidate similar sentences comprise the at least one first candidate similar sentence; and
    an update module, configured to update the first sentence generation model using the at least one second candidate similar sentence to obtain a second sentence generation model.
  7. The apparatus according to claim 6, characterized in that the sentence generation model is an artificial intelligence model.
  8. The apparatus according to claim 6 or 7, characterized in that the apparatus further comprises:
    a display module, configured to display the plurality of first candidate similar sentences to the user;
    the generation module is further configured to generate a plurality of third candidate similar sentences based on the reference sentence and the second sentence generation model; and
    the display module is further configured to display the plurality of third candidate similar sentences to the user.
  9. The apparatus according to any one of claims 6 to 8, characterized in that the modification operation comprises one or more of the following:
    an add-character operation, a delete-character operation, a swap-character operation, a replace-character operation, and a rewrite-sentence operation.
  10. The apparatus according to any one of claims 6 to 9, characterized in that the update module is mainly configured to:
    obtain difference information between each of the at least one first candidate similar sentence and the corresponding modified second candidate similar sentence; and
    update parameters of the first sentence generation model based on the difference information to obtain the second sentence generation model.
  11. A computer device, characterized in that the computer device comprises a processor and a memory, the memory stores at least one program instruction or piece of code, and the at least one program instruction or piece of code is loaded and executed by the processor to cause the computer device to implement the sentence generation method according to any one of claims 1 to 5.
  12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when run on a computer device, cause the computer device to perform the sentence generation method according to any one of claims 1 to 5.
PCT/CN2023/090386 2022-06-17 2023-04-24 Statement generation method and device and storage medium WO2023241226A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210693429.8 2022-06-17
CN202210693429.8A CN117291181A (en) 2022-06-17 2022-06-17 Statement generation method, device and storage medium

Publications (1)

Publication Number Publication Date
WO2023241226A1 true WO2023241226A1 (en) 2023-12-21

Family

ID=89192175

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/090386 WO2023241226A1 (en) 2022-06-17 2023-04-24 Statement generation method and device and storage medium

Country Status (2)

Country Link
CN (1) CN117291181A (en)
WO (1) WO2023241226A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112906392A (en) * 2021-03-23 2021-06-04 北京天融信网络安全技术有限公司 Text enhancement method, text classification method and related device
CN113807074A (en) * 2021-03-12 2021-12-17 京东科技控股股份有限公司 Similar statement generation method and device based on pre-training language model
CN114328857A (en) * 2021-11-29 2022-04-12 腾讯科技(深圳)有限公司 Statement extension method, device and computer readable storage medium

Also Published As

Publication number Publication date
CN117291181A (en) 2023-12-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23822797

Country of ref document: EP

Kind code of ref document: A1