CN118152537A - Method and device for improving large-model reply-script generation quality, storage medium and electronic equipment - Google Patents

Method and device for improving large-model reply-script generation quality, storage medium and electronic equipment

Info

Publication number
CN118152537A
Authority
CN
China
Prior art keywords
reply
replied
question
current
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410333016.8A
Other languages
Chinese (zh)
Inventor
于洋
曾文佳
梁鹏斌
李航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lingxi Beijing Technology Co Ltd
Original Assignee
Lingxi Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lingxi Beijing Technology Co Ltd filed Critical Lingxi Beijing Technology Co Ltd
Priority to CN202410333016.8A priority Critical patent/CN118152537A/en
Publication of CN118152537A publication Critical patent/CN118152537A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The application relates to the technical field of intelligent customer service, and in particular provides a method and device for improving large-model reply-script generation quality, a storage medium, and an electronic device. The method comprises the following steps: determining a question to be answered based on the current question in the dialogue content; obtaining a plurality of candidate reply scripts matched to the question to be answered using retrieval-augmented generation (RAG); and inputting the question to be answered and the plurality of candidate reply scripts into a target large language model to generate a target reply script. According to some embodiments of the application, the inference efficiency, quality, and effectiveness of reply-script generation can be improved, improving the user experience.

Description

Method and device for improving large-model reply-script generation quality, storage medium and electronic equipment
Technical Field
The application relates to the technical field of intelligent customer service, and in particular to a method and device for improving large-model reply-script generation quality, a storage medium, and an electronic device.
Background
With the development of artificial intelligence, intelligent customer service robots are widely used in various business fields.
At present, when an intelligent customer service robot interacts with a user, the reply content is generally generated by extracting keywords from the text the user inputs or by performing semantic analysis on it. However, this approach analyzes only the text currently input by the user and cannot guarantee logical consistency with the context, so the generated reply script may fail to correspond accurately to the user's question.
Therefore, how to generate reply scripts with high accuracy has become a technical problem to be solved.
Disclosure of Invention
The application aims to provide a method, a device, a storage medium, and electronic equipment for improving large-model reply-script generation quality. The technical solution of the embodiments of the application can improve the efficiency, quality, and accuracy of reply-script generation and provide high-quality service to users.
In a first aspect, some embodiments of the present application provide a method for improving large-model reply-script generation quality, comprising: determining a question to be answered based on the current question in the dialogue content; obtaining a plurality of candidate reply scripts matched to the question to be answered using a retrieval-augmented generation (RAG) model; and inputting the question to be answered and the plurality of candidate reply scripts into a target large language model to generate a target reply script.
In these embodiments, after the question to be answered is determined from the current question in the dialogue content, a plurality of candidate reply scripts are matched to it, and the question together with these candidates is input into the target large language model to generate the target reply script. This improves the efficiency and accuracy of reply generation; because the candidates pass through RAG processing, the generated reply script is of higher quality, providing high-quality service to users and improving the user experience.
In some embodiments, determining the question to be answered based on the current question in the dialogue content includes: judging whether the current question in the dialogue content has a parent topic, obtaining a determination result; and acquiring the question to be answered according to the determination result.
By judging whether the current question has a parent topic and determining the question to be answered from the result, these embodiments improve reply accuracy and avoid answers that do not address the question actually asked.
In some embodiments, judging whether the current question in the dialogue content has a parent topic to obtain a determination result includes: confirming that the current question has a parent topic, the determination result being that the parent topic exists. Acquiring the question to be answered according to the determination result then includes: after generating the reply content corresponding to the current question, returning to the question content corresponding to the parent topic, that question content serving as the question to be answered.
In these embodiments, after confirming that the current question has a parent topic, the dialogue returns to the question corresponding to the parent topic once the reply content for the current question has been handled. The question to be answered is thus obtained, subsequent reply accuracy is improved, and irrelevant answers are avoided.
In some embodiments, judging whether the current question in the dialogue content has a parent topic to obtain a determination result includes: confirming that the current question has no parent topic, the determination result being that no parent topic exists. Acquiring the question to be answered according to the determination result then includes: taking the current question as the question to be answered.
In these embodiments, after confirming that the current question has no parent topic, the current question can be answered directly, ensuring the accuracy of the subsequent reply content.
In some embodiments, obtaining a plurality of candidate reply scripts matched to the question to be answered using the retrieval-augmented generation (RAG) model includes: retrieving, from a business knowledge base using a similarity algorithm, a reply-script set matched to the question to be answered; and filtering the scripts in the reply-script set to obtain the plurality of candidate reply scripts.
By filtering the reply-script set retrieved from the business knowledge base down to a plurality of candidates, these embodiments improve the diversity and reliability of the retrieval results.
In some embodiments, filtering the scripts in the reply-script set to obtain the plurality of candidate reply scripts includes: obtaining the embedding vector corresponding to each script in the reply-script set; inputting the embedding vectors into a greedy algorithm; and outputting the plurality of candidate reply scripts.
Some embodiments of the present application screen the plurality of candidate reply scripts out of the reply-script set with a greedy algorithm, avoiding redundancy and a lack of diversity.
In a second aspect, some embodiments of the present application provide an apparatus for improving large-model reply-script generation quality, comprising: a determining module, configured to determine a question to be answered based on the current question in the dialogue content; an obtaining module, configured to obtain, using a retrieval-augmented generation (RAG) model, a plurality of candidate reply scripts matched to the question to be answered; and a generating module, configured to input the question to be answered and the plurality of candidate reply scripts into a target large language model and generate a target reply script.
In some embodiments, the determining module 510 is configured to judge whether the current question in the dialogue content has a parent topic, obtaining a determination result, and to acquire the question to be answered according to the determination result.
In some embodiments, the determining module 510 is configured to confirm that the current question has a parent topic, the determination result being that the parent topic exists, and, after the reply content corresponding to the current question is generated, to return to the question content corresponding to the parent topic, that question content serving as the question to be answered.
In some embodiments, the determining module 510 is configured to confirm that the current question has no parent topic, the determination result being that no parent topic exists, and to take the current question as the question to be answered.
In some embodiments, the obtaining module 520 is configured to retrieve, from a business knowledge base using a similarity algorithm, a reply-script set matched to the question to be answered, and to filter the scripts in the reply-script set to obtain the plurality of candidate reply scripts.
In some embodiments, the obtaining module 520 is configured to obtain the embedding vector corresponding to each script in the reply-script set, input the embedding vectors into a greedy algorithm, and output the plurality of candidate reply scripts.
In a third aspect, some embodiments of the application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs a method according to any of the embodiments of the first aspect.
In a fourth aspect, some embodiments of the application provide an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operable to implement a method according to any of the embodiments of the first aspect when executing the program.
In a fifth aspect, some embodiments of the application provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor, is adapted to carry out the method according to any of the embodiments of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of some embodiments of the present application, the drawings that are required to be used in some embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be construed as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a system diagram for a method for improving large-model reply-script generation quality provided by some embodiments of the present application;
FIG. 2 is a first flowchart of a method for improving large-model reply-script generation quality according to some embodiments of the present application;
FIG. 3 is a second flowchart of a method for improving large-model reply-script generation quality according to some embodiments of the present application;
FIG. 4 is a schematic diagram of RAG-based reply-script generation according to some embodiments of the present application;
FIG. 5 is a block diagram of an apparatus for improving large-model reply-script generation quality according to some embodiments of the present application;
Fig. 6 is a schematic diagram of an electronic device according to some embodiments of the present application.
Detailed Description
The technical solutions of some embodiments of the present application will be described below with reference to the drawings in some embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
In the related art, to serve users more efficiently, intelligent customer service robots are widely used across business fields. In an actual interaction scenario, the exchange between the intelligent customer service robot and the user about a product or service requirement can rarely be completed in a single sentence; multiple rounds of dialogue are needed, and there are contextual associations and logic across those rounds. In the prior art, when replying to text input by a user, keywords are extracted from the text and a corresponding reply script is generated from keyword analysis. This kind of analysis, however, cannot capture the associations between contexts, so the logical connection between the reply script and the historical dialogue is weak, reply quality is poor, and the user experience suffers.
In view of this, some embodiments of the present application provide a method for improving large-model reply-script generation quality: the question to be answered is determined from the current question in the user's dialogue content, a plurality of candidate reply scripts corresponding to that question are matched, and the candidates together with the question are input into a target large language model to generate the target reply script. Some embodiments of the application analyze the logic between the current question and the historical dialogue and use the retrieval-augmented generation (RAG) technique to strengthen the factuality of the reply scripts; by leveraging the reasoning capability of the target large model, the inference efficiency, quality, and effectiveness of reply generation are improved. Because the method builds on RAG for further processing, the generation effect and quality are assured.
The overall composition of the system for improving large-model reply-script generation quality provided by some embodiments of the present application is described below by way of example with reference to fig. 1.
As shown in fig. 1, the system comprises: a terminal 100 and a generation server 200. A user may interact through the terminal 100 with the intelligent customer service robot deployed within the generation server 200. The generation server 200 determines the question to be answered from the current question sent by the terminal 100, screens a plurality of candidate reply scripts out of its internally deployed business knowledge base, and finally inputs the candidates and the question to be answered into its internally deployed target large language model to generate the target reply script.
In some embodiments of the present application, the terminal 100 may be a mobile terminal or a non-portable computer terminal; embodiments of the present application are not limited in this respect. The target large language model can be fine-tuned with different business data for different business scenarios, so that different target reply scripts can be generated for different business scenarios.
The process for improving large-model reply-script generation quality performed by the generation server 200 in some embodiments of the present application is described below by way of example in conjunction with fig. 2.
Referring to fig. 2, fig. 2 is a flowchart of a method for improving large-model reply-script generation quality according to some embodiments of the present application; the method may include:
S210, determining a question to be answered based on the current question in the dialogue content.
For example, in some embodiments of the present application, during multiple rounds of dialogue between a user and the intelligent customer service robot, the robot needs to reason out a reply for each round. The application uses LLM (target large language model) Agent technology to identify, from the user's current question, the question that currently needs to be answered.
In some embodiments of the present application, S210 may include: judging whether the current question in the dialogue content has a parent topic, obtaining a determination result; and acquiring the question to be answered according to the determination result.
For example, in some embodiments of the present application, the LLM Agent analyzes the logic between the current question and the historical dialogue, i.e., whether the current question is affiliated with a parent topic, to determine the question to be answered.
As an example, the mapping between common parent topics and their corresponding child topics in the business scenario is listed in the LLM prompt. The LLM Agent can semantically infer the current question to be answered from the context dialogue (i.e., the historical dialogue) and then judge by semantic matching whether the current question belongs to a child topic. One parent-child example is as follows:
Parent topic: the user asks how to raise the credit limit;
Child topics: product-operation questions the user may encounter while performing the limit-raising operation, for example, operational questions about one of the limit-raising methods, raising the limit via the housing provident fund (the other three being raising it via individual income tax records, via Alipay, and via WeChat; questions may arise during any of these operations, and each can help the user raise the limit successfully).
In some embodiments of the present application, S210 may include: confirming that the current question has a parent topic, the determination result being that the parent topic exists; and, after generating the reply content corresponding to the current question, returning to the question content corresponding to the parent topic, that question content serving as the question to be answered.
For example, in some embodiments of the present application, if the current question is a child of a parent topic, the pending question of the parent topic is called back after communication on the child topic is completed. That is, the dialogue follows the diverging child topic first rather than replying to the parent topic immediately. While the child topic is being handled, checks such as coreference resolution are performed on the current question within it. For example: a user asks a broad question whose solution requires performing steps 1 and 2; while carrying out step 1, the user runs into an operational question about that step; in that case, once the operations of steps 1 and 2 are completed, the dialogue returns to the earlier broad question, i.e., the broad question is the question to be answered.
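The parent-topic callback described above can be sketched as a small stack-based tracker. This is a minimal illustration, not the patent's implementation; all names (TopicTracker, track, callback) and the dictionary-based parent-child mapping are assumptions standing in for the LLM Agent's semantic matching.

```python
class TopicTracker:
    """Illustrative tracker: answer child-topic questions first, then
    call back the pending parent-topic question."""

    def __init__(self, parent_to_children):
        # Mapping from each parent topic to its known child topics.
        self.parent_to_children = parent_to_children
        self.pending_parents = []  # parent questions awaiting callback

    def parent_of(self, question):
        """Return the parent topic this question belongs to, if any."""
        for parent, children in self.parent_to_children.items():
            if question in children:
                return parent
        return None

    def track(self, current_question):
        """Record the current question; if it belongs to a parent topic,
        remember the parent for a later callback. The current question
        itself is answered now."""
        parent = self.parent_of(current_question)
        if parent is not None and parent not in self.pending_parents:
            self.pending_parents.append(parent)
        return current_question

    def callback(self):
        """After the child topic is resolved, return to the parent
        question (the question to be answered), if one is pending."""
        if self.pending_parents:
            return self.pending_parents.pop()
        return None
```

In the credit-limit example above, the child question about a limit-raising step is answered first, and `callback()` then restores the broad limit-raising question as the question to be answered.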
In other embodiments of the present application, S210 may include: confirming that the current question has no parent topic, the determination result being that no parent topic exists; and taking the current question as the question to be answered.
For example, in some embodiments of the present application, if it is detected that the current question has no parent topic, the current question is directly taken as the question to be answered.
S220, obtaining a plurality of candidate reply scripts matched to the question to be answered using the retrieval-augmented generation (RAG) model.
For example, in some embodiments of the present application, after the question to be answered is determined, a plurality of candidate reply scripts matching the user's question can be obtained through RAG (Retrieval-Augmented Generation).
In some embodiments of the present application, S220 may include: retrieving, from a business knowledge base using a similarity algorithm, a reply-script set matched to the question to be answered; and filtering the scripts in the reply-script set to obtain the plurality of candidate reply scripts.
For example, in some embodiments of the present application, multiple pieces of related business-knowledge Context (a specific example of the reply-script set) may be recalled from a pre-built business knowledge base using the question to be answered. A knowledge-screening module then filters out Contexts with similar content, so that the retained Contexts are neither redundant nor lacking in diversity.
Specifically, the business knowledge base holds question-answer knowledge pre-constructed for the business scenarios the intelligent customer service robot is to serve; its role is retrieval-based knowledge enhancement adapted to the business scenario, giving the LLM Agent more business grounding for subsequent reasoning. One engineering implementation is to organize each piece of business knowledge into a document in question-and-answer format and then encode the documents into an "embedding knowledge index" for retrieval using tools such as Faiss. When the question to be answered is used as the retrieval query against the "embedding knowledge index", the top M pieces of related business-knowledge Context can be recalled using algorithms such as similarity search.
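As a rough illustration of this recall step, the sketch below scores every knowledge-base entry by cosine similarity and returns the indices of the top M. Plain NumPy is used in place of a Faiss index, and the function name and similarity choice are assumptions; a production system would retrieve against the "embedding knowledge index" directly.

```python
import numpy as np

def top_m_contexts(question_vec, knowledge_vecs, m):
    """Recall the M knowledge entries most similar to the question.

    question_vec:   (d,) embedding of the question to be answered.
    knowledge_vecs: (n, d) embeddings of the question-answer entries.
    Returns the indices of the top-M entries by cosine similarity.
    """
    q = question_vec / np.linalg.norm(question_vec)
    k = knowledge_vecs / np.linalg.norm(knowledge_vecs, axis=1, keepdims=True)
    sims = k @ q                   # cosine similarity to every entry
    return np.argsort(-sims)[:m]   # indices of the M best matches
```

The indices returned here correspond to the M related business-knowledge Contexts that are passed on to the knowledge-screening module.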
In some embodiments of the present application, S220 may include: obtaining the embedding vector corresponding to each script in the reply-script set; inputting the embedding vectors into a greedy algorithm; and outputting the plurality of candidate reply scripts.
For example, in some embodiments of the present application, to reduce the redundancy caused by high similarity among the recalled business-knowledge Contexts, filtering is performed with a K-Center Greedy algorithm (i.e., a greedy algorithm), minimizing the Context set while preserving relevance and maximizing diversity, and thereby obtaining the plurality of candidate reply scripts. In practice, the input of the K-Center Greedy algorithm is the embedding vectors of the top M recalled Contexts, and the output is the plurality of candidate reply scripts mapped from the screened Contexts.
Specifically, the K-Center Greedy algorithm treats these embedding vectors as points in a high-dimensional space, i.e., a set of knowledge-Context points, and proceeds as follows:
[S1.] From the knowledge-Context point set, select the initial center point c0 randomly or according to some policy.
[S2.] From the remaining points, select the point farthest from the currently selected center points as the new center point ci.
[S3.] Repeat [S2.] until K center points have been selected.
The K center points obtained in the end are the K pieces of knowledge Context to be retained (a specific example of the plurality of candidate reply scripts). The policy in [S1.] may be formulated according to business rules and is not particularly limited here.
The K-Center Greedy algorithm has the following effect: given that the retrieved knowledge Contexts are all relevant to the question to be answered, it maximizes the diversity among them and avoids feeding near-duplicate knowledge into the LLM. Reply quality is therefore not degraded, the length of the text input to the LLM Agent is reduced, and the inference efficiency of the target large language model is improved.
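Steps [S1.]–[S3.] can be sketched as follows. This is a plain-NumPy illustration under assumptions: the initial center is taken deterministically (standing in for the random or rule-based policy of [S1.]), and Euclidean distance is used between embedding vectors.

```python
import numpy as np

def k_center_greedy(points, k, first=0):
    """Select k diverse points by repeatedly taking the point farthest
    from the already-selected centers (steps S1-S3 above).

    points: (n, d) embedding vectors of the recalled Contexts.
    first:  index of the initial center c0 (policy-dependent in practice).
    Returns the indices of the k retained Contexts.
    """
    centers = [first]
    # Distance from every point to its nearest chosen center so far.
    dist = np.linalg.norm(points - points[first], axis=1)
    while len(centers) < k:
        nxt = int(np.argmax(dist))  # farthest remaining point (step S2)
        centers.append(nxt)
        # Update nearest-center distances with the new center included.
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers
```

Because each new center is the point farthest from all current centers, near-duplicate Contexts (points close to an existing center) are never selected, which is exactly the redundancy-avoidance effect described above.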
S230, inputting the question to be answered and the plurality of candidate reply scripts into the target large language model to generate the target reply script.
For example, in some embodiments of the present application, the K filtered pieces of knowledge Context are combined with the question to be answered and input into the LLM, so that even though LLM inference supports only limited-length text input, a high-quality target reply script is produced.
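The combination step can be sketched as follows; the template wording, function name, and the character-based length budget are illustrative assumptions, not the patent's actual prompt:

```python
def build_prompt(question, contexts, max_chars=2000):
    """Concatenate the filtered knowledge Contexts with the question to
    be answered, truncating to respect the LLM's limited input length."""
    header = "Answer the customer question using the reference scripts below.\n"
    body = ""
    for i, ctx in enumerate(contexts, 1):
        piece = f"[Reference {i}] {ctx}\n"
        if len(header) + len(body) + len(piece) > max_chars:
            break  # keep within the model's input budget
        body += piece
    return f"{header}{body}Question: {question}\nAnswer:"
```

Keeping only the K diverse Contexts, rather than all M recalled ones, is what lets this prompt stay within the budget without sacrificing relevant knowledge.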
A specific process for improving large-model reply-script generation quality provided by some embodiments of the present application is described below by way of example in conjunction with fig. 3.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for improving large-model reply-script generation quality according to some embodiments of the present application.
The above-described process is exemplarily set forth below.
S310, acquiring the current question in the dialogue content input by the user.
S320, judging whether the current question has a parent topic; if so, executing S321, otherwise executing S322.
For example, as a specific example of the present application, fig. 4 provides a schematic diagram of reply-script generation: the topic tracking module therein judges, for the current question, whether a corresponding parent topic exists in the historical dialogue, and thereby identifies the question to be answered.
S321, after generating the reply content corresponding to the current question, returning to the question content corresponding to the parent topic, that question content being the question to be answered.
S322, taking the current question as the question to be answered.
S330, retrieving, from the business knowledge base using a similarity algorithm, a reply-script set matched to the question to be answered.
For example, as a specific example of the present application, the knowledge base in the RAG module in fig. 4 is the business knowledge base, from which the reply-script set Context matched to the question to be answered can be retrieved.
S340, obtaining the embedding vector corresponding to each script in the reply-script set.
S350, inputting the embedding vectors into the greedy algorithm and outputting the plurality of candidate reply scripts.
For example, as a specific example of the present application, the K-Center Greedy algorithm in the knowledge-screening module in fig. 4 processes the embedding vectors corresponding to the scripts in Context and outputs the plurality of candidate reply scripts.
S360, inputting the question to be answered and the plurality of candidate reply scripts into the target large language model and generating the target reply script.
For example, as a specific example of the present application, the target large language model deployed in the LLM reply-inference module in fig. 4 can generate the target reply script from the question to be answered obtained by the topic tracking module and the plurality of candidate reply scripts output by the knowledge-screening module.
It will be appreciated that the specific implementation of S310 to S360 may refer to the method embodiments provided above; detailed descriptions are omitted here as appropriate to avoid repetition. The algorithms referred to above may be chosen according to the actual business scenario, and embodiments of the present application are not limited in this regard.
Referring to fig. 5, fig. 5 is a block diagram illustrating an apparatus for improving large-model reply-script generation quality according to some embodiments of the present application. It should be understood that the apparatus corresponds to the above method embodiments and can perform the steps involved in them; its specific functions may be found in the description above, and detailed descriptions are omitted here as appropriate to avoid redundancy.
The apparatus of fig. 5 includes at least one software functional module that can be stored in a memory in the form of software or firmware or solidified in the apparatus. The apparatus comprises: a determining module 510, configured to determine a question to be answered based on the current question in the dialogue content; an obtaining module 520, configured to obtain, using a retrieval-augmented generation (RAG) model, a plurality of candidate reply scripts matched to the question to be answered; and a generating module 530, configured to input the question to be answered and the plurality of candidate reply scripts into a target large language model and generate a target reply script.
In some embodiments of the present application, the determining module 510 is configured to determine whether a parent topic exists in the current problem in the dialog content, so as to obtain a determination result; and acquiring the problem to be replied according to the judging result.
In some embodiments of the present application, a determining module 510 is configured to confirm that the current problem exists in the parent topic, and the determination result is that the current problem exists; and after generating the reply content corresponding to the current problem, callback to the problem content corresponding to the parent level topic, wherein the problem content is used as the problem to be replied.
In some embodiments of the present application, the determining module 510 is configured to confirm that the current question has no parent topic, the determination result being that the parent topic does not exist, and to take the current question as the question to be replied to.
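The two branches of the parent-topic determination above can be sketched as follows. The topic tree is represented here as a plain dictionary for illustration; the mapping and the function name are hypothetical, not part of the patent:

```python
# Hypothetical topic tree: maps a question to the question content of its
# parent topic. Questions absent from the map have no parent topic.
TOPIC_TREE = {
    "what is the late fee": "how do I repay my loan",
}

def question_to_reply(current_question: str, reply_generated: bool) -> str:
    """If the current question has a parent topic and the reply to the current
    question has already been generated, fall back to the parent topic's
    question content; otherwise the current question itself is the question
    to be replied to."""
    parent = TOPIC_TREE.get(current_question)
    if parent is not None and reply_generated:
        return parent          # determination result: parent topic exists
    return current_question    # determination result: no parent topic

print(question_to_reply("what is the late fee", reply_generated=True))
print(question_to_reply("how do I repay my loan", reply_generated=False))
```

This mirrors the description: the fallback happens only after the reply content for the current question has been generated.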
In some embodiments of the present application, the obtaining module 520 is configured to retrieve, from a service knowledge base using a similarity algorithm, a set of reply scripts matching the question to be replied to, and to filter the reply scripts in that set to obtain the plurality of reply scripts.
In some embodiments of the present application, the obtaining module 520 is configured to obtain the embedding vector corresponding to each reply script in the set of reply scripts, to input those embedding vectors into a greedy algorithm, and to output the plurality of reply scripts.
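The embodiments do not specify which greedy algorithm is used. One common instantiation for filtering a retrieved set by embedding vectors is a greedy maximal-marginal-relevance (MMR) style selection, sketched below with toy 2-D vectors standing in for the real embedding vectors; the function names and the weighting `lam=0.3` are assumptions for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def greedy_filter(query_vec, script_vecs, k, lam=0.3):
    """Greedily select k script indices: each step picks the script that best
    trades off similarity to the query against similarity to scripts already
    selected, so near-duplicate reply scripts are filtered out."""
    selected = []
    candidates = list(range(len(script_vecs)))
    while candidates and len(selected) < k:
        def mmr(i):
            rel = cosine(query_vec, script_vecs[i])
            red = max((cosine(script_vecs[i], script_vecs[j]) for j in selected),
                      default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

query = [1.0, 0.0]
vecs = [[0.9, 0.1], [0.89, 0.11], [0.1, 0.9]]  # first two are near-duplicates
print(greedy_filter(query, vecs, k=2))  # → [0, 2]
```

With these toy vectors the duplicate at index 1 is skipped in favor of the more diverse script at index 2, which is the effect the filtering step is after.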
It will be clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method for the specific working processes of the apparatus described above; they are not repeated here.
Some embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the operations of any of the methods provided by the above embodiments.
Some embodiments of the present application also provide a computer program product comprising a computer program; when executed by a processor, the computer program implements the operations of any of the methods provided by the above embodiments.
As shown in fig. 6, some embodiments of the present application provide an electronic device 600 comprising a memory 610, a processor 620, and a computer program stored on the memory 610 and executable on the processor 620; the processor 620, when reading the program from the memory 610 over a bus 630 and executing it, can implement the method of any of the embodiments described above.
The processor 620 may process digital signals and may include various computing structures, such as a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. In some examples, the processor 620 may be a microprocessor.
The memory 610 may be used to store instructions to be executed by the processor 620, or data related to the execution of instructions. These instructions and/or data may include code implementing some or all of the functions of one or more of the modules described in the embodiments of the present application. The processor 620 of the disclosed embodiments may execute the instructions in the memory 610 to implement the methods shown above. The memory 610 may include dynamic random access memory, static random access memory, flash memory, optical memory, or other memories known to those skilled in the art.
The above description is only an example of the present application and is not intended to limit its scope; various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within its protection scope. It should be noted that like reference numerals and letters denote like items in the figures; once an item is defined in one figure, it need not be defined or explained again in subsequent figures.
The foregoing is merely illustrative of the present application and does not limit it; any variation or substitution that a person skilled in the art would readily conceive falls within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.

Claims (10)

1. A method for improving the quality of large-model reply-script generation, comprising:
determining a question to be replied to based on a current question in dialogue content;
obtaining, based on a retrieval-augmented generation (RAG) model, a plurality of reply scripts matching the question to be replied to; and
inputting the question to be replied to and the plurality of reply scripts into a target large language model to generate a target reply script.
2. The method of claim 1, wherein the determining a question to be replied to based on a current question in dialogue content comprises:
determining whether the current question in the dialogue content has a parent topic, to obtain a determination result; and
obtaining the question to be replied to according to the determination result.
3. The method of claim 2, wherein the determining whether the current question in the dialogue content has a parent topic, to obtain a determination result, comprises:
confirming that the current question has a parent topic, the determination result being that the parent topic exists;
and the obtaining the question to be replied to according to the determination result comprises:
after generating reply content corresponding to the current question, falling back to question content corresponding to the parent topic, the question content serving as the question to be replied to.
4. The method of claim 2, wherein the determining whether the current question in the dialogue content has a parent topic, to obtain a determination result, comprises:
confirming that the current question has no parent topic, the determination result being that the parent topic does not exist;
and the obtaining the question to be replied to according to the determination result comprises:
taking the current question as the question to be replied to.
5. The method of any one of claims 1-4, wherein the obtaining, based on the retrieval-augmented generation (RAG) model, a plurality of reply scripts matching the question to be replied to comprises:
retrieving, from a service knowledge base using a similarity algorithm, a set of reply scripts matching the question to be replied to; and
filtering the reply scripts in the set of reply scripts to obtain the plurality of reply scripts.
6. The method of claim 5, wherein the filtering the reply scripts in the set of reply scripts to obtain the plurality of reply scripts comprises:
obtaining an embedding vector corresponding to each reply script in the set of reply scripts; and
inputting the embedding vectors corresponding to the reply scripts into a greedy algorithm, and outputting the plurality of reply scripts.
7. An apparatus for improving the quality of large-model reply-script generation, comprising:
a determining module, configured to determine a question to be replied to based on a current question in dialogue content;
an obtaining module, configured to obtain, based on a retrieval-augmented generation (RAG) model, a plurality of reply scripts matching the question to be replied to; and
a generating module, configured to input the question to be replied to and the plurality of reply scripts into a target large language model to generate a target reply script.
8. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, performs the method of any one of claims 1-6.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, performs the method of any one of claims 1-6.
10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, performs the method of any one of claims 1-6.
CN202410333016.8A 2024-03-22 2024-03-22 Method and device for improving large model speech operation generation quality, storage medium and electronic equipment Pending CN118152537A (en)

Publication number: CN118152537A (published 2024-06-07; status: pending)
Family ID: 91292534
Country: CN


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination