CN117807964A - Chat corpus labeling method and device, storage medium and computer equipment
- Publication number
- CN117807964A (application CN202311847536.2A)
- Authority
- CN
- China
- Prior art keywords
- corpus
- chat
- chat corpus
- evaluated
- score
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/169—Annotation, e.g. comment data or footnotes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
In the chat corpus labeling method, the chat corpus labeling device, the storage medium and the computer equipment provided by the application, a target scoring model is adopted to score a plurality of initial chat corpora, so that the scoring content and score of each initial chat corpus are obtained; chat corpora to be evaluated are determined according to the score of each initial chat corpus, and chat corpora to be converted are determined according to the scoring content of each chat corpus to be evaluated; the system setting content, dialogue content and negative evaluation content of each chat corpus to be converted are extracted to obtain the chat corpora to be improved; each chat corpus to be improved is labeled by a target dialogue labeling model to obtain labeled chat corpora; the system setting content and labeling content of each labeled chat corpus are extracted to obtain the labeling corpora to be evaluated; and the target chat corpora are determined according to the scores obtained by scoring each labeling corpus to be evaluated with the target scoring model. In this way, the quality of sample labeling can be improved.
Description
Technical Field
The present disclosure relates to the field of sample labeling technologies, and in particular, to a chat corpus labeling method, a chat corpus labeling device, a storage medium, and a computer device.
Background
With the growing popularity of dialogue content generated by artificial intelligence, anthropomorphic chat products have emerged one after another. Raising the quality of these products, however, depends above all on the quality and quantity of their training samples. Automated sample labeling and sample evaluation is therefore an important research direction.
The prior art mainly relies on a large model to polish dialogue content, but this has shortcomings. The most important problem is that sample labeling and sample evaluation remain relatively independent and lack an effective association, so no beneficial positive feedback loop can be formed and the labeled samples are of low quality.
Disclosure of Invention
The present application aims to solve at least one of the above technical drawbacks, in particular the drawback of the prior art that sample labeling and evaluation are relatively independent and lack effective correlation, so that no beneficial positive feedback loop can be formed, resulting in low quality of the labeled samples.
In a first aspect, the present application provides a chat corpus labeling method, where the method includes:
scoring the acquired multiple initial chat corpora by adopting a trained target scoring model to obtain scoring content and score of each initial chat corpus;
Determining at least one chat corpus to be evaluated in each initial chat corpus according to the score of each initial chat corpus, performing quality evaluation on each chat corpus to be evaluated according to scoring content of each chat corpus to be evaluated, and determining at least one chat corpus to be converted in each chat corpus to be evaluated according to quality evaluation results;
extracting system setting content, dialogue content and negative evaluation content of each chat corpus to be converted to obtain chat corpus to be improved of each chat corpus to be converted;
labeling each chat corpus to be improved by adopting a trained target dialogue labeling model to obtain a plurality of labeled chat corpora;
extracting system setting content and labeling content of each labeling chat corpus to obtain to-be-evaluated labeling corpus of each labeling chat corpus;
scoring each annotation corpus to be evaluated by adopting the target scoring model to obtain the score of each annotation corpus to be evaluated, and determining at least one target chat corpus in each annotation corpus to be evaluated according to the score of each annotation corpus to be evaluated.
In one embodiment, the training process of the target scoring model includes:
randomly selecting a plurality of training chat corpora from all the initial chat corpora;
transmitting each training chat corpus to a user side so that the user side scores each training chat corpus;
training a pre-constructed initial scoring model by using the scored training chat corpus, and obtaining the target scoring model when the trained initial scoring model meets a first preset training ending condition.
In one embodiment, the score of each of the initial chat corpora includes a conversation quality score, a person-set consistency score, and a role reply definition score;
the step of determining at least one chat corpus to be evaluated in each initial chat corpus according to the score of each initial chat corpus comprises the following steps:
screening each initial chat corpus according to the dialogue quality score of each initial chat corpus to obtain at least one first chat corpus;
setting consistency scores according to the people of each first chat corpus, and screening each first chat corpus to obtain at least one second chat corpus;
And screening each second chat corpus according to the role reply limiting score of each second chat corpus to obtain at least one chat corpus to be evaluated.
In one embodiment, the scoring content of each chat corpus to be evaluated comprises a dialogue quality negative evaluation, a person-setting consistency negative evaluation and a role reply definition negative evaluation;
the step of performing quality evaluation on each chat corpus to be evaluated according to scoring content of each chat corpus to be evaluated, and determining at least one chat corpus to be converted in each chat corpus to be evaluated according to quality evaluation results comprises the following steps:
for each chat corpus to be evaluated, determining the number of evaluation items of dialogue quality negative evaluation, the number of evaluation items of person-set consistency negative evaluation and the number of evaluation items of character reply limit negative evaluation in the chat corpus to be evaluated, and taking the chat corpus to be evaluated as the chat corpus to be converted if the number of evaluation items of dialogue quality negative evaluation in the chat corpus to be evaluated is not greater than a preset dialogue quality negative evaluation threshold, the number of evaluation items of person-set consistency negative evaluation is not greater than a preset person-set consistency negative evaluation threshold, and the number of evaluation items of character reply limit negative evaluation is not greater than a preset character reply limit negative evaluation threshold.
In one embodiment, the training process of the target dialogue annotation model includes:
randomly sampling each chat corpus to be improved to obtain a plurality of first training chat corpora to be improved;
transmitting each first training chat corpus to be improved to a user side so that the user side marks the dialogue content of each first training chat corpus to be improved;
SFT training is carried out on the pre-built initial dialogue annotation model by adopting each first training chat corpus to be improved and each annotated first training chat corpus to be improved;
and determining the target dialogue annotation model according to the initial dialogue annotation model trained by the SFT.
In one embodiment, the step of determining the target session annotation model according to the SFT-trained initial session annotation model includes:
randomly sampling each chat corpus to be improved to obtain a plurality of second training chat corpora to be improved;
performing PPO fine-tuning on the initial dialogue annotation model after SFT training by adopting each second training chat corpus to be improved to obtain an optimized initial dialogue annotation model, wherein the target scoring model participates in the PPO fine-tuning to provide optimization parameters for optimizing the initial dialogue annotation model;
And when the optimized initial dialogue annotation model meets a second preset training ending condition, obtaining the target dialogue annotation model.
In one embodiment, the score of each annotation corpus to be evaluated includes a dialogue quality score, a person-setting consistency score, and a role reply definition score;
the step of determining at least one target chat corpus in each annotation corpus to be evaluated according to the score of each annotation corpus to be evaluated comprises the following steps:
and for each annotation corpus to be evaluated, if the dialogue quality score, the person setting consistency score and the role reply limit score of the annotation corpus to be evaluated do not reach the preset score threshold, rejecting the annotation corpus to be evaluated, otherwise, taking the annotation corpus to be evaluated as the target chat corpus.
In a second aspect, the present application provides a chat corpus labeling apparatus, the apparatus including:
the initial chat corpus scoring module is used for scoring the acquired multiple initial chat corpora by adopting a trained target scoring model to obtain scoring content and score of each initial chat corpus;
the to-be-converted chat corpus acquisition module is used for determining at least one to-be-evaluated chat corpus in each initial chat corpus according to the score of each initial chat corpus, performing quality evaluation on each to-be-evaluated chat corpus according to the scoring content of each to-be-evaluated chat corpus, and determining at least one to-be-converted chat corpus in each to-be-evaluated chat corpus according to the quality evaluation result;
The to-be-improved chat corpus acquisition module is used for extracting system setting content, dialogue content and negative evaluation content of each chat corpus to be converted to obtain the chat corpus to be improved of each chat corpus to be converted;
the marked chat corpus acquisition module is used for marking each chat corpus to be improved by adopting a trained target dialogue marking model to obtain a plurality of marked chat corpora;
the to-be-evaluated annotation corpus acquisition module is used for extracting system setting content and annotation content of each annotation chat corpus to obtain to-be-evaluated annotation corpus of each annotation chat corpus;
the target chat corpus acquisition module is used for scoring each annotation corpus to be evaluated by adopting the target scoring model, obtaining the score of each annotation corpus to be evaluated, and determining at least one target chat corpus in each annotation corpus to be evaluated according to the score of each annotation corpus to be evaluated.
In a third aspect, the present application provides a storage medium characterized in that: the storage medium has stored therein computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the chat corpus labeling method as described in any of the embodiments above.
In a fourth aspect, the present application provides a computer device comprising: one or more processors, and memory;
the memory has stored therein computer readable instructions that, when executed by the one or more processors, perform the steps of the chat corpus labeling method as described in any of the embodiments above.
From the above technical solutions, the embodiments of the present application have the following advantages:
in the chat corpus labeling method, the chat corpus labeling device, the storage medium and the computer equipment, a trained target scoring model is adopted to score a plurality of acquired initial chat corpora, and scoring content and score of each initial chat corpus are obtained; determining at least one chat corpus to be evaluated in each initial chat corpus according to the score of each initial chat corpus, performing quality evaluation on each chat corpus to be evaluated according to scoring content of each chat corpus to be evaluated, and determining at least one chat corpus to be converted in each chat corpus to be evaluated according to quality evaluation results; extracting system setting content, dialogue content and negative evaluation content of each chat corpus to be converted to obtain chat corpus to be improved of each chat corpus to be converted; marking each chat corpus to be improved by adopting a trained target dialogue marking model to obtain a plurality of marked chat corpora; extracting system set content and labeling content of each labeling chat corpus to obtain to-be-evaluated labeling corpus of each labeling chat corpus; scoring each annotation corpus to be evaluated by adopting a target scoring model to obtain the score of each annotation corpus to be evaluated, and determining at least one target chat corpus in each annotation corpus to be evaluated according to the score of each annotation corpus to be evaluated. In the method, the initial chat corpus is scored by adopting the target scoring model, so that the quality of the initial chat corpus can be rapidly and automatically evaluated, the burden of manual marking is reduced, and subjective factors in the marking process are reduced. And carrying out quality evaluation on the chat corpus to be evaluated according to scoring content of the chat corpus to be evaluated, so that the quality of the labeling sample is further improved. Scoring models are adopted before and after marking of the chat corpus, so that high-quality target chat corpus is screened out, and therefore effective association can be formed between marking and evaluation, and the quality of sample marking is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive faculty for a person skilled in the art.
Fig. 1 is a flow chart of a chat corpus labeling method provided in an embodiment of the present application;
fig. 2 is a flowchart illustrating a chat corpus determining step to be evaluated according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a training process of a target dialogue annotation model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a chat corpus labeling apparatus according to an embodiment of the present application;
fig. 5 is a schematic diagram of an internal structure of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The application provides a chat corpus labeling method. The following embodiments will be described by taking the application of the method to a computer device as an example, and it will be understood that the computer device may be various devices with data processing functions, and may be, but not limited to, a single server, a server cluster, a personal notebook, a desktop computer, and the like. As shown in fig. 1, the present application provides a chat corpus labeling method, which includes:
s101: and scoring the acquired multiple initial chat corpora by adopting a trained target scoring model to obtain scoring content and score of each initial chat corpus.
Wherein the scoring model is a machine learning model for evaluating the quality of chat corpora. Chat corpus refers to text data that contains dialog content. Scoring content includes scoring various metrics of the chat corpus, such as grammar correctness, logical consistency, information accuracy, and the like.
It can be understood that the chat corpus is scored by adopting the scoring model, so that the quality of the chat corpus can be rapidly and automatically evaluated, thereby screening out high-quality dialogue samples and improving the quality of generated dialogue content.
Specifically, firstly, a trained target scoring model is imported into a program, and then, a plurality of obtained initial chat corpora are input into the scoring model to obtain a scoring result of each chat corpus. The scoring result is typically a score or a range of scores. The scoring content comprises scoring conditions of various indexes of the chat corpus.
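As a purely illustrative sketch of how this step might be organized in code (the patent does not prescribe any implementation), the snippet below scores a batch of initial chat corpora; the CorpusScore fields and the score_fn callable are assumed placeholders for whatever interface the trained target scoring model actually exposes.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class CorpusScore:
    """Score and scoring content returned by the target scoring model for one corpus."""
    dialogue_quality: float
    persona_consistency: float
    role_reply_definition: float
    scoring_content: list[str] = field(default_factory=list)  # textual evaluations, incl. negative items

def score_initial_corpora(
    initial_corpora: dict[str, str],
    score_fn: Callable[[str], CorpusScore],
) -> dict[str, CorpusScore]:
    """S101: run the trained target scoring model over every initial chat corpus.

    score_fn stands in for the call into the scoring model, e.g. prompting a
    fine-tuned LLM and parsing its structured reply into a CorpusScore."""
    return {corpus_id: score_fn(text) for corpus_id, text in initial_corpora.items()}
```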
S102: according to the score of each initial chat corpus, at least one chat corpus to be evaluated is determined in each initial chat corpus, quality evaluation is carried out on each chat corpus to be evaluated according to scoring content of each chat corpus to be evaluated, and at least one chat corpus to be converted is determined in each chat corpus to be evaluated according to quality evaluation results.
It can be understood that scoring the initial chat corpora identifies the higher-quality portion as the chat corpora to be evaluated; this portion then undergoes a further quality evaluation, and the best of it is selected as the chat corpora to be converted, thereby ensuring the quality of the generated dialogue content.
Specifically, initial chat corpora with higher scores are selected as the chat corpora to be evaluated according to the scoring results. Alternatively, a threshold may be set so that only chat corpora scoring above it are selected as chat corpora to be evaluated. The quality evaluation of the chat corpora to be evaluated can be carried out by manual review or by running the scoring model again. If manual review is used, the chat corpora to be evaluated are handed to reviewers, and whether a corpus proceeds to the next step is decided from the review result. According to the quality evaluation results, the higher-quality chat corpora to be evaluated are selected as the chat corpora to be converted, again by manual review or by screening with the scoring model.
S103: and extracting system setting content, dialogue content and negative evaluation content of each chat corpus to be converted to obtain chat corpus to be improved of each chat corpus to be converted.
It can be appreciated that extracting the system setting content, dialogue content and negative evaluation content of the chat corpus to be converted makes it possible to better understand the user's needs, problems and points of dissatisfaction.
Specifically, the system setting content involved in the dialogue is extracted by analyzing the system responses or prompts in the dialogue; this content may include the functions, rules, constraints and so on of the system. One approach is to extract the system settings by matching specific keywords or phrases using natural language processing techniques. The questions, expressed intentions and replies given by the system are extracted from the dialogue, so that the context of the dialogue can be understood in depth and possible problems or room for improvement can be analyzed. The negative evaluation content is extracted by analyzing the user's feedback or evaluation; this may be a problem encountered in the user's dialogue experience or dissatisfaction with the system's responses. Text classification or sentiment analysis techniques may be used to identify the user's negative evaluation content.
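The sketch below illustrates one possible keyword-based extraction along the lines described above; the SYSTEM_PREFIXES and NEGATIVE_KEYWORDS values, the line-based corpus format and the returned dictionary keys are all assumptions made for illustration, not details taken from the patent.

```python
# Hypothetical markers; a real corpus format defines its own field names, and a
# real system might use an NLP classifier instead of keyword matching.
SYSTEM_PREFIXES = ("system:", "setting:")
NEGATIVE_KEYWORDS = ("inconsistent", "off-topic", "repetitive", "out of character")

def build_corpus_to_improve(corpus_to_convert: str, scoring_content: list[str]) -> dict:
    """S103: extract system setting content, dialogue content and negative
    evaluation content and bundle them as a chat corpus to be improved."""
    system_lines, dialogue_lines = [], []
    for line in corpus_to_convert.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.lower().startswith(SYSTEM_PREFIXES):
            system_lines.append(stripped)
        else:
            dialogue_lines.append(stripped)
    negative_evaluations = [item for item in scoring_content
                            if any(kw in item.lower() for kw in NEGATIVE_KEYWORDS)]
    return {
        "system_setting": "\n".join(system_lines),
        "dialogue": "\n".join(dialogue_lines),
        "negative_evaluation": negative_evaluations,
    }
```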
S104: and labeling each chat corpus to be improved by adopting a trained target dialogue labeling model to obtain a plurality of labeled chat corpora.
In this step, the chat corpora to be improved are first arranged into a format suitable for model input. Typically, this requires converting each dialogue into question-answer pairs, where the question is the question posed by the user and the answer is the system's reply. A suitable target dialogue annotation model is selected and loaded into the environment. Each chat corpus to be improved is input into the model, and the model automatically predicts the label of each sentence in the dialogue, such as user question, system reply, or another type. The labeling results can then be arranged into a format suitable for further analysis and processing, and the labels are associated with the original chat corpus to form a fully labeled chat conversation data set.
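A minimal sketch of this step is given below, assuming a simple alternating "user:"/"assistant:" transcript format and a hypothetical annotate_fn standing in for the target dialogue annotation model; a real pipeline would use its own corpus schema and model interface.

```python
from typing import Callable

def to_question_answer_pairs(dialogue: str) -> list[tuple[str, str]]:
    """Arrange the dialogue into (user question, system reply) pairs, assuming
    alternating 'user: ...' / 'assistant: ...' turns; real data may need a richer parser."""
    turns = [line.split(":", 1)[1].strip() for line in dialogue.splitlines() if ":" in line]
    return list(zip(turns[0::2], turns[1::2]))

def label_corpus_to_improve(corpus_to_improve: dict,
                            annotate_fn: Callable[[str], str]) -> dict:
    """S104: have the target dialogue annotation model (annotate_fn) regenerate
    each reply, yielding a labeled chat corpus."""
    labeled_pairs = []
    for question, _old_reply in to_question_answer_pairs(corpus_to_improve["dialogue"]):
        prompt = (f"{corpus_to_improve['system_setting']}\n"
                  f"Known issues: {'; '.join(corpus_to_improve['negative_evaluation'])}\n"
                  f"User: {question}")
        labeled_pairs.append((question, annotate_fn(prompt)))
    return {"system_setting": corpus_to_improve["system_setting"],
            "labeled_dialogue": labeled_pairs}
```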
S105: and extracting the system set content and the labeling content of each labeling chat corpus to obtain the labeling corpus to be evaluated of each labeling chat corpus.
It can be understood that after the target dialogue labeling model labels the dialogue content of the chat corpus to be improved, the dialogue content is changed. Since the negative evaluation of the chat corpus to be improved corresponds to that dialogue content, the original negative evaluation loses its meaning once the content changes, whereas the system setting content is not affected by the change of the dialogue content. Therefore, only the system setting content and the labeling content of the labeled chat corpus need to be extracted.
Specifically, natural language processing techniques may be used to extract system settings in the labeled chat corpus by matching specific keywords or phrases, which may include functions, rules, constraints, etc. of the system. For each labeled chat corpus, the labeled chat corpus is respectively input into a target dialogue labeling model, and the model automatically predicts the label of each sentence in the dialogue. For each sentence, the label to which the sentence belongs can be used as the labeling content for labeling the chat corpus.
S106: scoring each annotation corpus to be evaluated by adopting a target scoring model to obtain the score of each annotation corpus to be evaluated, and determining at least one target chat corpus in each annotation corpus to be evaluated according to the score of each annotation corpus to be evaluated.
In this step, each annotation corpus to be evaluated is input into the target scoring model to obtain a score. The score may represent how well the annotation corpus matches the target chat corpus or other relevant indicators. Depending on the design of the model, the score may be a continuous value, such as a floating-point number between 0 and 1, or a discrete value, such as an integer between 1 and 5. The annotation corpora to be evaluated are sorted by score, from high to low or from low to high, so that the corpora that best meet the target can be determined from their scores. Depending on requirements, the highest-scoring annotation corpus to be evaluated may be selected as the target chat corpus, several higher-scoring annotation corpora may be selected as target chat corpora, or a score threshold may be set so that only annotation corpora scoring above the threshold are selected as targets.
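As one hedged illustration of the selection logic just described (sorting by score and applying a threshold and/or a top-k cut), consider the sketch below; the default threshold and the top_k parameter are illustrative only.

```python
def select_target_corpora(scores: dict[str, float],
                          score_threshold: float = 0.8,
                          top_k: int | None = None) -> list[str]:
    """S106: rank labeled corpora by the target scoring model's score and keep
    those clearing the threshold, optionally truncated to the top-k entries."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    kept = [corpus_id for corpus_id, score in ranked if score >= score_threshold]
    return kept[:top_k] if top_k is not None else kept
```

For instance, select_target_corpora({"a": 0.92, "b": 0.41}, score_threshold=0.8) would keep only corpus "a".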
In the above embodiment, the trained target scoring model is adopted to score the acquired multiple initial chat corpora, so as to obtain scoring content and score of each initial chat corpus; determining at least one chat corpus to be evaluated in each initial chat corpus according to the score of each initial chat corpus, performing quality evaluation on each chat corpus to be evaluated according to scoring content of each chat corpus to be evaluated, and determining at least one chat corpus to be converted in each chat corpus to be evaluated according to quality evaluation results; extracting system setting content, dialogue content and negative evaluation content of each chat corpus to be converted to obtain chat corpus to be improved of each chat corpus to be converted; marking each chat corpus to be improved by adopting a trained target dialogue marking model to obtain a plurality of marked chat corpora; extracting system set content and labeling content of each labeling chat corpus to obtain to-be-evaluated labeling corpus of each labeling chat corpus; scoring each annotation corpus to be evaluated by adopting a target scoring model to obtain the score of each annotation corpus to be evaluated, and determining at least one target chat corpus in each annotation corpus to be evaluated according to the score of each annotation corpus to be evaluated. In the method, the initial chat corpus is scored by adopting the target scoring model, so that the initial chat corpus with higher quality can be automatically screened out, the burden of manual marking is reduced, and subjective factors in the marking process are reduced. And carrying out quality evaluation on the chat corpus to be evaluated according to scoring content of the chat corpus to be evaluated, so that the quality of the labeling sample is further improved. Scoring models are adopted before and after marking of the chat corpus, so that high-quality target chat corpus is screened out, and therefore effective association can be formed between marking and evaluation, and the quality of sample marking is further improved.
In one embodiment, the training process of the target scoring model includes:
randomly selecting a plurality of training chat corpora from all the initial chat corpora;
transmitting each training chat corpus to a user side so that the user side scores each training chat corpus;
training a pre-constructed initial scoring model by using the scored training chat corpus, and obtaining a target scoring model when the trained initial scoring model meets a first preset training ending condition.
It can be appreciated that, to ensure the training data are representative and diverse and cover conversations of various types and contexts, multiple training chat corpora are randomly selected from the initial chat corpora; this yields a more comprehensive and robust target scoring model. The training chat corpora are sent to the user side so that the user side scores each of them; the scored training chat corpora returned by the user side are then used to train the pre-built initial scoring model. Training the initial scoring model with the scored training chat corpora enables it to output a corresponding score for any input dialogue content.
Specifically, a certain number of conversations may be selected from the initial chat corpora as training data, either according to a rule or at random. The selected training chat corpora are sent to the user side, and the user is asked to score each dialogue; the scoring may be subjective or objective, as determined by the actual situation. The labeled training data are used to train the initial scoring model; deep learning, machine learning and other methods can be adopted, and a suitable model structure and algorithm can be chosen according to specific requirements. The scoring model may be a large language model; its network structure is not limited and may be, for example, LLaMA, LLaMA 2, or another high-quality open-source large model.
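By way of example only, the sketch below shows one way the sampling and training loop described above could be wired together; collect_user_score and model.fit_epoch are hypothetical stand-ins for the user-side scoring round trip and for one epoch of whatever training procedure (for example, fine-tuning a large language model) is actually used, and the ending condition shown is just one possible choice.

```python
import random

def build_scoring_training_set(initial_corpora: list[str], sample_size: int,
                               collect_user_score) -> list[tuple[str, float]]:
    """Randomly draw training chat corpora and pair each with the score returned
    from the user side; collect_user_score stands in for that round trip."""
    sampled = random.sample(initial_corpora, k=min(sample_size, len(initial_corpora)))
    return [(corpus, collect_user_score(corpus)) for corpus in sampled]

def train_scoring_model(model, scored_corpora: list[tuple[str, float]],
                        max_epochs: int = 10, target_val_loss: float = 0.05):
    """Train the pre-built initial scoring model until the first preset training
    ending condition is met (here, a validation-loss target or an epoch budget)."""
    for _epoch in range(max_epochs):
        random.shuffle(scored_corpora)
        val_loss = model.fit_epoch(scored_corpora)  # hypothetical one-epoch training call
        if val_loss <= target_val_loss:
            return model                            # ending condition reached: target scoring model
    return model
```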
In the embodiment, the scored training chat corpus is used for training, so that the accuracy and generalization capability of the target scoring model can be improved, the target scoring model is better adapted to actual application scenes, subjective feeling and requirements of users can be considered, and the model is more in line with actual use conditions.
As shown in fig. 2, in one embodiment, the score of each initial chat corpus includes a dialogue quality score, a person-set consistency score, and a role reply definition score;
Determining at least one chat corpus to be evaluated in each initial chat corpus according to the score of each initial chat corpus, wherein the step comprises the following steps:
s201: screening each initial chat corpus according to the dialogue quality score of each initial chat corpus to obtain at least one first chat corpus;
s202: setting consistency scores according to the people of each first chat corpus, and screening each first chat corpus to obtain at least one second chat corpus;
s203: and screening each second chat corpus according to the role reply limit score of each second chat corpus to obtain at least one chat corpus to be evaluated.
It will be appreciated that this step-wise screening process helps ensure that the selected chat corpora to be evaluated have high dialogue quality, person-set consistency and role reply definition. Screening step by step improves the quality of the chat corpora to be evaluated and thereby the accuracy and generalization capability of the target scoring model. Because the importance of the scores is ordered as dialogue quality score > person-set consistency score > role reply definition score, and the filtering proportion follows the same order, filtering in this sequence avoids redundant filtering operations.
Specifically, the initial chat corpora are screened according to their dialogue quality scores, and the dialogues with higher scores are selected as the first chat corpora. The first chat corpora are then screened by their person-set consistency scores to obtain the second chat corpora that meet the consistency requirement. Finally, the second chat corpora are screened by their role reply definition scores to obtain the chat corpora to be evaluated that meet the role reply definition requirement.
In this embodiment, step-wise screening eliminates low-quality dialogues and selects the chat corpora that perform better in dialogue quality, person-set consistency and role reply definition for evaluation. Selecting high-quality chat corpora to be evaluated improves the target scoring model's ability to judge dialogue quality, person-set consistency and role reply definition, making its scoring results more accurate. Screening in several steps ensures that the chat corpora to be evaluated perform well in different respects, so that the target scoring model works well under different situations and requirements.
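A minimal sketch of the three-stage screening of steps S201 to S203 follows, assuming each scored corpus is represented as a dictionary with the three score fields; the threshold values are illustrative and would in practice be tuned so that the filtering proportion decreases from dialogue quality to person-set consistency to role reply definition.

```python
def stepwise_screen(scored_corpora: list[dict],
                    quality_min: float = 0.7,
                    persona_min: float = 0.6,
                    role_min: float = 0.5) -> list[dict]:
    """S201-S203: screen in order of importance and filtering proportion
    (dialogue quality, then person-set consistency, then role reply definition),
    so each later filter only sees corpora that already passed the earlier ones."""
    first = [c for c in scored_corpora if c["dialogue_quality"] >= quality_min]
    second = [c for c in first if c["persona_consistency"] >= persona_min]
    return [c for c in second if c["role_reply_definition"] >= role_min]
```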
In one embodiment, scoring content for each chat corpus to be evaluated includes a conversation quality negative rating, a person-set consistency negative rating, and a role reply definition negative rating;
The method comprises the steps of carrying out quality evaluation on each chat corpus to be evaluated according to scoring content of each chat corpus to be evaluated, and determining at least one chat corpus to be converted in each chat corpus to be evaluated according to quality evaluation results, wherein the steps comprise:
for each chat corpus to be evaluated, determining the number of evaluation items of the dialogue quality negative evaluation, the number of evaluation items of the person-set consistency negative evaluation and the number of evaluation items of the role reply definition negative evaluation in the chat corpus to be evaluated, and taking the chat corpus to be evaluated as a chat corpus to be converted if the number of evaluation items of the dialogue quality negative evaluation is not greater than a preset dialogue quality negative evaluation threshold, the number of evaluation items of the person-set consistency negative evaluation is not greater than a preset person-set consistency negative evaluation threshold, and the number of evaluation items of the role reply definition negative evaluation is not greater than a preset role reply definition negative evaluation threshold.
It can be understood that the chat corpus to be converted is screened out through the dialogue quality negative evaluation, the person setting consistency negative evaluation and the role reply limit negative evaluation, so that the target conversion chat corpus can be ensured to have higher dialogue quality, person setting consistency and role reply limit.
Specifically, the negative evaluation items relating to dialogue quality, person-set consistency and role reply definition are first determined. The chat corpus to be evaluated is read and each dialogue record is traversed. For each dialogue record, its text content is compared against the defined negative evaluation items and the matching items are counted. The counts over all dialogue records are accumulated to obtain, for the whole chat corpus to be evaluated, the number of negative evaluation items for dialogue quality, person-set consistency and role reply definition. Each of these counts is then compared with its preset threshold. If the number of dialogue quality negative evaluation items is not greater than the preset dialogue quality negative evaluation threshold, the number of person-set consistency negative evaluation items is not greater than the preset person-set consistency negative evaluation threshold, and the number of role reply definition negative evaluation items is not greater than the preset role reply definition negative evaluation threshold, the chat corpus to be evaluated is considered to meet the requirements and can be used as a chat corpus to be converted. Otherwise, it is not suitable as a chat corpus to be converted.
Further, dialogue quality must be evaluated over the dialogue as a whole, whereas person-set consistency and role reply definition are local issues, and role-reply problems occur least frequently in chat corpora. The preset dialogue quality negative evaluation threshold is therefore smaller than the preset person-set consistency negative evaluation threshold, which in turn is smaller than the preset role reply definition negative evaluation threshold.
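The item-count check described in this embodiment can be sketched as follows; the default thresholds are illustrative but respect the ordering just stated (dialogue quality threshold < person-set consistency threshold < role reply definition threshold), and the dictionary keys are assumed names.

```python
def is_corpus_to_convert(negative_evaluations: dict[str, list[str]],
                         quality_max_items: int = 1,
                         persona_max_items: int = 2,
                         role_max_items: int = 3) -> bool:
    """A corpus to be evaluated becomes a corpus to be converted only if every
    category of negative evaluation stays within its item-count threshold."""
    return (len(negative_evaluations.get("dialogue_quality", [])) <= quality_max_items
            and len(negative_evaluations.get("persona_consistency", [])) <= persona_max_items
            and len(negative_evaluations.get("role_reply_definition", [])) <= role_max_items)
```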
In this embodiment, screening the chat corpora to be evaluated selects content with better dialogue quality, person-set consistency and role reply definition as the chat corpora to be converted, improving their quality. Selecting high-quality chat corpora to be converted improves the model's judgment of dialogue quality, person-set consistency and role reply definition, so that the conversion effect of the model is more accurate and reliable.
As shown in FIG. 3, in one embodiment, the training process of the target dialog annotation model includes:
s301: randomly sampling each chat corpus to be improved to obtain a plurality of first training chat corpora to be improved;
s302: transmitting each first training chat corpus to be improved to a user side so that the user side marks the dialogue content of each first training chat corpus to be improved;
S303: carrying out SFT training on a pre-constructed initial dialogue labeling model by adopting each first training chat corpus to be improved and each labeled first training chat corpus to be improved;
s304: and determining a target dialogue annotation model according to the initial dialogue annotation model after SFT training.
It can be understood that, by labeling the dialogue content at the user end, high-quality labeling data can be obtained, and model training is performed by using the high-quality labeling data, so that an accurate and effective target dialogue labeling model is finally obtained.
Specifically, each chat corpus to be improved is randomly sampled to obtain a plurality of first training chat corpora to be improved. The first training chat corpora to be improved should be representative and cover the various situations and dialogue scenes that the chat system may encounter. They are sent to the user side so that the user side labels the dialogue content of each of them; this can be done through a labeling interface or other interactive means. SFT (Supervised Fine-Tuning) training is then performed on the pre-constructed initial dialogue annotation model using the first training chat corpora to be improved and their labeled counterparts. The final target dialogue annotation model is determined from the SFT-trained initial dialogue annotation model. Through iterative training, the model gradually improves in performance and accuracy; once it reaches a certain performance level, it can be used as the final target dialogue annotation model.
In this embodiment, by using the SFT training method, iterative training may be performed using the labeled data and the existing model, so as to gradually improve performance and accuracy of the dialogue labeling model.
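The patent does not prescribe a toolchain for the SFT step. As one possible realization, the sketch below fine-tunes a causal language model with the Hugging Face transformers and datasets libraries on (corpus to be improved, human-labeled reply) pairs; the base model name, prompt layout and hyper-parameters are placeholders, not values taken from the patent.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

def build_sft_dataset(pairs, tokenizer, max_length=1024):
    """pairs: list of (first training chat corpus to be improved, human-labeled reply)."""
    def tokenize(example):
        text = example["source"] + "\n### Labeled reply:\n" + example["target"]
        return tokenizer(text, truncation=True, max_length=max_length)
    raw = Dataset.from_list([{"source": s, "target": t} for s, t in pairs])
    return raw.map(tokenize, remove_columns=["source", "target"])

def run_sft(pairs, base_model="path-or-name-of-an-open-source-causal-lm"):
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # causal LMs often ship without a pad token
    model = AutoModelForCausalLM.from_pretrained(base_model)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="sft-dialogue-annotation-model",
                               num_train_epochs=3,
                               per_device_train_batch_size=2,
                               learning_rate=2e-5),
        train_dataset=build_sft_dataset(pairs, tokenizer),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    return model, tokenizer
```

Packing the source text and labeled reply into a single causal-LM sequence is just one common SFT formulation; a sequence-to-sequence setup would serve equally well.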
In one embodiment, the step of determining the target session annotation model based on the SFT-trained initial session annotation model comprises:
randomly sampling each chat corpus to be improved to obtain a plurality of second training chat corpora to be improved;
performing PPO fine-tuning on the initial dialogue annotation model after SFT training by adopting each second training chat corpus to be improved so as to obtain an optimized initial dialogue annotation model, wherein the target scoring model participates in the PPO fine-tuning and provides optimization parameters for optimizing the initial dialogue annotation model;
and when the optimized initial dialogue annotation model meets a second preset training ending condition, obtaining a target dialogue annotation model.
Specifically, a plurality of samples are randomly selected from the existing chat corpora to be improved as the second training chat corpora to be improved, ensuring the diversity and representativeness of the samples. The initial dialogue annotation model after SFT training is then combined with the second training chat corpora to be improved, and its parameters are optimized with a fine-tuning algorithm such as PPO (Proximal Policy Optimization). PPO is a reinforcement learning algorithm that can be used to fine-tune the parameters of a neural network model to improve its performance. The fine-tuned initial dialogue annotation model is applied to a series of test data; the scoring model serves as a reward model, evaluates the replies according to preset evaluation indicators, and gives a score or reward value that guides the PPO fine-tuning process. A second preset training ending condition is set, such as the model accuracy reaching a specific threshold, the performance remaining stable for a period of time, or the number of training rounds reaching a preset value. When the fine-tuned initial dialogue annotation model meets this condition, the target dialogue annotation model is obtained.
In one example, the multiple labeling results generated by the fine-tuned initial dialogue annotation model are scored during training. In effect, high-quality labeling results generated by the fine-tuned model receive large rewards, poor labeling results receive no reward, and labeling results of average quality receive a moderate reward, so that the fine-tuned initial dialogue annotation model ultimately tends to generate high-scoring labeling results.
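The reward shaping described in this example can be sketched as a simple mapping from the scoring model's output to a PPO reward; the cut-off scores and reward magnitudes below are illustrative assumptions, and the comment only indicates how such rewards might be fed to a PPO step without committing to any particular RL library.

```python
def reward_from_score(score: float, high_cut: float = 0.8, low_cut: float = 0.5) -> float:
    """Map the target scoring model's score for a generated labeling result onto a
    PPO reward: a large reward for high-quality results, a moderate reward for
    average ones, and none for poor ones."""
    if score >= high_cut:
        return 1.0
    if score >= low_cut:
        return 0.3
    return 0.0

# During each PPO step, the rewards for a batch of generated labeling results
# could be computed as, for example:
#   rewards = [reward_from_score(scoring_model_score(text)) for text in generated_texts]
# where scoring_model_score is a hypothetical call into the target scoring model.
```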
In this embodiment, by using the scoring model as the rewarding model, the reinforcement learning algorithm can be introduced into the fine tuning process of the dialogue model, and the model can be gradually improved and adapted to different dialogue environments and requirements, so that the fine tuning direction better meets the expected dialogue quality. By using the scoring model as the rewarding model, the dependence on manual evaluation can be reduced, the deviation caused by subjective evaluation is reduced, and the training efficiency and the cost efficiency are improved.
In one embodiment, the score of each annotation corpus to be evaluated includes a dialog quality score, a person-set consistency score, and a role reply definition score;
according to the score of each annotation corpus to be evaluated, determining at least one target chat corpus in each annotation corpus to be evaluated, wherein the method comprises the following steps:
For each annotation corpus to be evaluated, if the dialogue quality score, the person setting consistency score and the role reply limit score of the annotation corpus to be evaluated do not reach the preset score threshold, rejecting the annotation corpus to be evaluated, otherwise, taking the annotation corpus to be evaluated as a target chat corpus.
Specifically, specific evaluation standards of dialogue quality scores, person-set consistency scores and role reply definition scores are determined, a preset score threshold is set for each score index, and the definition can be carried out according to specific requirements and tasks. And judging whether the dialogue quality score, the person setting consistency score and the role reply limit score of each annotation corpus to be evaluated all reach a preset score threshold value. If all scores of the annotation corpus to be evaluated do not reach the preset score threshold, eliminating the annotation corpus and not serving as the target chat corpus. If at least one of the dialogue quality score, the person setting consistency score and the role reply limit score of the annotation corpus to be evaluated reaches a preset score threshold, the annotation corpus is used as a target chat corpus for subsequent training and improvement.
In this embodiment, removing the labeled corpus which does not reach the preset scoring threshold value can exclude low-quality dialogue content, so that the overall data quality is improved, and only the labeled corpus with high quality is selected as the target chat corpus, so that the data used in training the model can be ensured to have better accuracy and reliability. By setting the dialogue quality scoring threshold value, the labeling corpus with poor dialogue quality is removed, so that unreasonable or wrong dialogue modes can be prevented from being learned by the model, the dialogue quality generated by the model can be improved, and the interaction experience with a user can be enhanced. The consistency scoring threshold value is set by people, so that the selected labeling corpus is consistent in terms of person setting and modeling, generated replies are more consistent with the set character characteristics and individuality, and the performance of the model on the specific character is improved. By setting the role reply limiting scoring threshold, the labeling corpus meeting the requirements can be screened out, so that the reply generated by the model meets the expectations in the aspects of role setting and reply limiting, and the adaptability and the performance of the model under a specific scene can be improved.
The chat corpus labeling device provided by the embodiment of the application is described below, and the chat corpus labeling device described below and the chat corpus labeling method described above can be referred to correspondingly. As shown in fig. 4, the present application provides a chat corpus labeling apparatus, which includes:
an initial chat corpus scoring module 401, configured to score the obtained multiple initial chat corpora by using a trained target scoring model, so as to obtain scoring content and score of each initial chat corpus;
the to-be-converted chat corpus acquisition module 402 is configured to determine at least one to-be-evaluated chat corpus among the initial chat corpora according to the score of each initial chat corpus, perform quality evaluation on each to-be-evaluated chat corpus according to the scoring content of each to-be-evaluated chat corpus, and determine at least one to-be-converted chat corpus among each to-be-evaluated chat corpus according to the quality evaluation result;
the chat corpus to be improved obtaining module 403 is configured to extract system setting content, dialogue content and negative evaluation content of each chat corpus to be converted, so as to obtain a chat corpus to be improved of each chat corpus to be converted;
the labeling chat corpus acquisition module 404 is configured to label each chat corpus to be improved by using a trained target dialogue labeling model, so as to obtain a plurality of labeling chat corpora;
The to-be-evaluated annotation corpus obtaining module 405 is configured to extract system setting content and annotation content of each annotation chat corpus, and obtain to-be-evaluated annotation corpus of each annotation chat corpus;
the target chat corpus obtaining module 406 is configured to score each of the to-be-evaluated annotation corpora by using a target scoring model, obtain a score of each of the to-be-evaluated annotation corpora, and determine at least one target chat corpus in each of the to-be-evaluated annotation corpora according to the score of each of the to-be-evaluated annotation corpora.
In one embodiment, the initial chat corpus scoring module 401 includes:
the training chat corpus selecting unit is used for randomly selecting a plurality of training chat corpora from all the initial chat corpora;
the training chat corpus transmitting unit is used for transmitting each training chat corpus to the user side so that the user side scores each training chat corpus;
the training unit of the target scoring model is used for training the initial scoring model built in advance by adopting the scored training chat corpus, and when the trained initial scoring model meets the first preset training ending condition, the target scoring model is obtained.
In one embodiment, the score for each initial chat corpus includes a conversation quality score, a person-set consistency score, and a role reply definition score;
The chat corpus to be converted obtaining module 402 includes:
the first chat corpus acquisition unit is used for screening each initial chat corpus according to the dialogue quality score of each initial chat corpus to obtain at least one first chat corpus;
the second chat corpus acquisition unit is used for setting consistency scores according to the people of each first chat corpus, screening each first chat corpus and obtaining at least one second chat corpus;
The to-be-evaluated chat corpus acquisition unit is used for screening each second chat corpus according to the role reply definition score of each second chat corpus to obtain at least one chat corpus to be evaluated.
In one embodiment, scoring content for each chat corpus to be evaluated includes a conversation quality negative rating, a person-set consistency negative rating, and a role reply definition negative rating;
the chat corpus to be converted obtaining module 402 includes:
The to-be-converted chat corpus determining unit is used for determining, for each chat corpus to be evaluated, the number of evaluation items of the dialogue quality negative evaluation, the number of evaluation items of the person-set consistency negative evaluation and the number of evaluation items of the role reply definition negative evaluation in the chat corpus to be evaluated, and taking the chat corpus to be evaluated as a chat corpus to be converted if the number of evaluation items of the dialogue quality negative evaluation is not greater than a preset dialogue quality negative evaluation threshold, the number of evaluation items of the person-set consistency negative evaluation is not greater than a preset person-set consistency negative evaluation threshold, and the number of evaluation items of the role reply definition negative evaluation is not greater than a preset role reply definition negative evaluation threshold.
In one embodiment, the annotated chat corpus acquisition module 404 includes:
The first training to-be-improved chat corpus acquisition unit is used for randomly sampling each chat corpus to be improved to obtain a plurality of first training chat corpora to be improved;
the first training chat corpus to be improved sending unit is used for sending each first training chat corpus to be improved to the user side so that the user side marks the dialogue content of each first training chat corpus to be improved;
the initial dialogue annotation model training unit is used for carrying out SFT training on the pre-constructed initial dialogue annotation model by adopting each first training chat corpus to be improved and each annotated first training chat corpus to be improved;
and the target dialogue annotation model determining unit is used for determining the target dialogue annotation model according to the initial dialogue annotation model trained by the SFT.
In one embodiment, the target dialog annotation model determination unit comprises:
The second training to-be-improved chat corpus acquisition subunit is used for randomly sampling each chat corpus to be improved to obtain a plurality of second training chat corpora to be improved;
The target dialogue annotation model training unit is used for performing PPO fine-tuning on the initial dialogue annotation model after SFT training by adopting each second training chat corpus to be improved so as to obtain an optimized initial dialogue annotation model, wherein the target scoring model participates in the PPO fine-tuning and provides optimization parameters for optimizing the initial dialogue annotation model; and when the optimized initial dialogue annotation model meets a second preset training ending condition, the target dialogue annotation model is obtained.
In one embodiment, the score of each annotation corpus to be evaluated includes a dialog quality score, a person-set consistency score, and a role reply definition score;
the target chat corpus acquisition module 406 includes:
the target chat corpus acquisition unit is used for eliminating the annotation corpus to be evaluated if the dialogue quality score, the person setting consistency score and the role reply limit score of each annotation corpus to be evaluated do not reach the preset score threshold, otherwise, taking the annotation corpus to be evaluated as the target chat corpus.
In one embodiment, the present application further provides a storage medium having stored therein computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the chat corpus tagging method as in any of the embodiments above.
In one embodiment, the present application further provides a computer device, in which computer readable instructions are stored, which when executed by one or more processors, cause the one or more processors to perform the steps of the chat corpus labeling method as in any of the embodiments above.
Illustratively, fig. 5 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application, and the computer device 500 may be provided as a server. Referring to fig. 5, the computer device 500 includes a processing component 502, which further includes one or more processors, and memory resources represented by a memory 501 for storing instructions executable by the processing component 502, such as applications. The application program stored in the memory 501 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 502 is configured to execute the instructions to perform the chat corpus labeling method of any of the embodiments described above.
The computer device 500 may also include a power supply component 503 configured to perform power management of the computer device 500, a wired or wireless network interface 504 configured to connect the computer device 500 to a network, and an input/output (I/O) interface 505. The computer device 500 may operate based on an operating system stored in the memory 501, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 5 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
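To make the method that such a computer device executes more tangible, the following end-to-end sketch strings the stages together: scoring, screening, quality evaluation by negative-evaluation counts, extraction, re-annotation and re-scoring. Every interface (the `score` and `annotate` callables), every field name and every threshold is an assumption of this illustration, as is the reading that low-scoring corpora are the ones selected for improvement.

```python
from typing import Callable, Dict, List, Tuple

# Assumed interfaces: score() returns (per-dimension scores, per-dimension negative
# evaluation items); annotate() returns an annotated corpus containing "dialogue".
ScoreFn = Callable[[Dict], Tuple[Dict[str, float], Dict[str, List[str]]]]
AnnotateFn = Callable[[Dict], Dict]


def label_chat_corpora(initial_corpora: List[Dict], score: ScoreFn,
                       annotate: AnnotateFn, score_threshold: float = 8.0,
                       negative_item_limit: int = 2) -> List[Dict]:
    targets = []
    for corpus in initial_corpora:
        scores, scoring_content = score(corpus)  # target scoring model

        # Screening: treat corpora whose scores all fall below the threshold as
        # chat corpora to be evaluated (assumed reading of the screening step).
        if any(v >= score_threshold for v in scores.values()):
            continue

        # Quality evaluation: keep as a chat corpus to be converted only if each
        # negative-evaluation item count stays within its preset limit.
        if any(len(items) > negative_item_limit for items in scoring_content.values()):
            continue

        # Extraction: system setting content, dialogue content and negative
        # evaluation content form the chat corpus to be improved.
        to_be_improved = {"system": corpus["system"],
                          "dialogue": corpus["dialogue"],
                          "negative_evaluation": scoring_content}

        # Annotation with the target dialogue annotation model, then re-scoring.
        annotated = annotate(to_be_improved)
        new_scores, _ = score({"system": corpus["system"],
                               "dialogue": annotated["dialogue"]})
        if all(v >= score_threshold for v in new_scores.values()):
            targets.append(annotated)
    return targets
```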
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Herein, "a," "an," and "the" may also include plural forms, unless the context clearly indicates otherwise. "Plural" means at least two, for example 2, 3, 5 or 8. "And/or" includes any and all combinations of the associated listed items.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the other embodiments; the embodiments may be combined as needed, and for the same or similar parts, reference may be made to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A chat corpus labeling method, characterized by comprising the following steps:
scoring the acquired multiple initial chat corpora by adopting a trained target scoring model to obtain scoring content and score of each initial chat corpus;
determining at least one chat corpus to be evaluated in each initial chat corpus according to the score of each initial chat corpus, performing quality evaluation on each chat corpus to be evaluated according to scoring content of each chat corpus to be evaluated, and determining at least one chat corpus to be converted in each chat corpus to be evaluated according to quality evaluation results;
extracting system setting content, dialogue content and negative evaluation content of each chat corpus to be converted to obtain chat corpus to be improved of each chat corpus to be converted;
labeling each chat corpus to be improved by adopting a trained target dialogue labeling model to obtain a plurality of labeled chat corpora;
extracting system setting content and labeling content of each labeling chat corpus to obtain to-be-evaluated labeling corpus of each labeling chat corpus;
scoring each annotation corpus to be evaluated by adopting the target scoring model to obtain the score of each annotation corpus to be evaluated, and determining at least one target chat corpus in each annotation corpus to be evaluated according to the score of each annotation corpus to be evaluated.
2. The chat corpus labeling method according to claim 1, wherein the training process of the target scoring model comprises:
randomly selecting a plurality of training chat corpora from all the initial chat corpora;
transmitting each training chat corpus to a user side so that the user side scores each training chat corpus;
training a pre-constructed initial scoring model by using the scored training chat corpus, and obtaining the target scoring model when the trained initial scoring model meets a first preset training ending condition.
3. The chat corpus labeling method of claim 1, wherein the score of each of the initial chat corpora includes a dialogue quality score, a person-set consistency score, and a role reply limit score;
the step of determining at least one chat corpus to be evaluated in each initial chat corpus according to the score of each initial chat corpus comprises the following steps:
screening each initial chat corpus according to the dialogue quality score of each initial chat corpus to obtain at least one first chat corpus;
screening each first chat corpus according to the person-set consistency score of each first chat corpus to obtain at least one second chat corpus;
and screening each second chat corpus according to the role reply limit score of each second chat corpus to obtain at least one chat corpus to be evaluated.
4. The chat corpus labeling method according to claim 1, wherein the scoring content of each chat corpus to be evaluated comprises a dialogue quality negative evaluation, a person-set consistency negative evaluation and a role reply limit negative evaluation;
The step of performing quality evaluation on each chat corpus to be evaluated according to scoring content of each chat corpus to be evaluated, and determining at least one chat corpus to be converted in each chat corpus to be evaluated according to quality evaluation results comprises the following steps:
for each chat corpus to be evaluated, determining the number of evaluation items of the dialogue quality negative evaluation, the number of evaluation items of the person-set consistency negative evaluation and the number of evaluation items of the role reply limit negative evaluation in the chat corpus to be evaluated, and taking the chat corpus to be evaluated as the chat corpus to be converted if the number of evaluation items of the dialogue quality negative evaluation in the chat corpus to be evaluated is not greater than a preset dialogue quality negative evaluation threshold, the number of evaluation items of the person-set consistency negative evaluation is not greater than a preset person-set consistency negative evaluation threshold, and the number of evaluation items of the role reply limit negative evaluation is not greater than a preset role reply limit negative evaluation threshold.
5. The chat corpus labeling method according to claim 1, wherein the training process of the target dialogue labeling model comprises the following steps:
randomly sampling each chat corpus to be improved to obtain a plurality of first training chat corpora to be improved;
transmitting each first training chat corpus to be improved to a user side so that the user side marks the dialogue content of each first training chat corpus to be improved;
performing SFT training on the pre-constructed initial dialogue annotation model by adopting each first training chat corpus to be improved and each annotated first training chat corpus to be improved;
and determining the target dialogue annotation model according to the initial dialogue annotation model trained by the SFT.
6. The chat corpus labeling method according to claim 5, wherein the step of determining the target dialogue labeling model from the SFT-trained initial dialogue labeling model comprises:
randomly sampling each chat corpus to be improved to obtain a plurality of second training chat corpora to be improved;
performing PPO fine-tuning on the SFT-trained initial dialogue annotation model by adopting each second training chat corpus to be improved to obtain an optimized initial dialogue annotation model, wherein the target scoring model participates in the PPO fine-tuning to provide optimization parameters for optimizing the initial dialogue annotation model;
and when the optimized initial dialogue annotation model meets a second preset training ending condition, obtaining the target dialogue annotation model.
7. The chat corpus labeling method according to claim 1, wherein the score of each annotation corpus to be evaluated comprises a dialogue quality score, a person-set consistency score and a role reply limit score;
the step of determining at least one target chat corpus in each annotation corpus to be evaluated according to the score of each annotation corpus to be evaluated comprises the following steps:
and for each annotation corpus to be evaluated, if the dialogue quality score, the person-set consistency score and the role reply limit score of the annotation corpus to be evaluated do not reach the preset score threshold, rejecting the annotation corpus to be evaluated, otherwise, taking the annotation corpus to be evaluated as the target chat corpus.
8. A chat corpus labeling apparatus, the apparatus comprising:
the initial chat corpus scoring module is used for scoring the acquired multiple initial chat corpora by adopting a trained target scoring model to obtain scoring content and score of each initial chat corpus;
the to-be-converted chat corpus acquisition module is used for determining at least one to-be-evaluated chat corpus in each initial chat corpus according to the score of each initial chat corpus, performing quality evaluation on each to-be-evaluated chat corpus according to the scoring content of each to-be-evaluated chat corpus, and determining at least one to-be-converted chat corpus in each to-be-evaluated chat corpus according to the quality evaluation result;
the to-be-improved chat corpus acquisition module is used for extracting system setting content, dialogue content and negative evaluation content of each chat corpus to be converted to obtain the chat corpus to be improved of each chat corpus to be converted;
the marked chat corpus acquisition module is used for marking each chat corpus to be improved by adopting a trained target dialogue marking model to obtain a plurality of marked chat corpora;
the to-be-evaluated annotation corpus acquisition module is used for extracting system setting content and annotation content of each annotation chat corpus to obtain to-be-evaluated annotation corpus of each annotation chat corpus;
the target chat corpus acquisition module is used for scoring each annotation corpus to be evaluated by adopting the target scoring model, obtaining the score of each annotation corpus to be evaluated, and determining at least one target chat corpus in each annotation corpus to be evaluated according to the score of each annotation corpus to be evaluated.
9. A storage medium, characterized in that the storage medium has stored therein computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the chat corpus labeling method of any of claims 1 to 7.
10. A computer device, comprising: one or more processors, and memory;
stored in the memory are computer readable instructions which, when executed by the one or more processors, perform the steps of the chat corpus labeling method of any of claims 1 to 7.
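For completeness, the training of the target scoring model recited in claim 2 can be sketched as a regression fine-tune of a pretrained encoder on the user-side scored training chat corpora. The base checkpoint, field names, padding strategy and hyperparameters below are illustrative assumptions only; the claim itself does not fix any of them.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)


def train_target_scoring_model(scored_corpora, base_model="bert-base-chinese"):
    """scored_corpora: list of {"text": training chat corpus, "score": user score}."""
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_model, num_labels=1, problem_type="regression")

    def tokenize(batch):
        enc = tokenizer(batch["text"], truncation=True,
                        padding="max_length", max_length=512)
        enc["label"] = [float(s) for s in batch["score"]]
        return enc

    dataset = Dataset.from_list(scored_corpora).map(
        tokenize, batched=True, remove_columns=["text", "score"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="target_scoring_model",
                               per_device_train_batch_size=8,
                               num_train_epochs=3),
        train_dataset=dataset,
    )
    trainer.train()  # stop when the first preset training-ending condition is met
    return model, tokenizer
```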
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311847536.2A CN117807964A (en) | 2023-12-28 | 2023-12-28 | Chat corpus labeling method and device, storage medium and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311847536.2A CN117807964A (en) | 2023-12-28 | 2023-12-28 | Chat corpus labeling method and device, storage medium and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117807964A true CN117807964A (en) | 2024-04-02 |
Family
ID=90428061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311847536.2A Pending CN117807964A (en) | 2023-12-28 | 2023-12-28 | Chat corpus labeling method and device, storage medium and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117807964A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Pietquin et al. | A survey on metrics for the evaluation of user simulations | |
EP3567498A1 (en) | Method and device for question response | |
CN107423440B (en) | Question-answer context switching and reinforced selection method based on emotion analysis | |
WO2018218708A1 (en) | Deep-learning-based public opinion hotspot category classification method | |
US20230027526A1 (en) | Method and apparatus for classifying document based on attention mechanism and semantic analysis | |
CN111221939A (en) | Grading method and device and electronic equipment | |
CN111090735B (en) | Performance evaluation method of intelligent question-answering method based on knowledge graph | |
CN107301164B (en) | Semantic analysis method and device for mathematical formula | |
Dong et al. | A natural language question answering system as a participant in human Q&A portals | |
CN115114916A (en) | User feedback data analysis method and device and computer equipment | |
JP2013105436A (en) | Interactive model construction device, method and program | |
CN117473034A (en) | Interactive text processing method and device, electronic equipment and storage medium | |
CN115905487A (en) | Document question and answer method, system, electronic equipment and storage medium | |
CN117688158A (en) | Training method of reward model, answer evaluation method, device and equipment | |
CN117669767A (en) | Language model training method, device and computer readable storage medium | |
CN112632265A (en) | Intelligent machine reading understanding method and device, electronic equipment and storage medium | |
Zhu et al. | YUN111@ Dravidian-CodeMix-FIRE2020: Sentiment Analysis of Dravidian Code Mixed Text. | |
CN117807964A (en) | Chat corpus labeling method and device, storage medium and computer equipment | |
CN103744830A (en) | Semantic analysis based identification method of identity information in EXCEL document | |
CN112541352A (en) | Policy interpretation method based on deep learning | |
CN118627471B (en) | Automatic government affair data labeling method and system based on dependency attention diagram convolution | |
CN118643307B (en) | Quality detection method, device, equipment and storage medium for model instruction | |
CN118485046B (en) | Labeling data processing method and device, electronic equipment and computer storage medium | |
Shashiprabha et al. | Student Feedback Analyzer for E-Learning Platforms | |
Al Tarouti et al. | Sentence simplification for question generation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||