CN116955543A - Coherence evaluation model training and coherence evaluation method, apparatus and device - Google Patents

Coherence evaluation model training and coherence evaluation method, apparatus and device

Info

Publication number
CN116955543A
Authority
CN
China
Prior art keywords
loss
consistency
continuity
training
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211425813.6A
Other languages
Chinese (zh)
Inventor
陆柳村
王硕佳
赵瑞辉
郑冶枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211425813.6A priority Critical patent/CN116955543A/en
Publication of CN116955543A publication Critical patent/CN116955543A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 16/35: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application relates to the technical field of artificial intelligence and provides a method, apparatus and device for training a coherence evaluation model, used to improve the efficiency of dialogue coherence evaluation. The training method comprises: obtaining the encoded features of training samples belonging to a plurality of coherence categories; performing a class-wise contrastive comparison of the obtained encoded features to obtain a contrastive loss; performing coherence evaluation on each encoded feature to obtain the coherence evaluation value of each training sample; performing a class-wise ranking comparison of the obtained evaluation values to obtain a ranking loss; and finally adjusting the model parameters by combining the contrastive loss and the ranking loss. In this way, self-supervised automatic coherence evaluation is realized by means of contrastive learning, thereby improving the efficiency of coherence testing.

Description

Coherence evaluation model training and coherence evaluation method, apparatus and device
Technical Field
The present application relates to the technical field of artificial intelligence and provides a method, apparatus and device for training a coherence evaluation model and for coherence evaluation.
Background
With the rapid development of dialogue systems, how to evaluate the coherence a dialogue system exhibits during a conversation has become a focus of attention. Because manual evaluation cannot meet the frequent evaluation needs that arise during dialogue-system development, automatic dialogue coherence evaluation has become a research hotspot in recent years.
In the related art, a two-stage scheme of pre-training followed by fine-tuning is generally used to train a dialogue coherence evaluation model: in the pre-training stage, the model is trained in a self-supervised manner, and in the fine-tuning stage, the pre-trained model is further trained with labeled sample data.
However, because fine-tuning uses manually annotated data, the dependence on manual annotation is not completely eliminated, and a great deal of annotation time is still required, which limits the efficiency of automatic dialogue-system testing.
Disclosure of Invention
The embodiments of the present application provide a method, apparatus and device for training a coherence evaluation model and for coherence evaluation, which are used to improve the efficiency of automatic dialogue-system testing.
In a first aspect, an embodiment of the present application provides a method for training a coherence evaluation model, where the coherence evaluation model includes at least an encoder and a predictor. The method includes:
inputting each extracted training sample into the encoder for feature extraction to obtain corresponding encoded features, where the training samples belong to a plurality of coherence categories, each training sample contains dialogue information and response information, and each coherence category represents a degree of coherence between the corresponding dialogue information and response information;
performing a class-wise contrastive comparison based on the obtained encoded features and the plurality of coherence categories to obtain a contrastive loss;
inputting each encoded feature into the predictor for coherence evaluation to obtain corresponding coherence evaluation values, and performing a class-wise ranking comparison based on the obtained coherence evaluation values and their corresponding coherence categories to obtain a ranking loss;
adjusting the parameters of the encoder and the predictor based on the contrastive loss and the ranking loss.
In a second aspect, an embodiment of the present application provides a coherence evaluation method, including:
constructing an application model based on the encoder and the predictor included in a target coherence evaluation model, where the target coherence evaluation model is trained by the method of the first aspect;
inputting the acquired dialogue to be evaluated and reply to be evaluated into the encoder of the application model for feature extraction to obtain the corresponding encoded feature;
inputting the encoded feature into the predictor of the application model for coherence evaluation to obtain a target coherence evaluation value (a minimal code sketch of this procedure follows).
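For readability, the following is a minimal sketch of the evaluation procedure of the second aspect, assuming a trained encoder-predictor module and a paired tokenizer like those sketched in the detailed description below; all names are illustrative, not part of the claimed method:

```python
# Hedged sketch: score one (dialogue, reply) pair with a trained model.
# `model` returns (encoded feature f, coherence score s); `tokenizer` builds
# the [CLS] dialogue [SEP] reply [SEP] input. Both are assumed, not claimed APIs.
import torch

@torch.no_grad()
def evaluate_coherence(model, tokenizer, dialogue: str, reply: str) -> float:
    enc = tokenizer(dialogue, reply, return_tensors="pt", truncation=True)
    _, score = model(enc["input_ids"], enc["attention_mask"])
    return score.item()  # target coherence evaluation value
```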
In a third aspect, an embodiment of the present application provides an apparatus for training a coherence evaluation model, where the coherence evaluation model includes at least an encoder and a predictor, including:
a feature extraction unit, configured to input each extracted training sample into the encoder for feature extraction to obtain corresponding encoded features, where the training samples belong to a plurality of coherence categories, each training sample contains dialogue information and response information, and each coherence category represents a degree of coherence between the corresponding dialogue information and response information;
a contrastive loss calculation unit, configured to perform a class-wise contrastive comparison based on the obtained encoded features and the plurality of coherence categories to obtain a contrastive loss;
a ranking loss calculation unit, configured to input each encoded feature into the predictor for coherence evaluation to obtain corresponding coherence evaluation values, and to perform a class-wise ranking comparison based on the obtained coherence evaluation values and their corresponding coherence categories to obtain a ranking loss;
a parameter adjustment unit, configured to adjust the parameters of the encoder and the predictor based on the contrastive loss and the ranking loss.
In a fourth aspect, an embodiment of the present application provides a coherence evaluation apparatus, including:
a model construction unit, configured to construct an application model based on the encoder and the predictor included in a target coherence evaluation model, where the target coherence evaluation model is trained by the method of the first aspect;
a feature extraction unit, configured to input the acquired dialogue to be evaluated and reply to be evaluated into the encoder of the application model for feature extraction to obtain the corresponding encoded feature;
a coherence evaluation unit, configured to input the encoded feature into the predictor of the application model for coherence evaluation to obtain a target coherence evaluation value.
In a fifth aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method in the first or second aspect.
In a sixth aspect, an embodiment of the application provides a computer readable storage medium comprising a computer program for causing an electronic device to perform the steps of the method of the first or second aspect when the computer program is run on the electronic device.
In a seventh aspect, embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium, from which computer readable storage medium a processor of an electronic device reads and executes the computer program, causing the electronic device to perform the steps of the method in the first or second aspect.
In the embodiments of the present application, the coherence evaluation model includes at least an encoder and a predictor. In each iteration, the encoded features of the training samples are first obtained through the encoder, where the training samples belong to a plurality of coherence categories, each training sample contains dialogue information and response information, and each coherence category represents a degree of coherence between the corresponding dialogue information and response information. Second, a contrastive loss is obtained based on the obtained encoded features and the coherence categories to which they belong. Then, the coherence evaluation value of each training sample is obtained through the predictor based on its encoded feature, and a ranking loss is obtained based on the obtained coherence evaluation values and their corresponding coherence categories. Finally, the model parameters are adjusted based on the contrastive loss and the ranking loss.
In this way, the constraint of the contrastive loss on the encoded features enables the coherence evaluation model to distinguish same-class samples from different-class samples, while the constraint of the ranking loss on the coherence evaluation values output by the predictor makes the model output evaluation values of appropriate magnitude for dialogue information and response information belonging to different coherence categories. Self-supervised automatic coherence evaluation is thereby realized, which improves the efficiency of coherence testing.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 2A is a schematic diagram of a dialogue provided in an embodiment of the present application;
FIG. 2B is a schematic diagram of another dialogue provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a first coherence evaluation model according to an embodiment of the present application;
FIG. 4 is a flowchart of a first model training method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of spliced training samples according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for obtaining the contrastive loss according to an embodiment of the present application;
FIG. 7 is a logic diagram for obtaining the contrastive loss according to an embodiment of the present application;
FIG. 8 is a logic diagram for obtaining the ranking loss according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a second coherence evaluation model according to an embodiment of the present application;
FIG. 10 is a flowchart of a second model training method according to an embodiment of the present application;
FIG. 11 is a logic diagram for obtaining the classification loss according to an embodiment of the present application;
FIG. 12 is a logic diagram of model training provided in an embodiment of the present application;
FIG. 13 is a diagram of an encoded-feature distribution provided in an embodiment of the present application;
FIG. 14 is a graph of target coherence evaluation values according to an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a training apparatus for a coherence evaluation model according to an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a coherence evaluation apparatus according to an embodiment of the present application;
FIG. 17 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to the accompanying drawings of the embodiments. It is apparent that the described embodiments are some, but not all, embodiments of the technical solutions of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments described herein without creative effort fall within the protection scope of the technical solutions of the present application.
The terms related to the present application will be explained first.
Contrastive learning: a way of training an encoder network by pulling the encodings of same-class data closer together in the feature space and pushing the encodings of different-class data farther apart.
Self-supervised learning: training a network on large-scale unlabeled data using constructed supervision signals, thereby learning representations adapted to downstream tasks.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The key technologies of speech technology (Speech Technology) are automatic speech recognition (ASR), speech synthesis (TTS) and voiceprint recognition. Enabling computers to listen, see, speak and feel is the development direction of future human-computer interaction, and speech has become one of the most promising modes of human-computer interaction.
Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied in all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart healthcare and smart customer service. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and show increasingly important value.
The solution provided by the embodiments of the present application mainly relates to machine learning techniques of artificial intelligence, and specifically comprises a model training stage and a model application stage for the coherence evaluation model. The coherence evaluation model includes at least an encoder and a predictor. In the model training stage, the coherence evaluation model to be trained is iteratively trained. In the model application stage, the dialogue to be evaluated and the reply to be evaluated are evaluated based on the encoder and the predictor contained in the trained target coherence evaluation model, obtaining the corresponding target coherence evaluation value. Specific implementations of the two stages are described below and are not detailed here.
With the rapid development of dialogue systems, how to evaluate the coherence a dialogue system exhibits during a conversation has become a focus of attention, and automatic dialogue coherence evaluation has been a research hotspot in recent years.
In the related art, a two-stage scheme of pre-training followed by fine-tuning is generally used to train a dialogue coherence evaluation model: in the pre-training stage, the model is trained in a self-supervised manner, and in the fine-tuning stage, the pre-trained model is further trained with labeled sample data. However, because fine-tuning uses manually annotated data, the dependence on manual annotation is not completely eliminated, and a great deal of annotation time is still required, which limits the testing efficiency of the dialogue system.
In the embodiments of the present application, the coherence evaluation model includes at least an encoder and a predictor. In each iteration, the encoded features of the training samples are first obtained through the encoder, where the training samples belong to a plurality of coherence categories, each training sample contains dialogue information and response information, and each coherence category represents a degree of coherence between the corresponding dialogue information and response information. Second, a contrastive loss is obtained based on the obtained encoded features and the coherence categories to which they belong. Then, the coherence evaluation value of each training sample is obtained through the predictor based on its encoded feature, and a ranking loss is obtained based on the obtained coherence evaluation values and their corresponding coherence categories. Finally, the model parameters are adjusted based on the contrastive loss and the ranking loss.
In this way, the constraint of the contrastive loss on the encoded features enables the coherence evaluation model to distinguish same-class samples from different-class samples, while the constraint of the ranking loss on the coherence evaluation values output by the predictor makes the model output evaluation values of appropriate magnitude for dialogue information and response information belonging to different coherence categories. Self-supervised automatic coherence evaluation is thereby realized, which improves the efficiency of coherence testing.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. The application scenario diagram includes a terminal device 110 and a server 120. Communication between the terminal device 110 and the server 120 may be performed through a communication network. The communication network may be a wired network or a wireless network.
The terminal device 110 is a computer device used by a user, and includes but is not limited to a personal computer, a mobile phone, a tablet computer, a notebook computer, an e-book reader, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, and the like.
The server 120 may be an independent physical server, a server cluster or distributed system composed of a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), big data, and artificial intelligence platforms.
It should be noted that the illustration in fig. 1 is merely an example, and the number of terminal devices and servers is not limited in practice.
In the embodiment of the present application, a dialogue client may be installed on the terminal device 110 and is used to implement human-machine dialogue. The dialogue client may be software for conducting human-machine dialogue, such as smart-home software, intelligent medical software, intelligent customer-service software, virtual-assistant software and other software supporting human-machine dialogue, and may also be an applet, a web page, and the like, which are not specifically limited here.
It should be noted that the dialogue client in the embodiment of the present application may also refer to various in-vehicle applications supporting human-machine dialogue, such as education, messaging, travel, audiobook and advertising applications.
The server 120 may be the background server corresponding to the dialogue client. The server 120 is configured to, after receiving the user's input information, determine the context information of the input information and determine the response information corresponding to the input information according to that context information.
It should be noted that, in the embodiment of the present application, the response information may be text, an image, a video, etc., or may be a message card, where a message card is a message type carrying rich graphic content and interactive behavior, but is not limited thereto.
For example, FIG. 2A is a schematic diagram of a first human-machine dialogue according to an embodiment of the application. Intelligent medical software supporting medical consultation is installed on the terminal device. The input information of the user's current dialogue turn is "Male, 25 years old", and the context information comprises the multi-turn dialogue data generated before this input, including: the prompt sent to the user "Please describe your symptoms so that a suitable department can be recommended to you … …", the user's reply "severe hair loss", and the follow-up question "What are your gender and age?". After the terminal device obtains the input information entered by the user, it sends it to the server; the server determines the context information of the received input and determines that the corresponding response information is "The recommended department for you is … …".
For another example, FIG. 2B is a schematic diagram of a second human-machine dialogue according to an embodiment of the application. A health-consultation applet supporting intelligent vaccine question answering is installed on the terminal device. The input information of the user's current dialogue turn is "Do I need to get the xx vaccine?", and the context information comprises the multi-turn dialogue data generated before this input, including message card 1 sent by the terminal device to the user, which contains content such as how to get vaccinated safely and how to book a vaccination. After the terminal device obtains the input information entered by the user, it sends it to the server; the server determines the context information of the received input and, according to that context, determines that the corresponding response information is message card 2 and message card 3, where message card 2 is "You can ask me … …".
The server 120 may be further configured to iteratively train the coherence evaluation model to be trained based on the training sample set to obtain the target coherence evaluation model. See below for the specific model training procedure.
In the embodiment of the present application, the coherence evaluation model may be deployed in the background server for training, in the terminal device 110 for training, or in other computing devices for training, including but not limited to devices with computing capability such as terminal devices or servers. In addition, the model training process and the model application process may be implemented in the same computing device or in different computing devices, which is not limited here.
The model training method provided by the embodiment of the present application can be applied to various application scenarios supporting human-machine dialogue, including but not limited to cloud technology, artificial intelligence, intelligent transportation, assisted driving and the like. The training samples used in different scenarios differ and are not listed one by one here.
The model training method provided by the exemplary embodiment of the present application will be described below with reference to the accompanying drawings in conjunction with the application scenario described above, and it should be noted that the application scenario described above is only shown for the convenience of understanding the spirit and principle of the present application, and the embodiment of the present application is not limited in any way in this respect.
Referring to FIG. 3, a schematic structural diagram of the first coherence evaluation model according to an embodiment of the present application, the coherence evaluation model includes an encoder and a predictor.
The encoder is used for feature extraction; illustratively, the encoder may employ, but is not limited to, BERT (Bidirectional Encoder Representations from Transformers).
The predictor is used to predict the coherence evaluation value and is illustratively implemented by a multi-layer perceptron (Multilayer Perceptron, MLP). Illustratively, the MLP is a three-layer fully connected network whose activation functions are two exponential linear units (ELUs) followed by a Sigmoid; a minimal sketch of this architecture is given below.
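The following is a minimal PyTorch sketch of the encoder-plus-predictor architecture just described, assuming a HuggingFace BERT encoder ("bert-base-chinese" and the 768-dimensional hidden size are arbitrary choices); the class and variable names are illustrative, not taken from the patent:

```python
# A minimal PyTorch sketch of the encoder + predictor described above.
# "bert-base-chinese" and the 768-d hidden size are assumed choices;
# class and variable names are illustrative, not from the patent.
import torch
import torch.nn as nn
from transformers import BertModel

class CoherenceModel(nn.Module):
    def __init__(self, pretrained: str = "bert-base-chinese", hidden: int = 768):
        super().__init__()
        self.encoder = BertModel.from_pretrained(pretrained)
        # Three-layer fully connected predictor: two ELU layers followed by a
        # Sigmoid, mapping the encoded feature to a score in (0, 1).
        self.predictor = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        f = out.last_hidden_state[:, 0]    # [CLS] vector as encoded feature f
        s = self.predictor(f).squeeze(-1)  # coherence evaluation value s
        return f, s
```

The [CLS] vector is used here as the encoded feature f of formula (1) below; other pooling choices are equally possible.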
Referring to FIG. 4 in conjunction with the coherence evaluation model in FIG. 3, the first model training method provided in an embodiment of the present application is applied to an electronic device, which may be a server or a terminal device.
During iterative training, all training samples are divided into batches of a specified size, and training is performed batch by batch. Since the steps performed for each batch in each iteration are similar, training on one batch is described below as an example.
S401: input each extracted training sample into the encoder for feature extraction to obtain the corresponding encoded features.
In the embodiment of the present application, a batch contains a set number of training samples, the training samples belong to a plurality of coherence categories, each training sample contains dialogue information and response information, and each coherence category represents a degree of coherence between the corresponding dialogue information and response information.
As one possible implementation, two coherence categories may be set, representing coherent and incoherent respectively. However, considering that human judgments of coherence are multi-level, as another possible implementation, more than two coherence categories may be set. The following description takes three coherence categories as an example, denoted L1, L2 and L3, whose degrees of coherence increase in the order L1, L2, L3.
The dialogue information in the training samples is dialogue data from the actual application scenario. The embodiments of the present application are illustrated by taking an intelligent medical scenario as an example.
The dialogue information of a training sample contains input information; for example, a user may enter information in the form of voice, text or pictures on a terminal on which a dialogue application is installed. The dialogue information also contains the context information corresponding to the input information, i.e., historical dialogue between the object to be evaluated and the target object: it may be dialogue information sent by the first object to the second object, dialogue information sent by the second object to the first object, or an interactive dialogue between the two. The object to be evaluated refers to the device providing the dialogue function.
It should be noted that, in the embodiments of the present application, the types of dialogue information include but are not limited to text, voice and images, and likewise for the types of response information.
For example, dialogue information 1 asks what else character XX in a certain movie might be, response information 1 is "Character XX acts as the leader, teacher and protector of the safe zone in the movie … …", and the coherence category is L2.
For another example, dialogue information 2 is "A: My hair loss is severe; B: Your gender and age? A: Male, 23 years old", response information 2 is "We recommend that you visit the following department", and the coherence category is L3.
In one embodiment, the training samples extracted from the training sample set may share the same dialogue information while having different response information. Specifically, the training samples may be obtained as follows:
Take dialogue information c as an example, where c is any dialogue information in the training data set, and the training data set contains c together with response information for c belonging to the different coherence categories.
For dialogue information c, a set number of pieces of response information are obtained from the training data set for each of the plurality of coherence categories, and each obtained piece of response information is spliced with c to obtain training samples belonging to the plurality of coherence categories.
Referring to FIG. 5, assume the set number is 5. For dialogue information c, response information is acquired for each of the coherence categories: the response information belonging to L1 comprises response information 1 through 5, that belonging to L2 comprises response information 6 through 10, and that belonging to L3 comprises response information 11 through 15. Each piece of response information is spliced with c, so that the training samples belonging to L1 are (dialogue information c, response information 1) through (dialogue information c, response information 5), those belonging to L2 are (dialogue information c, response information 6) through (dialogue information c, response information 10), and those belonging to L3 are (dialogue information c, response information 11) through (dialogue information c, response information 15). Hereinafter, these 15 training samples are used as the running example; a sketch of assembling such a batch follows.
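A small sketch of assembling such a batch, under the assumption that the training data is organized as a mapping from coherence category to candidate responses for c (the data layout and names are illustrative, not from the patent):

```python
# Illustrative sketch of assembling one training batch: the same dialogue
# information c spliced with a set number of responses from each coherence
# category. The data layout and names are assumptions, not from the patent.
from typing import Dict, List, Tuple

def build_batch(c: str,
                responses_by_class: Dict[int, List[str]],
                num_per_class: int = 5) -> List[Tuple[str, str, int]]:
    batch = []
    for level, responses in sorted(responses_by_class.items()):
        for r in responses[:num_per_class]:
            batch.append((c, r, level))  # (dialogue, response, coherence class)
    return batch

# With 3 categories (1 = L1, 2 = L2, 3 = L3) and 5 responses each: 15 samples.
```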
Each training sample is passed through the BERT model for feature extraction to obtain the corresponding encoded feature, until all training samples are processed. Specifically, for each training sample, the corresponding encoded feature may be obtained using formula (1):
f = BERT([c; r])   (1)
where [c; r] denotes a training sample, f denotes the encoded feature of the training sample, c denotes the dialogue information in the training sample, and r denotes the response information. Assume c = {c_1, …, c_m} and r = {r_1, …, r_n}, where c_1, …, c_m are the tokens of the dialogue information and r_1, …, r_n are the tokens of the response information; splicing c and r yields [c; r] = {[CLS], c_1, …, c_m, [SEP], r_1, …, r_n, [SEP]}, where [CLS] denotes the start token and [SEP] denotes the separator token.
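The input of formula (1) can be produced with a standard BERT tokenizer's sentence-pair mode, which inserts the [CLS] and [SEP] tokens automatically; the model name and the joining of multi-turn context into a single string are assumptions for illustration:

```python
# Sketch of building the [CLS] c_1...c_m [SEP] r_1...r_n [SEP] input of
# formula (1) with a BERT tokenizer's sentence-pair mode. The model name and
# the joining of multi-turn context into one string are assumptions.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
c = "脱发严重。您的性别和年龄?"  # dialogue context joined into one string (assumed)
r = "男, 23岁"                    # candidate response
enc = tokenizer(c, r, return_tensors="pt", truncation=True)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist()))
# -> ['[CLS]', ..., '[SEP]', ..., '[SEP]']
```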
S402: perform a class-wise contrastive comparison based on the obtained encoded features and the plurality of coherence categories to obtain the contrastive loss.
Specifically, referring to the flowchart of FIG. 6, S402 may be performed using, but is not limited to, the following steps:
Step A: classify the training samples based on the coherence categories to which they belong, obtaining the same-class samples and the different-class samples of each training sample;
Step B: based on the obtained classification results, compare the similarities of the encoded features to obtain the contrastive sub-loss corresponding to each training sample;
Step C: obtain the contrastive loss based on the computed contrastive sub-losses.
In one embodiment, classifying the training samples based on the coherence categories to which they belong, to obtain the same-class samples and different-class samples of each training sample, includes performing the following operations for each training sample:
taking the training sample as an anchor sample, and taking the coherence category to which the anchor sample belongs as the anchor category;
based on the anchor category, screening out from the training samples those, other than the anchor sample itself, that belong to the anchor category, as the same-class samples of the anchor sample;
based on the anchor category, screening out from the training samples those that belong to categories other than the anchor category, as the different-class samples of the anchor sample.
In the following, the process of determining the same-class samples and different-class samples is described by taking training sample x as an example, where x is any one of the training samples.
Specifically, the same-class samples and different-class samples of training sample x may be obtained by, but are not limited to, the following steps:
Step A1: take training sample x as the anchor sample, and take the coherence category to which the anchor sample belongs as the anchor category.
Taking training sample x = (dialogue information c, response information 1) as an example, (dialogue information c, response information 1) is taken as the anchor sample, and the coherence category L1 to which it belongs is taken as the anchor category.
Step A2: based on the anchor category, determine the same-class samples of the anchor sample from the training samples.
The same-class samples are the training samples, other than training sample x, that belong to the anchor category. In one embodiment, the training samples belonging to the anchor category other than x are screened out based on the anchor category and taken as the same-class samples of training sample x.
For example, referring to FIG. 7, assume training sample x is (dialogue information c, response information 1). Based on the anchor category L1, the training samples belonging to L1 other than (dialogue information c, response information 1) are screened out, namely (dialogue information c, response information 2) through (dialogue information c, response information 5), and these are taken as the same-class samples of (dialogue information c, response information 1).
Step A3: based on the anchor category, determine the different-class samples of the anchor sample from the training samples.
The different-class samples are the training samples that do not belong to the anchor category. In one embodiment, the training samples belonging to categories other than the anchor category are screened out based on the anchor category and taken as the different-class samples of training sample x.
For example, still referring to FIG. 7, based on the anchor category L1, the training samples belonging to L2 and L3 are screened out, namely (dialogue information c, response information 6) through (dialogue information c, response information 15), and these are taken as the different-class samples of (dialogue information c, response information 1).
The execution order of step A2 and step A3 is not limited: step A2 may be performed before step A3, or step A3 before step A2.
In one embodiment, comparing the similarities of the encoded features based on the obtained classification results to obtain the contrastive sub-loss of each training sample includes performing the following operations for each training sample:
taking the training sample as the anchor sample, and computing, from the obtained encoded features, the first similarities between the anchor sample and each of its same-class samples and the second similarities between the anchor sample and each of its different-class samples;
taking the sum of the first similarities and the second similarities as the total similarity, computing the ratio of each first similarity to the total similarity, and obtaining the contrastive sub-loss of the anchor sample based on the computed ratios.
Taking training sample x as an example, the contrastive sub-loss of x may be obtained by, but is not limited to, the following steps:
Step B1: take training sample x as the anchor sample, and compute, from the obtained encoded features, the first similarities between the anchor sample and each of its same-class samples and the second similarities between the anchor sample and each of its different-class samples.
The first and second similarities may be measured by, but are not limited to, Euclidean distance, cosine similarity or Hamming distance.
Still taking x = (dialogue information c, response information 1) as an example, the first similarities between (dialogue information c, response information 1) and each of (dialogue information c, response information 2) through (dialogue information c, response information 5) are computed from the obtained encoded features, and the second similarities between (dialogue information c, response information 1) and each of (dialogue information c, response information 6) through (dialogue information c, response information 15) are computed likewise.
Step B2: obtain the contrastive sub-loss of the anchor sample based on the obtained first and second similarities.
Specifically, the sum of the first similarities and the second similarities may be taken as the total similarity, the ratio of each first similarity to the total similarity computed, and the contrastive sub-loss of the anchor sample obtained based on the computed ratios.
It should be noted that, in the embodiments of the present application, the contrastive loss may be, but is not limited to, a triplet loss, a SupCon loss, or an InfoNCE (info Noise Contrastive Estimation) loss.
In one embodiment, for each training sample, after the training samples are classified by coherence category to obtain its same-class samples and different-class samples, the first similarities between the training sample and each of its same-class samples and the second similarities between it and each of its different-class samples are first computed from the obtained encoded features. Second, the sum of all the first and second similarities is computed; this sum is referred to as the total similarity. Then the ratio of each first similarity to the total similarity is computed, and the computed ratios are combined to obtain the contrastive sub-loss of the training sample. Finally, the contrastive sub-losses of all training samples are summed to obtain the contrastive loss of the current batch.
Specifically, the contrastive loss may be, but is not limited to, the SupCon loss, which may be calculated using formula (2):
L_sup = Σ_{i∈I} (−1/|P(i)|) Σ_{p∈P(i)} log [ exp(z_i·z_p/τ) / Σ_{a∈A(i)} exp(z_i·z_a/τ) ]   (2)
where L_sup denotes the contrastive loss; I denotes the set of training samples acquired in S401; i denotes an anchor sample; P(i) denotes the set of same-class samples of the anchor sample; A(i) denotes the set of training samples other than the anchor sample, comprising both its same-class samples and its different-class samples; z_i denotes the encoded feature of the anchor sample, z_p the encoded feature of a same-class sample, and z_a the encoded feature of a training sample other than the anchor sample; exp denotes the exponential function and "·" the inner product, so z_i·z_p is a first similarity and z_i·z_a a second similarity; τ is a hyperparameter called the temperature, used to scale the similarities, and may be tuned according to the actual training effect.
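For illustration, the following is a sketch of formula (2) in PyTorch, assuming L2-normalized encoded features and integer coherence-category labels; the normalization and masking details are implementation assumptions, not prescribed by the application:

```python
# Sketch of the SupCon loss of formula (2); `features` are the encoded
# features z and `labels` the coherence categories of the batch.
import torch
import torch.nn.functional as F

def supcon_loss(features: torch.Tensor, labels: torch.Tensor,
                temperature: float = 0.1) -> torch.Tensor:
    z = F.normalize(features, dim=1)                   # encoded features z_i
    sim = z @ z.T / temperature                        # z_i · z_a / tau
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))    # A(i) excludes the anchor
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)    # avoid -inf * 0 on diagonal
    loss_i = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss_i.sum()                                # sum over anchors i in I
```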
Through the constraint of the contrastive loss function, samples of the same category are pulled close together in the feature space (their similarity is maximized), while samples of different categories are pushed far apart (their similarity is made as small as possible), which ensures the efficiency of coherence evaluation while improving its accuracy.
S403: input each encoded feature into the predictor for coherence evaluation to obtain the corresponding coherence evaluation value.
The predictor is configured to output a coherence evaluation value, which may be expressed as a numerical value or as a level; this is not limited here. Only numerical coherence evaluation values are described below as an example.
In one embodiment, coherence evaluation is performed on each training sample through the MLP to obtain the corresponding coherence evaluation value, until all training samples are processed. Specifically, the coherence evaluation value of a training sample may be calculated using formula (3):
s = MLP(f)   (3)
where s denotes the coherence evaluation value and f denotes the input encoded feature.
S404: perform a class-wise ranking comparison based on the obtained coherence evaluation values and their corresponding coherence categories to obtain the ranking loss.
In order that training samples with a high degree of coherence obtain high scores and samples with a low degree of coherence obtain low scores, the coherence evaluation values of the training samples of all coherence categories are constrained by a ranking loss. It should be noted that, in the following, a higher coherence evaluation value indicates a higher degree of coherence, but this is not limiting.
Specifically, when performing the class-wise ranking comparison based on the obtained coherence evaluation values and their corresponding coherence categories to obtain the ranking loss, the following steps may be adopted, but are not limited to:
Step one: obtain the class total evaluation value of each of the plurality of coherence categories based on the obtained coherence evaluation values and their corresponding coherence categories.
For example, referring to FIG. 8 and taking a 100-point scale as an example, suppose the coherence evaluation values of the 15 training samples are 1 through 15 in order. Based on these values and the coherence categories they belong to, the class total evaluation value of L1 is 1+2+3+4+5=15, that of L2 is 6+7+8+9+10=40, and that of L3 is 11+12+13+14+15=65.
Step two: based on the obtained class total evaluation values, obtain the class evaluation difference between every two adjacent coherence categories (the class total of the lower-coherence category minus that of the adjacent higher-coherence category) according to the preset order of the plurality of coherence categories.
It should be noted that, in the embodiments of the present application, the coherence categories may be ordered by their respective degrees of coherence, either from low to high or from high to low; only the low-to-high order is used as an example here.
For example, still referring to FIG. 8, based on the class total evaluation values of L1, L2 and L3 and their preset order, the class evaluation difference between L1 and L2 is 15−40=−25, and that between L2 and L3 is 40−65=−25.
Step three: numerically adjust each of the obtained class evaluation differences to obtain the ranking loss.
As a first implementation, the obtained class evaluation differences may be used directly to obtain the ranking loss; the specific calculation is not limited, and, for example, the variant of formula (4) below that omits the scale term may be adopted.
As a second implementation manner, considering that the first implementation manner can only achieve simple sorting, for the automatic continuity testing method, samples of different continuity categories have a certain fraction difference, so sorting loss in the embodiment of the application adds a category interval parameter, so that certain interval values exist between the sample fractions of different continuity categories while correctly sorting.
Specifically, when the third step is executed, firstly, respectively carrying out numerical adjustment on the obtained at least one category evaluation difference value based on a preset category interval parameter to obtain at least one candidate difference value; secondly, screening out candidate difference values with values larger than a preset value threshold value from at least one candidate difference value, and taking the screened at least one candidate difference value as at least one target difference value; finally, a ranking penalty is obtained based on the at least one target difference.
In one embodiment, the ranking loss may be obtained by summing the target differences.
For example, assume the preset value threshold is 0 and the class interval parameter is 5, and the class evaluation difference between L1 and L2 and that between L2 and L3 are both −25. First, each class evaluation difference is adjusted with the class interval parameter, giving two candidate differences of −25+5=−20. Since neither candidate exceeds the threshold 0, both target differences take the threshold value 0, and the ranking loss is 0; that is, the categories are already correctly ordered and separated by at least the required interval. Had the class totals of two adjacent categories differed by only 2, the candidate difference −2+5=3 would exceed 0 and contribute 3 to the ranking loss.
In one embodiment, for every two adjacent coherence categories, the class evaluation difference between them is computed; the difference is then numerically adjusted with the preset class interval parameter to obtain the candidate difference between the two categories; the candidate difference is compared with the preset value threshold, and if it is greater than the threshold, it is taken as the target difference; otherwise, the threshold value is taken as the target difference. After the target differences of all pairs of adjacent coherence categories are obtained, they are summed to obtain the ranking loss.
Specifically, the sorting loss may be calculated using formula (4):

L_rank = Σ_{i=1}^{C−1} max(0, e_{L_i} − e_{L_{i+1}} + scale)   formula (4)

where e represents a continuity evaluation value, e_{L_i} represents the category total evaluation value of the i-th continuity category (the categories being arranged from least to most coherent, so that e_{L_i} − e_{L_{i+1}} is the category evaluation difference between adjacent continuity categories), the max function takes the larger of its two arguments, and scale represents the category interval parameter, i.e. a predefined minimum interval value between different continuity categories.
By introducing the sorting loss, dialogue information and response information with better continuity score higher than dialogue information and response information with worse continuity, and the scores of different categories are kept a certain interval apart, so that they do not become hard to distinguish because the differences are too small; this improves the evaluation accuracy.
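A minimal sketch of formula (4), assuming PyTorch; the function name, the argument layout, and the default scale value are illustrative assumptions, not details fixed by the source:

```python
import torch

def sorting_loss(scores_per_class: list[torch.Tensor], scale: float = 5.0) -> torch.Tensor:
    # scores_per_class[i] holds the continuity evaluation values of all samples
    # in the i-th continuity category, ordered from least to most coherent.
    totals = [s.sum() for s in scores_per_class]  # category total evaluation values
    loss = torch.zeros(())
    for lower, higher in zip(totals, totals[1:]):
        # A term survives only when the less coherent category fails to trail
        # the more coherent one by at least `scale` (the max(0, .) screening).
        loss = loss + torch.clamp(lower - higher + scale, min=0.0)
    return loss
```

Fed with category totals that differ by 25 (for example 55, 30, and 5) and scale = 5, this reproduces the sorting loss of 60 from the example above.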
S405, performing parameter adjustment on the encoder and the predictor based on the contrast loss and the sorting loss. Specifically, the sum of the contrast loss and the sorting loss may be taken as the total model loss, and parameter adjustment performed based on this total model loss.
S406, judging whether the model convergence condition is reached, if so, executing S407, otherwise, executing S401.
In an embodiment of the present application, the convergence condition may include, but is not limited to, at least one of the following conditions:
(1) The total loss of the model is not greater than a preset loss value threshold.
(2) The iteration number reaches a preset number upper limit value.
S407, outputting the target continuity evaluation model.
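A hedged training-loop sketch of S405 to S407, assuming PyTorch; the helpers `batches`, `contrast_fn`, and `sorting_fn` (standing in for S401 to S404), the AdamW optimizer, and the learning rate are all illustrative assumptions:

```python
import itertools
import torch

def train(encoder, predictor, batches, contrast_fn, sorting_fn,
          loss_threshold: float, max_iterations: int):
    optimizer = torch.optim.AdamW(
        itertools.chain(encoder.parameters(), predictor.parameters()), lr=2e-5)
    for step, batch in zip(range(max_iterations), batches):           # condition (2)
        features = encoder(batch["samples"])                          # S401
        total = (contrast_fn(features, batch["labels"])               # S402
                 + sorting_fn(predictor(features), batch["labels"]))  # S403-S405
        optimizer.zero_grad()
        total.backward()
        optimizer.step()                                              # S405
        if total.item() <= loss_threshold:                            # condition (1)
            break
    return encoder, predictor                                         # S407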
In the embodiment of the present application, the execution order of S402 and S403 to S404 is not limited: S402 may be executed first and then S403 to S404, or S403 to S404 may be executed first and then S402.
Referring to fig. 8, a schematic structural diagram of a second consistency assessment model according to an embodiment of the present application is shown, where the consistency assessment model includes an encoder, a predictor, and a classifier.
Wherein the encoder is used for feature extraction. Illustratively, the encoder may employ, but is not limited to, BERT.
The predictor is used for predicting the continuity evaluation value. Illustratively, the predictor is implemented as a multi-layer perceptron (MLP); the MLP is a three-layer fully connected network whose three activation functions are two ELUs followed by one Sigmoid.

The classifier is used for predicting the classification result. Illustratively, the classifier is a single linear fully connected layer that maps the coding feature f obtained by encoding into a low-dimensional vector.
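A minimal sketch of the three-branch model of FIG. 8, assuming PyTorch and Hugging Face `transformers`; the checkpoint name, hidden size, and the use of `pooler_output` as the coding feature f are assumptions, not details from the source:

```python
import torch.nn as nn
from transformers import BertModel

class ContinuityEvaluationModel(nn.Module):
    def __init__(self, num_classes: int = 3, hidden: int = 768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        # Predictor: a three-layer MLP whose activations are two ELUs and a
        # final Sigmoid, producing a scalar continuity evaluation value.
        self.predictor = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, 1), nn.Sigmoid())
        # Classifier: a single linear layer mapping the coding feature f to a
        # low-dimensional vector k, as in formula (5) below.
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask=None):
        f = self.encoder(input_ids, attention_mask=attention_mask).pooler_output
        return f, self.predictor(f), self.classifier(f)
```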
Referring to fig. 9 in conjunction with the continuity assessment model shown in fig. 8, a second model training method provided in an embodiment of the present application is applied to an electronic device, and the method specifically includes:
S901, respectively inputting the extracted training samples into the encoder for feature extraction to obtain corresponding coding features. This step is the same as S401; see S401 for details.

S902, performing classified coding comparison based on the obtained coding features and the plurality of continuity categories to obtain the contrast loss. This step is the same as S402; see S402 for details.

S903, respectively inputting the coding features into the predictor for continuity evaluation to obtain corresponding continuity evaluation values. This step is the same as S403; see S403 for details.

S904, performing classified sorting comparison based on the obtained continuity evaluation values and their corresponding continuity categories to obtain the sorting loss. This step is the same as S404; see S404 for details.
S905, inputting each coding feature into a classifier to conduct consistency class prediction, and obtaining a corresponding classification result.
Illustratively, with k denoting the low-dimensional vector output by the classifier, i.e. the classification result, k can be calculated from the coding feature f using formula (5):

k = Linear(f)   formula (5)
S906, comparing the classification accuracy based on the obtained classification results and the consistency category to which each training sample belongs, and obtaining the classification loss.
Specifically, in the embodiment of the application, for each coding feature, the classification result output by the classifier contains multiple classification sub-results, each of which characterizes whether the training sample belongs to one consistency category. When S906 is executed, the multiple classification sub-results of each coding feature are compared with the consistency category to which that coding feature belongs, obtaining a classification sub-loss for each coding feature; the classification loss is then obtained based on the classification sub-losses of the coding features.
Wherein the classification loss may employ cross entropy loss, but is not limited thereto.
For example, referring to FIG. 10, each of the 15 coding features, from (dialogue information c, response information 1) to (dialogue information c, response information 15), is input into the classifier to obtain a corresponding classification result. Taking (dialogue information c, response information 1) as an example, its classification result contains 3 classification sub-results, which respectively characterize whether the coding feature belongs to consistency category L1, L2, or L3. The 3 classification sub-results of each coding feature are then compared with the consistency category to which that coding feature actually belongs, obtaining the classification sub-loss of each coding feature, and the classification loss is obtained based on the classification sub-losses of all the coding features.
In one embodiment, when comparing the classification sub-results of each coding feature with the consistency category to which the feature belongs, the multiple classification sub-results of each coding feature may be compared with the multiple real labels of that coding feature to obtain its classification sub-loss, where each real label indicates whether the corresponding training sample belongs to one consistency category. The classification sub-losses of all the coding features are then summed to obtain the classification loss.
Specifically, when cross entropy loss is used, the classification loss can be calculated using formula (6):

L_CE = −(1/N) Σ_{n=1}^{N} Σ_{c=1}^{C} w_c · y_{n,c} · log( exp(k_{n,c}) / Σ_{c′=1}^{C} exp(k_{n,c′}) )   formula (6)

where L_CE represents the classification loss, N represents the number of training samples acquired, C represents the number of consistency categories, w_c is a preset weight, k_{n,c} represents the classification sub-result of the n-th training sample for the c-th consistency category, and y_{n,c} represents the real label of the n-th training sample for the c-th consistency category; illustratively, if the true consistency category of the n-th training sample is the c-th consistency category, then y_{n,c} is 1, and otherwise 0.
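A minimal sketch of formula (6), assuming PyTorch; the tensor layout and the averaging over N are assumptions consistent with the description above:

```python
import torch
import torch.nn.functional as F

def classification_loss(k: torch.Tensor,  # [N, C] classification sub-results (logits)
                        y: torch.Tensor,  # [N, C] one-hot real labels y_{n,c}
                        w: torch.Tensor) -> torch.Tensor:  # [C] preset weights w_c
    log_probs = F.log_softmax(k, dim=-1)  # normalize the sub-results per sample
    # Weighted cross entropy: per-sample classification sub-loss, then the
    # average over the N training samples.
    return -(w * y * log_probs).sum(dim=-1).mean()
```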
S907, obtaining a model total loss based on the comparison loss, the sorting loss and the classification loss, and performing parameter adjustment on the encoder, the predictor and the classifier based on the model total loss.
In one embodiment, the contrast loss, the sorting loss, and the classification loss are summed to obtain the total model loss.
Specifically, the total model loss can be calculated using formula (7):

L = L_sup + L_rank + L_CE   formula (7)

where L represents the total model loss, L_sup the contrast loss, L_rank the sorting loss, and L_CE the classification loss.
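A hedged sketch of the joint update in S907, assuming PyTorch and an optimizer that was constructed over the parameters of the encoder, the predictor, and the classifier together; the function name and signature are illustrative:

```python
import torch

def training_step(optimizer: torch.optim.Optimizer,
                  l_sup: torch.Tensor, l_rank: torch.Tensor,
                  l_ce: torch.Tensor) -> float:
    total = l_sup + l_rank + l_ce  # formula (7)
    optimizer.zero_grad()
    total.backward()               # gradients reach the encoder, the predictor,
    optimizer.step()               # and the classifier in one joint update
    return total.item()
```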
S908, judging whether the model convergence condition is met, if so, executing S909, otherwise, executing S901.
In the embodiment of the application, the model convergence condition is not limited. For example, the model convergence condition may be that the current iteration count reaches a preset number of iterations, or that the total model loss is smaller than a preset loss value threshold, but it is not limited thereto.
S909, outputting the target continuity evaluation model.
Through this implementation, introducing the classification loss ensures that training data of the same category are regressed to the same consistency category, achieving accurate classification of dialogue information and response information belonging to different consistency categories, and thereby improving the accuracy of the model output.
In the embodiment of the present application, the execution order among S902, S903 to S904, and S905 to S906 is not limited. Illustratively, the steps may be performed in the order S902, S903 to S904, S905 to S906, or in the order S903 to S904, S902, S905 to S906, but are not limited thereto.
In one embodiment, the trained target continuity evaluation model is obtained by iteratively training the continuity evaluation model to be trained. After the target continuity evaluation model is obtained, an application model can be constructed based on the encoder and the predictor contained in it, and this application model is then used for continuity evaluation.
Referring to fig. 11, a flow chart of a continuity evaluation method provided in an embodiment of the present application is shown. The method is applied to an electronic device, which may be a terminal device or a server; the specific flow is as follows:
S1101, constructing an application model based on the encoder and the predictor contained in the target continuity evaluation model; the application model thus comprises the encoder and the predictor of the target continuity evaluation model.
S1102, inputting the acquired dialogue to be evaluated and the response to be evaluated into an encoder of the application model for feature extraction, and obtaining corresponding coding features.
S1103, inputting the coding features into a predictor of the application model for continuity evaluation to obtain a target continuity evaluation value.
Specifically, the dialog under evaluation may be obtained by, but is not limited to, the following:
responding to the input operation of a target object in a target dialogue application, acquiring input information, and acquiring historical dialogue information of the target object, wherein the historical dialogue information comprises multiple rounds of dialogue data generated before the target object inputs the input information;
and determining context information corresponding to the input information based on the historical dialogue information, and taking the context information and the input information as a dialogue to be evaluated, wherein the context information is at least one round of dialogue data in multiple rounds of dialogue data.
The context information may be the most recent dialogue data among the multiple rounds of dialogue data, or at least one round of dialogue data whose distance in time from the input moment of the input information falls within a preset duration range, but is not limited thereto. For example, one round of dialogue may include a question message and a response message.
Taking an intelligent medical scene as an example: in response to an input operation of a user in an intelligent triage application, the input information "male" and "25 years old" is acquired, together with the user's historical dialogue information, which includes multiple rounds of dialogue data generated before the target object entered the input information. Context information corresponding to the input information is then determined from the historical dialogue information, for example: the application's prompts "You can describe your symptoms like this" and "A suitable department will be recommended for you ……", the user's message "severe alopecia" sent to the intelligent triage application, and the application's question "Your gender and age?". The context information and the input information are then taken together as the dialogue to be evaluated. Assuming the response to be evaluated is "It is recommended that you go to the following department ……", the acquired dialogue to be evaluated and response to be evaluated are input into the encoder of the application model to obtain the corresponding coding feature, which is then input into the predictor of the application model, yielding a target continuity evaluation value of 90 points.
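A hedged sketch of S1101 to S1103, assuming PyTorch, a Hugging Face-style tokenizer, and a model shaped like the earlier sketch; the " [SEP] " joining of dialogue rounds and the helper names are assumptions, not details from the source:

```python
import torch

def evaluate_continuity(model, tokenizer, context_rounds: list[str],
                        user_input: str, response: str) -> float:
    # S1101-S1102: the context rounds plus the new input form the dialogue to
    # be evaluated, paired with the response to be evaluated.
    dialogue = " [SEP] ".join(context_rounds + [user_input])
    enc = tokenizer(dialogue, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        f = model.encoder(**enc).pooler_output  # coding feature
        score = model.predictor(f)              # S1103: continuity evaluation
    return score.item()
```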
Through the implementation mode, the conversation consistency test aiming at different conversation applications can be realized, and the automatic test efficiency is improved.
The model training process and the model application process in the present application will be described below with reference to two specific embodiments.
Embodiment one: shopping consultation scene
The continuity evaluation model is applied to a shopping consultation scene of the customer online consultation intelligent customer service commodity information. For a shopping consultation scene, the dialogue information in each training sample is dialogue text between a customer and an intelligent customer service, and the response information in each training sample is response text returned by the intelligent customer service to the customer.
Referring to fig. 12, in the model training stage, based on the training sample set, iterative training is performed on the continuity evaluation model to be trained to obtain a target continuity evaluation model, and in each iteration process, the following operations are performed:
Firstly, a batch of training samples is extracted from the training sample set; the 15 training samples in the batch share the same dialogue information but have different response information, with 5 of the 15 training samples belonging to consistency category L1, 5 belonging to consistency category L2, and 5 belonging to consistency category L3. The 15 training samples are respectively input into BERT for feature extraction, obtaining the coding features of the 15 training samples.
And secondly, obtaining contrast loss by adopting a formula (2) based on the coding characteristics of each of the 15 training samples and the coherence category to which each training sample belongs.
And then, respectively inputting the coding features of each of the 15 training samples into a predictor for continuity evaluation to obtain respective continuity evaluation values of the 15 training samples, and obtaining the sorting loss by adopting a formula (4) based on the obtained respective continuity evaluation values and the respective corresponding continuity categories.
And then, respectively inputting the coding features of each of the 15 training samples into a classifier for carrying out consistency class prediction, obtaining the classification result of each of the 15 training samples, and obtaining the classification loss by adopting a formula (6) based on each obtained classification result and the consistency class to which each training sample belongs.
And finally, the sum of the contrast loss, the sorting loss, and the classification loss is taken as the total model loss, and the model parameters are adjusted based on the total model loss.
In the model application stage, an application model is constructed based on the encoder and the predictor included in the target continuity evaluation model, so that the application model contains that encoder and predictor.
The electronic equipment inputs the acquired dialogue to be evaluated and the response to be evaluated into an encoder of the application model to obtain corresponding coding features, and inputs the coding features into a predictor of the application model to obtain a target consistency evaluation value.
Embodiment two: virtual assistant scenarios
Still referring to FIG. 12, a consistency assessment model is applied to a speech dialog scenario of a user with a virtual assistant. For a voice conversation scene, the conversation information in each training sample is conversation voice between the user and the virtual assistant, and the response information is response voice replied to the user by the virtual assistant.
In the model training stage, based on a training sample set, iterative training is performed on the continuity evaluation model to be trained shown in fig. 9, so as to obtain a target continuity evaluation model, and in each iteration process, the following operations are performed:
Firstly, a batch of training samples is extracted from the training sample set; the 20 training samples in the batch share the same dialogue information but have different response information, with 4 of the 20 training samples belonging to consistency category 1, 4 to consistency category 2, 4 to consistency category 3, 4 to consistency category 4, and 4 to consistency category 5, where the consistency of categories 1 to 5 increases in order from category 1 to category 5. The 20 training samples are respectively input into BERT for feature extraction, obtaining the coding features of the 20 training samples.
And secondly, obtaining contrast loss by adopting a formula (2) based on the coding characteristics of each of the 20 training samples and the coherence category to which each training sample belongs.
And then, the coding features of the 20 training samples are respectively input into the predictor for continuity evaluation, obtaining the continuity evaluation values of the 20 training samples, and the sorting loss is obtained using formula (4) based on the obtained continuity evaluation values and their corresponding continuity categories.
And then, the coding features of the 20 training samples are respectively input into the classifier for consistency category prediction, obtaining the classification results of the 20 training samples, and the classification loss is obtained using formula (6) based on the obtained classification results and the consistency categories to which the training samples belong.
And finally, the sum of the contrast loss, the sorting loss, and the classification loss is taken as the total model loss, and the model parameters are adjusted based on the total model loss.
In the model application stage, an application model is constructed based on the encoder and the predictor included in the target continuity evaluation model, so that the application model contains that encoder and predictor.
The electronic equipment inputs the acquired dialogue to be evaluated and the response to be evaluated into an encoder of the application model to obtain corresponding coding features, and inputs the coding features into a predictor of the application model to obtain a target consistency evaluation value.
In the embodiment of the application, to validate the design of the automatic dialogue continuity test method, experiments were carried out on several manually annotated scoring datasets. The evaluation metric is the correlation between the automatic continuity test method and the manually annotated scores, including the Pearson, Spearman, and Kendall correlation coefficients. The baseline models compared against the target continuity evaluation model are the GRADE and QuantiDCE models.
In the first set of experiments, the manually annotated scoring dataset was the ConvAI2 dataset, and the comparison results are shown in Table 1:
Table 1 Comparison results of the first set of experiments
In the second set of experiments, the manually annotated scoring dataset was the EmpatheticDialogues dataset, and the comparison results are shown in Table 2:
Table 2 Comparison results of the second set of experiments
It is evident that, in the first set of experiments, the target continuity evaluation model of the present application performed better than GRADE on the ConvAI2 dataset and was on par with QuantiDCE. In the second set of experiments, the target continuity evaluation model performed slightly worse than QuantiDCE on the EmpatheticDialogues dataset; however, QuantiDCE is built on a pre-training plus fine-tuning architecture and therefore still relies on manually annotated training data.
Referring to fig. 13, the visualization of the coding features output by the target continuity evaluation model on the test set of the DailyDialog++ dataset is shown, where circles denote coding features of test data belonging to continuity level L1, triangles those of level L2, and rectangles those of level L3; the features are shown in two dimensions, with dimension 1 ranging from −1.0 to 1.0 and dimension 2 from −1.5 to 1.5. It can be seen that the coding features of level L1 are concentrated in the lower-left region, those of level L2 in the right region, and those of level L3 in the upper-left region. Evidently, the target continuity evaluation model distinguishes the coding features of test data belonging to the three continuity categories well.
Referring to fig. 14, the visualization of the continuity evaluation values output by the target continuity evaluation model on the test set of the DailyDialog++ dataset is shown. In the coordinate system of fig. 14, the abscissa marks L1, L2, and L3, denoting test data belonging to continuity levels L1, L2, and L3 respectively, and the ordinate is the continuity evaluation value. It can be seen that the output values mainly range from 0.3 to 0.7: the values for level L1 concentrate around 0.3 to 0.4, those for level L2 around 0.5 to 0.6, and those for level L3 around 0.6 to 0.7. Evidently, the continuity evaluation values output by the target continuity evaluation model not only separate test data of different continuity categories well, but are also correctly ordered: the values for level L3 are higher than those for level L2, which in turn are clearly higher than those for level L1.
Based on the same inventive concept, the embodiment of the application provides a consistency evaluation model training device. As shown in fig. 15, which is a schematic structural diagram of a continuity evaluation model training device 1500, the continuity evaluation model at least includes an encoder and a predictor, and the continuity evaluation model training device includes:
the feature extraction unit 1501 is configured to input each extracted training sample into the encoder to perform feature extraction, so as to obtain corresponding coding features; wherein the training samples respectively belong to a plurality of consistency categories, each training sample comprises dialogue information and response information, and each consistency category represents a degree of consistency between the corresponding dialogue information and response information;
a contrast loss calculation unit 1502, configured to obtain contrast loss by performing classified coding comparison based on the obtained coding features and the plurality of consistency classes;
the sorting loss calculation unit 1503 is configured to input the coding features into the predictor respectively for performing a consistency evaluation, obtain corresponding consistency evaluation values, and perform a sorting comparison based on the obtained consistency evaluation values and the corresponding consistency categories thereof, so as to obtain a sorting loss;
A parameter adjustment unit 1504 for performing parameter adjustment on the encoder and the predictor based on the contrast loss and the ordering loss.
As a possible implementation manner, when the classifying code alignment is performed based on the obtained coding features and the plurality of continuity categories, and the contrast loss is obtained, the contrast loss calculation unit 1502 is specifically configured to:
classifying each training sample based on the coherence category to which each training sample belongs to, so as to obtain each similar sample and each different sample of each training sample;
and comparing the similarity of the coding features based on the obtained classification results to obtain the contrast sub-loss corresponding to each training sample, and obtaining the contrast loss based on each obtained contrast sub-loss.
As a possible implementation manner, when classifying the training samples based on the coherence class to which the training samples belong to obtain the same class sample and different class sample of the training samples, the contrast loss calculation unit 1502 is specifically configured to:
for each training sample in the training samples, the following operations are respectively executed:
Taking a training sample as an anchor point sample, and taking a consistency category to which the anchor point sample belongs as an anchor point category;
screening training samples belonging to the anchor point category except the anchor point sample from the training samples based on the anchor point category, and taking the training samples as similar samples corresponding to the anchor point sample;
and screening, from the training samples, the training samples belonging to categories other than the anchor category based on the anchor category, and taking them as the different-class samples corresponding to the anchor sample.
As a possible implementation manner, when performing similarity comparison on each coding feature based on each obtained classification result to obtain the contrast sub-loss corresponding to each training sample, the contrast loss calculation unit 1502 is specifically configured to:
for each training sample in the training samples, the following operations are respectively executed:
taking a training sample as an anchor sample, and respectively calculating first similarity between the anchor sample and each similar sample and second similarity between the anchor sample and each different sample based on each obtained coding characteristic;
And taking the sum of the first similarity and the second similarity as a total similarity sum, respectively calculating the ratio between the first similarity and the total similarity, and obtaining the contrast sub-loss corresponding to the anchor point sample based on the calculated ratios.
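Formula (2) itself is not reproduced in this section, but the first-similarity/second-similarity ratio described above can be sketched as follows, assuming PyTorch, exponentiated cosine similarities, and a temperature `tau` (all assumptions not fixed by the source):

```python
import torch
import torch.nn.functional as F

def contrast_sub_loss(anchor: torch.Tensor,     # [D] anchor coding feature
                      positives: torch.Tensor,  # [P, D] same-class coding features
                      negatives: torch.Tensor,  # [Q, D] different-class features
                      tau: float = 0.1) -> torch.Tensor:
    sim = lambda x: torch.exp(F.cosine_similarity(anchor.unsqueeze(0), x) / tau)
    first, second = sim(positives), sim(negatives)  # first / second similarities
    total = first.sum() + second.sum()              # total similarity sum
    # Ratio of each first similarity to the total, folded into a sub-loss.
    return -torch.log(first / total).mean()
```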
As a possible implementation manner, the sorting and sorting comparison is performed based on the obtained continuity evaluation values and the corresponding continuity categories thereof, and when a sorting loss is obtained, the sorting loss calculating unit 1503 is specifically configured to:
based on the obtained continuity evaluation values and the corresponding continuity categories, obtaining a category total evaluation value corresponding to each of the plurality of continuity categories;
based on the obtained total evaluation values of the plurality of categories, obtaining category evaluation differences between every two adjacent continuity categories in the plurality of continuity categories according to a preset arrangement sequence among the plurality of continuity categories;
and respectively carrying out numerical adjustment on the obtained at least one category evaluation difference value based on a preset category interval parameter to obtain the sorting loss.
As a possible implementation manner, the ranking loss calculating unit 1503 is specifically configured to:
Based on preset category interval parameters, respectively carrying out numerical adjustment on the obtained at least one category evaluation difference value to obtain at least one candidate difference value;
and screening out candidate difference values with values larger than a preset value threshold value from the at least one candidate difference value, taking the screened at least one candidate difference value as at least one target difference value, and obtaining the sorting loss based on the at least one target difference value.
As a possible implementation manner, the consistency evaluation model to be trained further includes a classifier, and the apparatus further includes a classification loss calculation unit 1505, where the classification loss calculation unit 1505 is configured to:
inputting the coding features into the classifier for carrying out consistency class prediction to obtain corresponding classification results, and carrying out classification accuracy comparison based on the obtained classification results and the consistency classes to which the training samples belong respectively to obtain classification losses;
the parameter adjustment unit 1504 is specifically configured to, when performing parameter adjustment on the encoder and the predictor based on the contrast loss and the ordering loss:
and obtaining a model total loss based on the comparison loss, the sorting loss and the classification loss, and performing parameter adjustment on the encoder, the predictor and the classifier based on the model total loss.
As a possible implementation manner, the classification result of each coding feature includes a plurality of classification sub-results, each classification sub-result is used for indicating whether the corresponding training sample belongs to a consistency class, the classification accuracy evaluation is performed based on the obtained classification results and the consistency class to which the training samples belong, and the classification loss calculation unit 1505 is specifically configured to, when obtaining the classification loss:
comparing the respective multiple classification sub-results of each coding feature with the respective belonging coherence category to obtain respective classification sub-loss of each coding feature;
and carrying out loss statistics based on the respective classification sub-losses of the coding features to obtain classification losses.
Based on the same inventive concept, the embodiment of the application provides a consistency evaluation model training device. As shown in fig. 16, which is a schematic structural diagram of a continuity evaluation device 1600, includes:
a model construction unit 1601 for constructing an application model based on an encoder and a predictor included in a target continuity evaluation model, wherein the target continuity evaluation model is trained based on the method of any one of claims 1 to 8;
The feature extraction unit 1602 is configured to input the obtained dialogue to be evaluated and the obtained reply to be evaluated into an encoder of the application model for feature extraction, so as to obtain corresponding coding features;
a continuity evaluation unit 1603, configured to input the coding feature into a predictor of the application model for continuity evaluation, so as to obtain a target continuity evaluation value.
As a possible implementation, the feature extraction unit 1602 is further configured to:
responding to input operation of a target object in a target dialogue application, acquiring input information, and acquiring historical dialogue information of the target object, wherein the historical dialogue information comprises multiple rounds of dialogue data generated before the input information is input by the target object;
and determining context information corresponding to the input information based on the historical dialogue information, and taking the context information and the input information as the dialogue to be evaluated, wherein the context information is at least one round of dialogue data in the multiple rounds of dialogue data.
For convenience of description, the above parts are described as being divided into modules (or units) by function. Of course, when the present application is implemented, the functions of each module (or unit) may be realized in one or more pieces of software or hardware.
The specific manner in which the respective units execute the requests in the apparatus of the above embodiment has been described in detail in the embodiment concerning the method, and will not be described in detail here.
In the embodiment of the application, to adapt to data input of multiple consistency categories, a contrastive-learning constraint is applied to the coding features output by the model at the feature level: the positive samples are the augmented samples of the anchor sample and its same-class samples, and the negative samples are the samples of other classes in the same batch. Because the contrastive-learning constraint on the features only distinguishes same-class samples from different-class samples, and cannot by itself ensure that samples with a high degree of consistency are scored higher than samples with a low degree of consistency, a predictor for predicting the consistency evaluation value is added to the model, and a sorting loss constraining the consistency evaluation values makes samples of different consistency receive correspondingly high or low scores. Furthermore, considering that samples of different consistency should have a certain score gap, an interval value can be added to the sorting loss, so that while the consistency evaluation values of samples of different consistency are correctly ordered, a certain interval also exists between them. Finally, by introducing a classifier and a classification loss, accurate classification of samples of different consistency is achieved.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the application may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.) or an embodiment combining hardware and software aspects may be referred to herein as a "circuit," module "or" system.
Based on the same inventive concept, the embodiment of the application also provides electronic equipment. In one embodiment, the electronic device may be a server or a terminal device. Referring to fig. 17, which is a schematic structural diagram of one possible electronic device provided in an embodiment of the present application, in fig. 17, an electronic device 1700 includes: a processor 1710 and a memory 1720.
The memory 1720 stores a computer program executable by the processor 1710, and the processor 1710 can execute the steps of the model training method by executing the instructions stored in the memory 1720.
The memory 1720 may be a volatile memory, such as a random-access memory (RAM); the memory 1720 may also be a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 1720 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1720 may also be a combination of the above.
The processor 1710 may include one or more central processing units (central processing unit, CPU) or a digital processing unit, or the like. Processor 1710, when executing the computer program stored in memory 1720, implements the model training method described above.
In some embodiments, processor 1710 and memory 1720 may be implemented on the same chip, or they may be implemented separately on separate chips in some embodiments.
The specific connection medium between the processor 1710 and the memory 1720 is not limited in this embodiment. In the embodiment of the present application, the processor 1710 and the memory 1720 are connected by a bus, depicted by a thick line in fig. 17; the connection manner between other components is only schematically illustrated and is not limiting. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of description, only one thick line is depicted in fig. 17, but this does not mean that there is only one bus or only one type of bus.
Based on the same inventive concept, an embodiment of the present application provides a computer readable storage medium comprising a computer program for causing an electronic device to perform the steps of the above-described model training method when the computer program is run on the electronic device. In some possible embodiments, aspects of the model training method provided by the present application may also be implemented in the form of a program product comprising a computer program for causing an electronic device to perform the steps of the model training method described above, when the program product is run on the electronic device, e.g. the electronic device may perform the steps as shown in fig. 4.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (Compact Disk Read Only Memory, CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may take the form of a CD-ROM and comprise a computer program and may run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a computer program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a computer program for use by or in connection with a command execution system, apparatus, or device.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (15)

1. A method for training a consistency assessment model, wherein the consistency assessment model at least comprises an encoder and a predictor, the method comprising:
respectively inputting each extracted training sample into the encoder for feature extraction to obtain corresponding coding features; wherein the training samples respectively belong to a plurality of consistency categories, each training sample comprises dialogue information and response information, and each consistency category represents a degree of consistency between the corresponding dialogue information and response information;
Based on the obtained coding features and the plurality of consistency categories, carrying out classified coding comparison to obtain comparison loss;
inputting each coding feature into the predictor for consistency evaluation to obtain corresponding consistency evaluation values, and carrying out classification and sorting comparison based on each obtained consistency evaluation value and each corresponding consistency category to obtain sorting loss;
parameter adjustments are made to the encoder and the predictor based on the contrast loss and the ordering loss.
2. The method of claim 1, wherein said performing a classification code comparison based on the resulting code features and the plurality of coherence categories to obtain a contrast penalty comprises:
classifying each training sample based on the coherence category to which each training sample belongs to, so as to obtain each similar sample and each different sample of each training sample;
and comparing the similarity of the coding features based on the obtained classification results to obtain the contrast sub-loss corresponding to each training sample, and obtaining the contrast loss based on each obtained contrast sub-loss.
3. The method of claim 2, wherein classifying the training samples based on the respective assigned consistency class of the training samples to obtain respective homogeneous samples and respective heterogeneous samples of the training samples, comprises:
For each training sample in the training samples, the following operations are respectively executed:
taking a training sample as an anchor point sample, and taking a consistency category to which the anchor point sample belongs as an anchor point category;
screening training samples belonging to the anchor point category except the anchor point sample from the training samples based on the anchor point category, and taking the training samples as similar samples corresponding to the anchor point sample;
and screening, from the training samples, the training samples belonging to categories other than the anchor category based on the anchor category, and taking them as the different-class samples corresponding to the anchor sample.
4. The method of claim 2, wherein the comparing the similarity between the coding features based on the obtained classification results to obtain the respective contrast sub-loss for each training sample comprises:
for each training sample in the training samples, the following operations are respectively executed:
taking a training sample as an anchor sample, and respectively calculating first similarity between the anchor sample and each similar sample and second similarity between the anchor sample and each different sample based on each obtained coding characteristic;
And taking the sum of the first similarity and the second similarity as a total similarity sum, respectively calculating the ratio between the first similarity and the total similarity, and obtaining the contrast sub-loss corresponding to the anchor point sample based on the calculated ratios.
5. The method of any of claims 1-4, wherein said performing a classification ranking comparison based on each obtained continuity assessment value and its respective corresponding continuity category to obtain a ranking penalty comprises:
based on the obtained continuity evaluation values and the corresponding continuity categories, obtaining a category total evaluation value corresponding to each of the plurality of continuity categories;
based on the obtained total evaluation values of the plurality of categories, obtaining category evaluation differences between every two adjacent continuity categories in the plurality of continuity categories according to a preset arrangement sequence among the plurality of continuity categories;
and respectively carrying out numerical adjustment on the obtained at least one category evaluation difference value based on a preset category interval parameter to obtain the sorting loss.
6. The method of claim 5, wherein the performing numerical adjustment on the obtained at least one category evaluation difference value based on the preset category interval parameter to obtain the ranking loss includes:
Based on preset category interval parameters, respectively carrying out numerical adjustment on the obtained at least one category evaluation difference value to obtain at least one candidate difference value;
and screening out candidate difference values with values larger than a preset value threshold value from the at least one candidate difference value, taking the screened at least one candidate difference value as at least one target difference value, and obtaining the sorting loss based on the at least one target difference value.
7. The method of any one of claims 1-4, wherein the consistency assessment model further comprises a classifier;
then before the model parameter adjustment, based on the contrast loss and the ordering loss, further comprises:
inputting the coding features into the classifier for carrying out consistency class prediction to obtain corresponding classification results, and carrying out classification accuracy comparison based on the obtained classification results and the consistency classes to which the training samples belong respectively to obtain classification losses;
said parameter adjusting said encoder and said predictor based on said contrast loss and said ordering loss comprises:
and obtaining a model total loss based on the comparison loss, the sorting loss and the classification loss, and performing parameter adjustment on the encoder, the predictor and the classifier based on the model total loss.
8. The method of claim 7, wherein the classification result of each coding feature comprises a plurality of classification sub-results, each classification sub-result being used to indicate whether a corresponding training sample belongs to a consistency class, the classifying accuracy evaluation based on the obtained classification results and the consistency class to which the training samples belong, and the obtaining the classification loss comprises:
comparing the respective multiple classification sub-results of each coding feature with the respective belonging coherence category to obtain respective classification sub-loss of each coding feature;
and carrying out loss statistics based on the respective classification sub-losses of the coding features to obtain classification losses.
9. A method of continuity assessment, comprising:
constructing an application model based on an encoder and a predictor contained in a target continuity assessment model, wherein the target continuity assessment model is trained based on the method of any one of claims 1 to 8;
inputting the acquired dialogue to be evaluated and the acquired reply to be evaluated into an encoder of the application model for feature extraction to obtain corresponding coding features;
and inputting the coding features into a predictor of the application model for carrying out continuity evaluation to obtain a target continuity evaluation value.
10. The method of claim 9, wherein the dialog under evaluation is obtained by:
responding to input operation of a target object in a target dialogue application, acquiring input information, and acquiring historical dialogue information of the target object, wherein the historical dialogue information comprises multiple rounds of dialogue data generated before the input information is input by the target object;
and determining context information corresponding to the input information based on the historical dialogue information, and taking the context information and the input information as the dialogue to be evaluated, wherein the context information is at least one round of dialogue data in the multiple rounds of dialogue data.
11. A continuity assessment model training apparatus, wherein the continuity assessment model comprises at least an encoder and a predictor, the apparatus comprising:
the feature extraction unit is used for respectively inputting each extracted training sample into the encoder to perform feature extraction to obtain corresponding coding features; wherein the training samples respectively belong to a plurality of consistency categories, each training sample comprises dialogue information and response information, and each consistency category represents a degree of consistency between the corresponding dialogue information and response information;
The comparison loss calculation unit is used for carrying out classified coding comparison based on the obtained coding features and the plurality of consistency categories to obtain comparison loss;
the sorting loss calculation unit is used for respectively inputting the coding features into the predictor for carrying out continuity evaluation to obtain corresponding continuity evaluation values, and carrying out sorting and sorting comparison based on the obtained continuity evaluation values and the corresponding continuity categories to obtain sorting loss;
and the parameter adjustment unit is used for carrying out parameter adjustment on the encoder and the predictor based on the contrast loss and the sorting loss.
12. A continuity evaluation device, comprising:
a model construction unit for constructing an application model based on an encoder and a predictor included in a target continuity evaluation model, wherein the target continuity evaluation model is trained based on the method of any one of claims 1 to 8;
the feature extraction unit is used for inputting the acquired dialogue to be evaluated and the response to be evaluated into the encoder of the application model to perform feature extraction so as to obtain corresponding coding features;
and the consistency evaluation unit is used for inputting the coding characteristic into a predictor of the application model to perform consistency evaluation so as to obtain a target consistency evaluation value.
13. An electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 8 or the steps of the method of any one of claims 9 to 10.
14. A computer readable storage medium, characterized in that it comprises a computer program for causing an electronic device to perform the steps of the method of any one of claims 1-8 or the steps of the method of any one of claims 9-10 when said computer program is run on the electronic device.
15. A computer program product, characterized in that it comprises a computer program stored in a computer readable storage medium, from which computer readable storage medium a processor of an electronic device reads and executes the computer program, causing the electronic device to perform the steps of the method according to any one of claims 1-8 or to perform the steps of the method according to any one of claims 9-10.
CN202211425813.6A 2022-11-14 2022-11-14 Continuity evaluation model training and continuity evaluation method, device and equipment Pending CN116955543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211425813.6A CN116955543A (en) 2022-11-14 2022-11-14 Continuity evaluation model training and continuity evaluation method, device and equipment

Publications (1)

Publication Number Publication Date
CN116955543A true CN116955543A (en) 2023-10-27

Family

ID=88457015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211425813.6A Pending CN116955543A (en) 2022-11-14 2022-11-14 Continuity evaluation model training and continuity evaluation method, device and equipment

Country Status (1)

Country Link
CN (1) CN116955543A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236343A (en) * 2023-11-15 2023-12-15 江西师范大学 Automatic readability assessment method based on language feature interpreter and contrast learning
CN117236343B (en) * 2023-11-15 2024-03-12 江西师范大学 Automatic readability assessment method based on language feature interpreter and contrast learning


Legal Events

Date Code Title Description
PB01 Publication