CN116501592A

CN116501592A - Man-machine interaction data processing method and server

Info

Publication number: CN116501592A
Application number: CN202310721009.0A
Authority: CN
Inventors: 张一昌; 林俊旸; 周畅; 周靖人
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2023-06-19
Filing date: 2023-06-19
Publication date: 2023-07-28
Anticipated expiration: 2043-06-19
Also published as: CN116501592B

Abstract

The method comprises the steps of determining marked upper information of each round of input information based on multi-round interaction information of any session marked in advance, and outputting a response result of each round of input information according to each round of input information and the marked upper information through a human-computer interaction model, so that the upper information of the same input information in the multi-round interaction of each human-computer interaction model can be ensured to be consistent; according to the input information, the above information and the response result of each round in at least one session, the multi-round interaction capability evaluation information of the man-machine interaction model is determined, the multi-round interaction capability of the man-machine interaction model can be evaluated fairly by using unified pre-labeled multi-round interaction information, the multi-round interaction capability evaluation information of the man-machine interaction model is obtained, the on-line judgment of the man-machine interaction model can be guided, or the optimized version of the man-machine interaction model can be updated, and the man-machine interaction quality of the man-machine interaction model can be improved.

Description

Man-machine interaction data processing method and server

Technical Field

The present disclosure relates to computer technology, and in particular, to a data processing method and a server for man-machine interaction.

Background

Natural language is an important carrier of human logic and thinking, and has great significance in the field of man-machine interaction and even general artificial intelligence. However, because of the complexity and ambiguity of natural language, there has been a lack of machine facilities that are directly oriented to unconstrained natural language.

With the development of artificial intelligence, a large model is widely applied to man-machine interaction in the field of natural language processing. Large models refer to large scale deep learning models, such as large scale language models, multimodal models, etc., having large scale model parameters, typically comprising hundreds of millions, billions, trillions, and even more than billions of model parameters.

The multi-round interaction capability is an important value capability of the current man-machine interaction models, and how to objectively evaluate the multi-round interaction capability of the man-machine interaction models is very important. For the same input instruction, as the results produced by different human-computer interaction models are different, a set of pre-prepared instructions are difficult to integrate when evaluating the multi-round interaction capability of the human-computer interaction model and the human, so that the multi-round interaction capability of different human-computer interaction models is difficult to fairly evaluate, the quality model with better multi-round interaction capability is not favorable for selecting the high-quality model with better multi-round interaction capability in model iteration, and the quality of multi-round interaction of the online model is not favorable for controlling, so that the human-computer interaction quality is poor.

Disclosure of Invention

The application provides a data processing method and a server for man-machine interaction, which are used for solving the problems that the quality of man-machine interaction is poor because the multi-round interaction capability of different man-machine interaction models cannot be evaluated fairly, a high-quality model with better multi-round interaction capability is not favorable for model iteration selection, and the quality of multi-round interaction by an online model is not favorable for control.

In a first aspect, the present application provides a method for processing data of human-computer interaction, including: acquiring the multi-round interaction information of at least one pre-labeled session, wherein the multi-round interaction information of one session comprises multi-round input information and pre-labeled response information of each round of input information; determining the upper information of each round of input information in a conversation according to the multi-round input information of any one time of conversation which is marked in advance and the response information which corresponds to each round of input information, inputting the upper information of each round of input information and the input information in the conversation into a man-machine interaction model, and outputting the response result of each round of input information through the man-machine interaction model; determining evaluation information of the multi-round interaction capability of the man-machine interaction model according to the input information of each round, the above information of the input information of each round and the response result in the at least one session; and outputting evaluation information of the multi-round interaction capability of the man-machine interaction model.

In a second aspect, the present application provides a method for processing data of human-computer interaction, applied to a server, including: receiving multiple rounds of interactive capability assessment requests sent by the terminal side equipment for multiple language models; acquiring multi-round interaction information of at least one pre-labeled session and multi-round interaction capability covered by each session, wherein the multi-round interaction information comprises multi-round input information and pre-labeled response information of each round of input information; determining the upper information of each round of input information in a conversation according to the multi-round input information of any one-time conversation which is marked in advance and the response information of each round of input information, inputting each round of input information and the upper information of the input information in the conversation into each language model, and outputting the response result of each round of input information through each language model; determining evaluation information of the multiple language models in various multiple interactive capability dimensions according to the input information of each round, the above information of the input information of each round, the response result of the input information of each round output by each language model in the at least one session and the multiple interactive capability covered by each session; and outputting evaluation information of the multiple language models in various multi-round interaction capability dimensions to the end-side equipment.

In a third aspect, the present application provides a data processing method for man-machine interaction, applied to a server, including: acquiring an evaluation model to be tested provided by end-side equipment; responding to a multi-round interaction capability assessment request sent by end side equipment to the to-be-assessed model, acquiring pre-labeled multi-round interaction information of at least one session, wherein the multi-round interaction information of one session comprises multi-round input information and pre-labeled response information of the input information; determining the upper information of each round of input information in a conversation according to the multi-round input information of any one time of conversation which is marked in advance and the response information of the input information which is marked in advance, inputting the upper information of each round of input information and the input information in the conversation into the evaluation model to be tested, and outputting the response result of each round of input information through the evaluation model to be tested; determining evaluation information of the multi-round interaction capability of the to-be-evaluated model according to the input information of each round, the above information of the input information of each round and the response result in the at least one session; and outputting evaluation information of the multi-round interaction capability of the to-be-evaluated model to the end-side equipment.

In a fourth aspect, the present application provides a server comprising: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes computer-executable instructions stored in the memory to implement the method as described in any one of the preceding aspects.

According to the man-machine interaction data processing method and the server, the multi-round interaction information of at least one pre-labeled session is obtained, and the multi-round interaction information of one session comprises multi-round input information and pre-labeled response information of each round of input information; determining the upper information of each round of input information in the conversation according to the pre-marked multi-round interaction information of any one conversation, inputting each round of input information and the upper information of the input information in the conversation into a human-computer interaction model, and outputting a response result of each round of input information through the human-computer interaction model, so that when the pre-marked multi-round interaction information of any one conversation is applied to evaluating a plurality of different human-computer interaction models, the historical upper information of the same input information in the multi-round interaction of the conversation of each human-computer interaction model is ensured to be completely consistent; further, according to the input information of each round, the above information of the input information of each round and the response result in at least one session, the evaluation information of the multi-round interaction capability of the man-machine interaction model is determined, unified pre-labeled multi-round interaction information can be used for fairly and objectively evaluating the multi-round interaction capability of different man-machine interaction models, the evaluation information of the multi-round interaction capability of the man-machine interaction model is output, the evaluation information is used for guiding the online judgment of the man-machine interaction model or updating the optimized version of the man-machine interaction model, a high-quality model can be accurately selected in man-machine interaction model iteration, the multi-round interaction quality of the man-machine interaction model obtained by iteration updating is improved, and the multi-round interaction quality of the online model is improved, so that the quality of multi-round interaction in the man-machine interaction is improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

FIG. 1 is a schematic diagram of an exemplary system architecture to which the present application is applicable;

FIG. 2 is a flowchart of a method for processing data of human-computer interaction according to an exemplary embodiment of the present application;

FIG. 3 is an exemplary diagram of a front-end interactive interface provided in an exemplary embodiment of the present application;

FIG. 4 is a flowchart of a method for processing human-computer interaction data according to another exemplary embodiment of the present application;

FIG. 5 is an interaction flow chart of a data processing method of human-computer interaction according to an exemplary embodiment of the present application;

fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.

Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.

The terms referred to in this application are explained first:

a session: the term computer refers to a process of communicating between an end user and a man-machine interaction system, for example, a session is performed from when the user enters the man-machine interaction system to when the user pushes out the man-machine interaction system to finish man-machine interaction. During a session, the user inputs an instruction, the man-machine interaction system gives a reply to the instruction, which is a round of dialogue, and one or more rounds of dialogue between the user and the man-machine interaction system can be included in a session.

Visual question-answering task: from the input image and the question, an answer to the question is determined from visual information of the input image.

Image description task: descriptive text of the input image is generated.

Visual implication task: the semantic relativity of the input image and the text, namely implication, neutrality or contradiction, is predicted.

Refer to the expression and understanding task: and positioning an image area corresponding to the input text in the input image according to the input text.

Image generation tasks: an image is generated based on the entered descriptive text.

Text-based emotion classification tasks: emotion classification information of the input text is predicted.

Text summarization task: summary information of the input text is generated.

Multimodal tasks: the input/output data refers to downstream tasks of various modal data such as images, texts and the like, such as a visual question-answering task, an image description task, a visual implication task, a presentation and understanding task, an image generation task and the like.

Multimodal pre-training model: the method is characterized in that the input and output data relates to a pre-training model of various modal data such as images, texts and the like, and the pre-training model can be applied to multi-modal task processing after fine-tuning training.

Large models refer to deep-learning models with large-scale model parameters, typically comprising hundreds of millions, billions, trillions, or even more than billions of model parameters. The large Model can be called as a Foundation Model, a training Model is performed by using a large-scale unlabeled corpus, a pre-training Model with more than one hundred million parameters is produced, the Model can adapt to a wide downstream task, and the Model has better generalization capability, such as a large-scale language Model (Large Language Model, LLM), a multi-mode pre-training Model and the like.

When the large model is actually applied, the pretrained model can be applied to different tasks by only slightly adjusting a small number of samples, the large model can be widely applied to the fields of natural language processing (Natural Language Processing, NLP for short), computer vision and the like, and particularly can be applied to the tasks of the computer vision fields such as visual question and answer (Visual Question Answering, VQA for short), image description (IC for short), image generation and the like, and the tasks of the natural language processing fields such as emotion classification based on texts, text abstract generation, machine translation and the like, and main application scenes of the large model comprise digital assistants, intelligent robots, searching, online education, office software, electronic commerce, intelligent design and the like.

When applied to a human-computer interaction scene (such as an intelligent robot), the large model generates a reply based on instructions input by a user, which is also called a human-computer interaction model. In the iteration process of the man-machine interaction model, the advantages and disadvantages of the man-machine interaction models of different versions need to be evaluated so as to realize the iteration update of the man-machine interaction model. Before the human-computer interaction model is online, whether the performance of the human-computer interaction model meets the online requirement or not needs to be evaluated, and the human-computer interaction model with excellent online performance is used for avoiding the human-computer interaction model with poor online performance. The multi-round interaction capability is an important value capability of the current man-machine interaction models, and how to objectively evaluate the multi-round interaction capability of the man-machine interaction models is very important. For the same input instruction, as the results produced by different human-computer interaction models are different, a set of pre-prepared instructions are difficult to integrate when evaluating the multi-round interaction capability of the human-computer interaction model and the human, so that the multi-round interaction capability of different human-computer interaction models is difficult to fairly and objectively evaluate, the quality model with better multi-round interaction capability is not favorable for selecting in model iteration, and the quality of multi-round interaction by controlling an online model is not favorable, and the human-computer interaction quality is poor.

The application provides a man-machine interaction data processing method, which comprises the steps of obtaining multi-round interaction information of at least one session marked in advance, wherein the multi-round interaction information comprises multi-round input information and pre-marked response information of each round of input information; according to the pre-labeled multi-round input information of any one time session and the pre-labeled response information of each round of input information, determining the above information of each round of input information in the session, inputting each round of input information and the above information of each round of input information in the session into a human-computer interaction model, and outputting the response result of each round of input information through the human-computer interaction model, so that when the pre-labeled multi-round interaction information of any one time session is applied to evaluating a plurality of different human-computer interaction models, the historical above information of the same input information in the multi-round interaction of each human-computer interaction model in one time session is ensured to be completely consistent; further, according to the input information of each round, the above information of the input information of each round and the response result in at least one session, the evaluation information of the multi-round interaction capability of the man-machine interaction model is determined, unified pre-labeled multi-round interaction information can be used for fairly and objectively evaluating the multi-round interaction capability of different man-machine interaction models, the evaluation information of the multi-round interaction capability of the man-machine interaction model is output, the evaluation information is used for guiding the online judgment of the man-machine interaction model or updating the optimized version of the man-machine interaction model, a high-quality model can be accurately selected in man-machine interaction model iteration, the multi-round interaction quality of the man-machine interaction model obtained by iteration updating is improved, and the multi-round interaction quality of the online model is improved, so that the quality of multi-round interaction in the man-machine interaction is improved.

The human-computer interaction model in the application can be a large-scale pre-training language model, a multi-mode pre-training model and the like, and is not particularly limited herein.

FIG. 1 is a schematic diagram of an example system architecture to which the present application is applicable. As shown in fig. 1, the system architecture includes a first server responsible for evaluating multi-round interactive capabilities of a human-computer interaction model, a second server running the human-computer interaction model, and end-side devices. The first server and the second server are provided with a communication link capable of communicating, and communication connection between the first server and the second server can be achieved. The first server and the end-side device are provided with a communication link which can be communicated, so that communication connection between the first server and the end-side device can be realized.

The second server may be a server cluster deployed in the cloud, or a device with computing capability locally. The second server is in charge of operating the man-machine interaction model to be evaluated, and outputting a response result based on given input information. One or more man-machine interaction models can be deployed on one second server, and for a plurality of man-machine interaction models to be evaluated, one or more second servers can be deployed.

The terminal side device is an electronic device used by a user, and specifically may be a hardware device having a network communication function, an operation function and an information display function, including but not limited to a smart phone, a tablet computer, a desktop computer, a server, and the like. And the user sends a human-computer interaction model evaluation request to the first server through the terminal side equipment, wherein the evaluation request contains information of one or more human-computer interaction models to be evaluated.

The first server may be a cluster of servers deployed in the cloud or a device with computing capabilities locally. The first server is responsible for evaluating the multi-round interaction capability of the human-computer interaction model, generating evaluation information of the multi-round interaction capability of the human-computer interaction model, guiding the online judgment of the human-computer interaction model, updating the optimized version of the human-computer interaction model, or selecting a high-quality human-computer interaction model with strong multi-round interaction capability.

In an example scenario, before a human-computer interaction model of human-computer interaction is online, a user sends an evaluation request of multiple rounds of interaction capability of the human-computer interaction model to be online to a first server through an end-side device, wherein the evaluation request contains relevant information of the human-computer interaction model to be evaluated, such as an application program interface for calling the human-computer interaction model, an access address of the human-computer interaction model and the like. The method comprises the steps that a first server responds to an evaluation request to obtain multi-round interaction information of at least one pre-labeled session, the multi-round interaction information comprises multi-round input information and pre-labeled response information of each round of input information, the upper information of each round of input information in the session is determined according to the multi-round input information of any pre-labeled session and the pre-labeled response information of each round of input information, the upper information of each round of input information and the upper information of the input information in the session are input into a human-computer interaction model, response results of each round of input information are output through the human-computer interaction model, and evaluation information of multi-round interaction capability of the human-computer interaction model is determined according to the upper information and the response results of each round of input information in the at least one session, so that accurate evaluation of the multi-round interaction capability of the human-computer interaction model is achieved.

Further, the evaluation information of the multi-round interaction capability of the man-machine interaction model can be used for guiding the online judgment of the man-machine interaction model. Optionally, the first server determines whether the man-machine interaction model meets the on-line condition according to the evaluation information of the multi-round interaction capability of the man-machine interaction model, and outputs on-line prompt information of the man-machine interaction model, wherein the on-line prompt information indicates whether the man-machine interaction model meets the on-line condition. Optionally, the first server sends evaluation information of the multi-round interaction capability of the man-machine interaction model to the end-side device. The terminal side equipment outputs evaluation information of the multi-round interaction capability of the man-machine interaction model so as to guide a user to judge whether the man-machine interaction model meets the online condition; or, the terminal side equipment determines whether the human-computer interaction model meets the online condition according to the evaluation information of the multi-round interaction capability of the human-computer interaction model, and outputs the online prompt information of the human-computer interaction model, wherein the online prompt information indicates whether the human-computer interaction model meets the online condition. Optionally, the terminal side device may further output evaluation information of multiple rounds of interaction capability of the human-computer interaction model.

In another example scenario, the new version obtained is evaluated during an iterative optimization process of the human-computer interaction model. A user can send an evaluation request of the multi-round interaction capability of the new-version human-computer interaction model to the first server through the terminal side equipment, wherein the evaluation request contains relevant information of the new-version human-computer interaction model, such as an application program interface for calling the human-computer interaction model, an access address of the human-computer interaction model and the like. The first server responds to the evaluation request to acquire the multi-round interaction information of at least one pre-labeled session, wherein the multi-round interaction information comprises multi-round input information and pre-labeled response information of each round of input information; based on the related information of the new version of human-computer interaction model, determining the upper information of each round of input information in the current session according to the pre-labeled multi-round interaction information of any one time session, inputting each round of input information and the upper information of the input information in the current session into the new version of human-computer interaction model, and outputting the response result of each round of input information through the new version of human-computer interaction model; and determining evaluation information of the multi-round interaction capability of the new-version man-machine interaction model according to the input information of each round, the above information of the input information of each round and the response result in at least one session so as to accurately evaluate the multi-round interaction capability of the new-version man-machine interaction model.

Further, the evaluation information of the multi-round interaction capability of the new-version man-machine interaction model can be used for guiding the updating of the optimized version of the man-machine interaction model. Optionally, the first server compares the evaluation information of the multiple-round interaction capability of the new version and the previous version according to the evaluation information of the multiple-round interaction capability of the new version of the human-computer interaction model and the evaluation information of the multiple-round interaction capability of the previous version to obtain a comparison result, and the comparison result is used for guiding updating of the optimized version of the human-computer interaction model. Specifically, the first server may send the comparison result to the end-side device. And the terminal side equipment outputs comparison results of evaluation information of the multi-round interaction capability of the man-machine interaction models of different versions so as to guide a user to select an optimized version with stronger multi-round interaction capability to carry out iterative updating of the man-machine interaction model.

In another example scenario, a user may select a human-computer interaction model with stronger multi-round interaction capability for human-computer interaction based on evaluation information of multi-round interaction capabilities of a plurality of human-computer interaction models to be selected, so as to improve human-computer interaction quality. A user can send evaluation requests of multi-round interaction capability of a plurality of man-machine interaction models to a first server through end-side equipment, wherein the evaluation requests comprise relevant information of the plurality of man-machine interaction models, such as an application program interface for calling the man-machine interaction models, an access address of the man-machine interaction models and the like. The first server responds to the evaluation request to acquire the multi-round interaction information of at least one pre-labeled session, wherein the multi-round interaction information comprises multi-round input information and pre-labeled response information of each round of input information; determining the upper information of each round of input information in the current session according to the pre-labeled multi-round interaction information of any one time session, inputting each round of input information and the upper information of the input information in the current session into each human-computer interaction model, and outputting the response result of each round of input information through each human-computer interaction model; and determining evaluation information of the multi-round interaction capability of each man-machine interaction model according to the input information of each round, the above information of the input information of each round and the response result of the input information of each round output by each man-machine interaction model in at least one session so as to fairly and accurately evaluate the multi-round interaction capability of each man-machine interaction model.

Further, the first server compares the evaluation information of the multi-round interaction capability of each human-computer interaction model to obtain a comparison result of the evaluation information of the multi-round interaction capability of each human-computer interaction model. And the first server sends comparison results of the multi-round interaction capability assessment information of each man-machine interaction model to the terminal side equipment. The terminal side equipment outputs a comparison result to guide a user to select a man-machine interaction model with stronger multi-round interaction capability as the man-machine interaction model selected by the user. Optionally, the terminal device may select a human-computer interaction model with stronger multiple rounds of interaction capability based on the comparison result of the multiple rounds of interaction capability assessment information of each human-computer interaction model, and download and acquire the human-computer interaction model according to the related information of the selected human-computer interaction model, or use the human-computer interaction model to implement human-computer interaction.

The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

Fig. 2 is a flowchart of a data processing method of man-machine interaction according to an exemplary embodiment of the present application. The execution body of the embodiment is the first server in the system architecture. As shown in fig. 2, the method specifically comprises the following steps:

step 201, multi-round interaction information of at least one pre-labeled session is obtained, wherein the multi-round interaction information of one session comprises multi-round input information and pre-labeled response information of each round of input information.

The multi-round interaction information of one session comprises multi-round input information (user instruction) and response information of each round of input information. In this embodiment, multiple rounds of interaction information of at least one session of man-machine interaction may be collected or manually constructed. The response information of the input information in the labeled multi-round interaction information is correct/standard response information of the input information by labeling the response information of the input information in the multi-round interaction information of at least one session.

An example of pre-noted multi-pass interaction information for a session (denoted as example 1) is illustratively as follows:

inputting information: what is the largest animal on earth?

Response information: blue whale, which is a marine mammal of the genus Cephalotaceae, can be up to 33 meters long and weigh more than 180 tons.

Inputting information: what is its food?

Response information: blue whales feed mainly on small crustaceans and small fish, sometimes squid.

Inputting information: where is the habitat?

Response information: blue whales are distributed worldwide, with the number of south pole sea areas being the greatest.

Inputting information: not counting this question, i ask a total of several questions?

Response information: the 3 questions are "what is the largest animal on earth? "," what is its food? "," where is the habitat? "

The multi-round interaction information in this example includes 4 rounds of conversations, i.e., including 4 rounds of input information, and pre-labeled response information for each round of input information.

In the follow-up step, the multi-round interaction capability of one or more man-machine interaction models is evaluated by utilizing the pre-labeled multi-round interaction information of at least one session, so as to obtain the evaluation information of the multi-round interaction capability of the man-machine interaction models.

Step S202, according to the pre-labeled multi-round input information of any one time of conversation and the pre-labeled response information corresponding to each round of input information, determining the upper information of each round of input information in the conversation, inputting each round of input information and the upper information of the input information in the conversation into a man-machine interaction model, and outputting the response result of each round of input information through the man-machine interaction model.

In the step, with the aid of the pre-labeled multi-round interaction information of each session, for any round of input information in the session, the upper information of the round of input information is determined based on the multi-round interaction information of the session, the round of input information and the upper information are input into a human-computer interaction model to be evaluated, and a response result of the round of input information is output through the human-computer interaction model.

In this embodiment, since the response information in the multi-round interaction information of each session is labeled in advance, based on the multi-round interaction information of any one session labeled in advance, for any round of input information, the context information of the input information may be determined, and the response information included in the context information is the accurate/standard response labeled in advance, and the context information may be understood as the context information labeled in advance. When the method is applied to evaluating a plurality of different man-machine interaction models, the above information of the same input information is fixed and pre-marked, so that the historical above information of the same input information in the multi-round interaction of the current session of each man-machine interaction model can be ensured to be completely consistent, and the unified pre-marked multi-round interaction information can be used for fairly and objectively evaluating the multi-round interaction capability of different man-machine interaction models.

In an optional embodiment, in this step, the first server may use, as the above information of the input information of the present round, the input information before the present round and the pre-labeled response information of the input information for the input information of any round according to the multiple-round interaction information of any one session. For the multi-round interaction information of each session, the 1 st round of input information does not have the above information, and the above information of the 1 st round of input information can be expressed as null.

For example, taking example 1 in the aforementioned step S201 as an example, the 1 st round of input information "what is the largest animal on earth? "there is no context information, i.e., the context information of the input information of round 1 is null. Round 2 of example 1 inputs information "what is its food? The above information of "includes round 1 input information and response information of round 1 input information. The 3 rd round of input information "where is the habitat? The above information of "includes the 1 st round input information, the response information of the 1 st round input information, the 2 nd round input information, and the response information of the 2 nd round input information. The 4 th round of input information in example 1 "do not account for this question, i ask several questions in total? The above information of "includes the 1 st round input information, the response information of the 1 st round input information, the 2 nd round input information, the response information of the 2 nd round input information, the 3 rd round input information, and the response information of the 3 rd round input information.

In another optional embodiment, the process of multiple rounds of interaction can be simulated according to the pre-labeled multiple rounds of interaction information of any one session, and the following interaction processing is sequentially performed on the input information of each round in the session: taking the context information of the current session as the context information of the input information of the current round, wherein the context information of the current session comprises the history interaction information of the current session; inputting the input information of the current round and the context information of the current session into a man-machine interaction model, outputting a response result of the input information of the current round through the man-machine interaction model, and updating the context information of the current session; and replacing the response result of the input information of the current round in the updated context information of the current session with the pre-marked response information of the input information of the current round.

In the embodiment, the multiple rounds of interaction of each session is executed through the human-computer interaction model, and the response result of the input information in the context information used by each round is forcedly corrected to be the pre-labeled response information of the input information, so that the consistency of the input information of each human-computer interaction model in each round of session of the session can be ensured, and the unified pre-labeled multiple rounds of interaction information can be used for fairly and objectively evaluating the multiple rounds of interaction capability of different human-computer interaction models.

In this embodiment, the human-computer interaction model is configured to generate and output a response result based on input information given by a user, so as to implement human-computer interaction. The man-machine interaction model can be used as an evaluation object to be tested, particularly can be a large-scale pre-training language model, a multi-mode pre-training model and the like, is particularly applied to the fields of Natural Language Processing (NLP), computer vision and the like, can be particularly applied to tasks in the field of intersection of NLP and computer vision such as visual question-answering (VQA), image description (IC), visual implication (VE), expression and understanding (REC) and the like, and can be applied to the tasks in the field of natural language processing such as emotion classification tasks based on texts, text abstract tasks and the like, and can be applied to various application scenes such as digital assistants, intelligent robots, searching, online education, office software, electronic commerce, intelligent design and the like.

Optionally, the first server may acquire an application program interface/man-machine interaction model service interface of the man-machine interaction model, input the input information and the above information into the man-machine interaction model by calling the application program interface/man-machine interaction model service interface of the man-machine interaction model, and receive a response result output by the man-machine interaction model. Optionally, the first server may obtain access address information of the human-computer interaction model, and the first server sends a reply generation request to the second server where the human-computer interaction model is located, where the request includes input information to be replied and the above information. The second server responds to the reply generation request, inputs the input information and the above information into the human-computer interaction model, acquires a response result of the input information output by the human-computer interaction model, and sends the response result to the first server.

Step S203, according to the input information of each round, the above information of the input information of each round and the response result in at least one session, determining the evaluation information of the multi-round interaction capability of the man-machine interaction model.

In this embodiment, based on the input information of each round in each session, the marked above information of the round input information, and the response result of the round input information output by the man-machine interaction model, the evaluation man-machine interaction model realizes the response quality of the response result in the multi-round interaction process, realizes the evaluation of the multi-round interaction capability of the man-machine interaction model, and obtains the evaluation information of the multi-round interaction capability of the man-machine interaction model.

And S204, outputting evaluation information of the multi-round interaction capability of the man-machine interaction model.

In this embodiment, after obtaining the evaluation information of the multi-round interaction capability of the man-machine interaction model, the first server performs visual output on the evaluation information of the multi-round interaction capability of the man-machine interaction model, so as to output the evaluation result of the multi-round interaction capability of the man-machine interaction model to the user, where the evaluation information of the multi-round interaction capability of the man-machine interaction model can instruct the user to make a determination whether the man-machine interaction model is on line; or, determining a high-quality man-machine interaction model version by comparing evaluation information of multi-round interaction capability of a plurality of man-machine interaction model versions, and performing iterative optimization on the man-machine interaction model; or, by comparing the evaluation information of the multi-round interaction capability of the man-machine interaction models, selecting a target man-machine interaction model with stronger multi-round interaction capability as the man-machine interaction model for realizing man-machine interaction.

For example, the first server may select one of the man-machine interaction models as the target man-machine interaction model according to the evaluation information of the multi-round interaction abilities of the plurality of man-machine interaction models, and output the information of the target man-machine interaction model to the end-side device. For example, the first server may select one of the versions as the optimized version according to the evaluation information of the multi-round interaction capability of the man-machine interaction models of the plurality of different versions, and update the optimized version of the man-machine interaction model.

The first server may determine whether the human-computer interaction model meets the online condition according to the evaluation information of the multiple-round interaction capability of the human-computer interaction model after obtaining the evaluation information of the multiple-round interaction capability of the human-computer interaction model; and outputting on-line prompt information of the man-machine interaction model and/or evaluation information of multi-round interaction capability of the man-machine interaction model. Wherein the online prompt information indicates whether the human-computer interaction model meets an online condition.

The online condition comprises a first threshold value of evaluation information of multi-round interaction capability of the human-computer interaction model, if the evaluation information of the multi-round interaction capability of the human-computer interaction model is larger than or equal to the first threshold value, the human-computer interaction model meets the online condition, otherwise, the human-computer interaction model does not meet the online condition. The first threshold in the online condition can be configured by a user in a customized way according to the needs of the specific application scene.

According to the scheme of the embodiment, the multi-round interaction information of at least one pre-labeled session is obtained, wherein the multi-round interaction information comprises multi-round input information and pre-labeled response information of each round of input information; determining the upper information of each round of input information in the conversation according to the pre-labeled multi-round input information of any one-time conversation and the pre-labeled response information of each round of input information, inputting the upper information of each round of input information and the upper information of the input information in the conversation into a human-computer interaction model, outputting the response result of each round of input information through the human-computer interaction model, and ensuring that the historical upper information of the same input information of each human-computer interaction model in the multi-round interaction of the conversation is completely consistent when the pre-labeled multi-round interaction information of any one-time conversation is applied to evaluating a plurality of different human-computer interaction models; further, according to the input information of each round, the above information of the input information of each round and the response result in at least one session, the evaluation information of the multi-round interaction capability of the man-machine interaction model is determined, unified pre-labeled multi-round interaction information can be used for fairly and objectively evaluating the multi-round interaction capability of different man-machine interaction models, the evaluation information of the multi-round interaction capability of the man-machine interaction model is output, the multi-round interaction capability evaluation information of the man-machine interaction model is used for guiding online judgment of the man-machine interaction model, updating the optimized version of the man-machine interaction model or selecting a target man-machine interaction model for use in man-machine interaction, a high-quality model can be accurately selected in man-machine interaction model iteration, and the multi-round interaction quality of the man-machine interaction model obtained by iteration update is improved, so that the quality of multi-round interaction in man-machine interaction is improved.

In an alternative embodiment, the human-computer interaction model to be evaluated may be provided by a third party to the first server through the end-side device. The first server acquires an evaluation model to be tested provided by the terminal side equipment. For example, the third party may upload the rating model to be tested to the first server through the end-side device, and the first server may deploy the rating model to be tested to the second server or the first server. The third party may send a download link of the model to be evaluated to the first server through the end-side device, and the first server obtains the model to be evaluated through downloading the download link, and deploys the model to be evaluated to the second server or the first server. The third party may send an Application Program Interface (API) or access address of the model under evaluation to the first server, for example, via the end-side device. The first server inputs the input information and the above information into the evaluation model to be tested through an Application Program Interface (API) or an access address, and receives a response result output by the evaluation model to be tested.

Further, the first server evaluates the multi-round interaction capability of the to-be-evaluated model through the method of steps S201-S204, and obtains evaluation information of the multi-round interaction capability of the to-be-evaluated model. The first server may output evaluation information of multi-round interaction capability of the evaluation model to be tested to the end-side device. Further, the user can be guided to make a judgment whether the man-machine interaction model is on line or not based on the evaluation information of the multi-round interaction capability of the evaluation model to be tested; or, determining a high-quality man-machine interaction model version by comparing evaluation information of multi-round interaction capability of a plurality of man-machine interaction model versions, and performing iterative optimization on the man-machine interaction model; or, by comparing the evaluation information of the multi-round interaction capability of the multiple man-machine interaction models, selecting the target man-machine interaction model with stronger multi-round interaction capability as the man-machine interaction model for realizing man-machine interaction, and referring specifically to the related content of step S204, the description is omitted here.

Illustratively, fig. 3 is an exemplary diagram of a front-end interactive interface provided in an exemplary embodiment of the present application, where, as shown in fig. 3, the front-end interactive interface may provide an input area of an evaluation model to be tested. The input area can be used for providing the to-be-tested evaluation model for the first server through modes of uploading the model, inputting a model calling interface, inputting a model downloading address and the like. After the first server completes the evaluation of the multi-round interaction capability of the to-be-evaluated model, the evaluation information of the multi-round interaction capability of the to-be-evaluated model can be output through the interaction interface. In a scene of judging whether the model can be online, online prompt information of the evaluation model to be tested can be output. In a scene of comparing a plurality of models, a comparison result of evaluation information of multi-round interaction capability of the plurality of models can be output or a high-quality target model can be recommended.

Fig. 4 is a flowchart of a data processing method of man-machine interaction according to another exemplary embodiment of the present application. On the basis of any one of the foregoing embodiments, a data processing method of man-machine interaction in this embodiment is described in detail. As shown in fig. 4, the method in this embodiment specifically includes the following steps:

step S401, multi-round interaction information of at least one session marked in advance is obtained, wherein the multi-round interaction information comprises multi-round input information and pre-marked response information of each round of input information.

This step is consistent with the specific implementation of step S201, and the details of the foregoing embodiment are referred to in detail, which is not described herein.

After the multi-round interaction information of at least one pre-labeled session is obtained, through steps S402-S406, according to the multi-round input information of any one pre-labeled session and the pre-labeled response information of each round of input information, the following interaction processing is sequentially performed on each round of input information in the session: taking the context information of the session as the context information of the input information of the present round, wherein the context information of the session comprises the history interaction information of the session; inputting the input information of the round and the context information of the conversation into a man-machine interaction model, and outputting a response result of the input information of the round through the man-machine interaction model; updating the context information of the session, replacing the response result of the input information of the current round in the updated context information of the session with the pre-marked response information of the input information of the current round, so as to determine the context information of the input information of each round in the session according to the pre-marked multi-round interaction information of any one time of the session in the step 202, inputting the input information of each round and the context information of the input information in the session into a man-machine interaction model, and outputting the response result of the input information of each round through the man-machine interaction model.

Step S402, sequentially taking the pre-labeled multi-round interaction information of each session as the multi-round interaction information of the current session.

In this embodiment, for the multi-round interaction information of at least one session that is labeled in advance, the multi-round interaction information of each session is sequentially used as the multi-round interaction information of the present session, steps S403 to S406 are executed based on the multi-round interaction information of the present session, the multi-round interaction of the present session is executed by using the man-machine interaction model to be evaluated, and the response result of each round of input information is output based on each round of input information and the pre-labeled above information of each round of input information.

Step S403, each round of input information in the current session is used as the input information of the current round in turn.

For multiple rounds of dialogue information (including input information and marked response information) in multiple rounds of interaction information of the current session, each round of input information in the current session is sequentially used as the current round of input information, and interaction processing in steps S404-S406 is executed based on the current round of input information by using a human-computer interaction model to be evaluated so as to generate a response result of the current round of input information based on the current round of input information and pre-marked above information.

Step S404, the context information of the current session is used as the context information of the input information of the current round, wherein the context information of the current session comprises the history interaction information of the current session.

In a multi-round interactive session, after outputting a response of any round of input information, the context information of the session is updated and stored. The context information of the current session records the history interaction information of the current session, and specifically comprises input information of one or more rounds processed in the current session and response results of the input information of each round. For any session, the context information for the session at the beginning is empty.

Step S405, inputting the input information of the current round and the context information of the current session into a man-machine interaction model, outputting a response result of the input information of the current round through the man-machine interaction model, and updating the context information of the current session.

In this embodiment, for the input information of the present round (the input information to be processed next), the stored current context information of the present session is used as the context information of the input information of the present round, and the input information of the present round and the context information of the present session are input into the man-machine interaction model to be evaluated. And the man-machine interaction model generates and outputs a response result of the input information of the round according to the input information and the context information.

The human-computer interaction model is used for generating and outputting a response result based on input information given by a user, so that human-computer interaction is realized. The human-computer interaction model can be specifically various language models, multi-mode pre-training models and the like as the evaluation object to be tested, and is not specifically limited herein.

Optionally, the first server may acquire an application program interface/a man-machine interaction model service interface of the man-machine interaction model, input the input information and the context information of the current session into the man-machine interaction model by calling the application program interface/the man-machine interaction model service interface of the man-machine interaction model, and receive a response result output by the man-machine interaction model.

Optionally, the first server may obtain access address information of the man-machine interaction model, and the first server sends a reply generation request to the second server where the man-machine interaction model is located, where the request includes input information to be replied and context information of the current session. The second server responds to the reply generation request, inputs the input information and the context information of the current session into the man-machine interaction model, acquires a response result of the input information output by the man-machine interaction model, and sends the response result to the first server.

Further, according to the response result of the input information of the current round output by the man-machine interaction model, updating the context information of the current session, so as to add the input information of the current round and the response result into the context information of the current session, and obtain updated context information.

Step S406, the response result of the input information of the round in the context information of the updated session is replaced by the pre-marked response information of the input information of the round.

In this embodiment, the pre-labeled response information of the input information of the current round is used to replace the updated response result of the input information of the current round in the context information of the current session, so that the response result of the input information of the current round in the context information of the current session is forcedly corrected to be the pre-labeled response information of the input information, and it can be ensured that the context information used by each man-machine interaction model in the next round of session is consistent, so that the context information of the input information used by each man-machine interaction model in processing the input information of each round is consistent, and unified pre-labeled multi-round interaction information can be used to fairly evaluate the multi-round interaction capability of different man-machine interaction models.

Through steps S402-S406, according to the multi-round interaction information of any one time of conversation marked in advance, the upper information of each round of input information in the conversation is determined, the input information of each round and the upper information of the input information in the conversation are input into a man-machine interaction model, and the response result of each round of input information is output through the man-machine interaction model.

Illustratively, based on the multi-turn interaction information of example 1 in the foregoing step S201, taking a human-computer interaction model as an example, the following response results shown in table 1 may be obtained through the processing of steps S403 to S406:

TABLE 1

The response results of any one round given in table 1 and the input data of the same row form three groups of samples to be evaluated. Based on the example of table 1, the response result of the input information of the current round in the updated context information of the current session is replaced by the pre-labeled response information of the input information of the current round, and the response result of the input information of each round in the context information of the current session is forcedly corrected to be the pre-labeled response information by introducing a Teacher Forcing (Teacher Forcing) mechanism, so that the context information used by each man-machine interaction model in each round of dialogue is consistent, the context information used by each man-machine interaction model in processing the input information of each round is consistent, and unified pre-labeled multi-round interaction information can be used to fairly evaluate the multi-round interaction capability of different man-machine interaction models.

In this embodiment, through the foregoing steps S401 to S406, a plurality of groups of samples to be evaluated are generated, where each group of samples to be evaluated includes input information of the human-computer interaction model, the above information of the input information, and a response result output by the human-computer interaction model. The evaluation of the multi-round interaction capability of the man-machine interaction model is disassembled and converted into the evaluation of response results generated by given input information and the above information of the input information in a single-round dialogue, and unified and pre-marked multi-round interaction information can be used for fairly evaluating the multi-round interaction capability of different man-machine interaction models.

Further, through steps S407-S408, the foregoing step S203 is implemented to determine evaluation information of multiple rounds of interaction capability of the human-computer interaction model according to the input information of each round, the above information of the input information of each round and the response result in at least one session.

Step S407, according to the input information of each round in at least one session, the above information of the input information of each round and the response result, determining the interaction quality information of the man-machine interaction model in each session.

In the step, the evaluation information of the response result of each round of input information can be determined according to the input information of each round, the above information of each round of input information and the response result in at least one session; and integrating evaluation information of response results of input information of each round in one session, and determining interaction quality information of the man-machine interaction model in the current session.

For example, three rounds of response results are given in table 1 as three results to be evaluated. For the input information of the current wheel shown in any row, different human-computer interaction models may output different response results, and for the different response results of the same input information output by each human-computer interaction model, the evaluation is performed respectively to determine the evaluation information of the response results of the same input information output by each human-computer interaction model.

Illustratively, when the evaluation information of the response result of the input information of the present round is determined according to the input information of any round, the context information of the input information and the response result of the input information of any round in any one session, the first server provides a first interactive interface, and the input information of the present round, the context information of the input information of the present round and the response result of the input information of the present round output by the man-machine interaction model are output on the first interactive interface; and the corresponding relation between the response result and each man-machine interaction model is not displayed on the first interaction interface. The first server receives a quality evaluation value of a response result to the input information, which is input at the first interactive interface.

The first server displays the first interactive interface, illustratively, through a client device running on the end-side device. The user can score the response quality of the response result of the current input information output by each man-machine interaction model in the first interaction interface, the evaluation information serving as the response result is submitted to the end-side device, and the end-side device sends the evaluation information of the response result output by each man-machine interaction model to the first server. Therefore, under the condition that a user is unknown from which human-computer interaction model the response result comes, the response quality of the response result output by each human-computer interaction model is manually marked, and under the double-blind condition, the response result of each human-computer interaction model is objectively evaluated. The first server acquires evaluation information of the response result marked by the person.

Optionally, when the response result is output, desensitizing treatment is performed on the response result to remove information related to the man-machine interaction model for generating the response result (such as information with self identification in the response result generated by the man-machine interaction model) in the response result, so that the annotators are prevented from knowing the corresponding relation between the response result and the man-machine interaction model, double-blind evaluation is realized under the condition that the annotators do not know the corresponding relation between the response result and the man-machine interaction model, and the objectivity of multi-round interaction capability evaluation of the man-machine interaction model is improved.

Further, according to the weight coefficient corresponding to each turn in one session, the evaluation information of the response result of each turn of input information in the same session can be weighted and averaged to obtain the interaction quality information of the man-machine interaction model in the session. Optionally, the evaluation information of the response result of each round of input information in the same session can be averaged to obtain the interaction quality information of the man-machine interaction model in the session.

The weight coefficients of different rounds in one session can be configured according to experience values, and can also be configured by user definition. Illustratively, the first server provides a configuration interface for the weighting coefficients of the different rounds in a session, the configuration interface being used to configure the weighting coefficients of the different rounds in a session. Wherein the weight coefficients of the same turn in different sessions are identical. The configuration interface may provide an input area for designating weight coefficients of rounds, and the weight coefficients corresponding to each round may be input in the input area corresponding to each round, and rounds not configured with weight coefficients are configured as default weight coefficients. The default weight coefficient may be configured according to an empirical value, for example, the default weight coefficient may be configured to be 1, 0.5, 0, or the like, which is not particularly limited herein. The first server obtains the weight coefficient of each round in one session configured on the configuration interface.

Step S408, according to the interaction quality information of the man-machine interaction model in each session, determining the evaluation information of the multi-round interaction capability of the man-machine interaction model.

In an optional implementation manner, the first server may calculate an average value of the interaction quality information of the man-machine interaction model in each session, and use the average value of the interaction quality information of the man-machine interaction model in each session as the evaluation information of the multi-round interaction capability of the man-machine interaction model.

In another optional embodiment, after the multi-round interaction information of at least one pre-labeled session is acquired, multi-round interaction capability covered by each session may be labeled, where the multi-round interaction capability at least includes the following categories: context dependent, context independent, context consistent. Where context-dependent refers to the fact that the context information in multiple rounds of interaction may be helpful (positively influenced) to the response to the current input information. Context-free means that the response of the context information to the current input information in multiple rounds of interaction has a negative impact. Context consistent means that the response of the current input information and the response of the context should remain consistent in multiple rounds of interaction. The multi-round interaction capability covered by each session can be obtained by manual labeling by a labeling person.

An example of multi-round interaction information covering different classes of multi-round interaction capabilities is given in table 2, by way of example:

TABLE 2

In the step, the first server can calculate the average value of the interaction quality information of the man-machine interaction model in the conversation covering the same type of multi-round interaction capability according to the multi-round interaction capability covered by at least one conversation and the interaction quality information of the man-machine interaction model in each conversation, so as to obtain the evaluation information of the man-machine interaction model in various multi-round interaction capability dimensions.

Optionally, the first server may sum the interaction quality information of the man-machine interaction models in the sessions covering the same type of multi-round interaction capability according to the multi-round interaction capability covered by at least one session and the interaction quality information of the man-machine interaction models in each session, so as to obtain evaluation information of the man-machine interaction models in various multi-round interaction capability dimensions.

Further, after the evaluation information of the human-computer interaction model in the multi-round interaction capability dimension is obtained, the first server can also weight and synthesize the evaluation information of the human-computer interaction model in the multi-round interaction capability dimension according to the weight coefficient corresponding to the multi-round interaction capability to obtain the comprehensive evaluation information of the multi-round interaction capability of the human-computer interaction model.

Optionally, when weighting and integrating the evaluation information of the man-machine interaction model in the multi-round interaction capability dimension according to the weight coefficient corresponding to the multi-round interaction capability, the first server may perform weighted summation on the evaluation information of the man-machine interaction model in the multi-round interaction capability dimension according to the weight coefficient corresponding to the multi-round interaction capability to obtain the comprehensive evaluation information of the multi-round interaction capability of the man-machine interaction model. Optionally, when weighting and integrating the evaluation information of the man-machine interaction model in the multi-round interaction capability dimension according to the weight coefficient corresponding to the multi-round interaction capability, the first server may weight and average the evaluation information of the man-machine interaction model in the multi-round interaction capability dimension according to the weight coefficient corresponding to the multi-round interaction capability to obtain the comprehensive evaluation information of the multi-round interaction capability of the man-machine interaction model.

Optionally, displaying a weight configuration interface of various multi-round interaction capabilities; and acquiring weight coefficients corresponding to various multi-round interaction capabilities configured on the weight configuration interface. The weight coefficient of each type of multi-round interaction capability can be configured according to an experience value, and can also be configured by user definition. Illustratively, the first server provides a configuration interface for the weighting coefficients of the various types of multi-round interaction capabilities, the configuration interface being used to configure the weighting coefficients of the various types of multi-round interaction capabilities. And displaying weight configuration interfaces of various multi-round interaction capabilities through the terminal equipment. The configuration interface can provide an input area of the weight coefficients of various multi-round interaction capabilities, and the weight coefficients corresponding to the various multi-round interaction capabilities can be input in the input area corresponding to the various multi-round interaction capabilities. The terminal side equipment acquires the weight coefficients of various multi-round interaction capabilities configured on the configuration interface and sends the weight coefficients to the first server. The first server receives weight coefficients of various rounds of interaction capability configured on the configuration interface, which are sent by the terminal side equipment.

In an optional embodiment, after obtaining the comprehensive evaluation information of the multi-round interaction capability of the man-machine interaction model, the first server may determine whether the man-machine interaction model meets the online condition according to the comprehensive evaluation information of the multi-round interaction capability of the man-machine interaction model; and outputting the on-line prompt information of the man-machine interaction model, wherein the on-line prompt information indicates whether the man-machine interaction model meets the on-line condition.

Optionally, when the online prompt information of the man-machine interaction model is output, evaluation information of the man-machine interaction model, such as comprehensive evaluation information of multi-round interaction capability of the man-machine interaction model and evaluation information of the man-machine interaction model in various multi-round interaction capability dimensions, can be output.

In an alternative embodiment, there may be multiple human-computer interaction models to be evaluated. In the step S203, when determining the evaluation information of the multi-round interaction capability of the man-machine interaction model according to the input information of each round, the above information of the input information of each round and the response result in at least one session, the response quality of the response result of the same input information output by each man-machine interaction model is ordered, and the relative evaluation information of the response quality of each man-machine interaction model can be determined based on the ordering result, and used as the evaluation information of the multi-round interaction capability of the man-machine interaction model.

Specifically, for the input information of any round in any session, the first server may provide a second interaction interface, and output the input information of the round, the above information of the input information of the round, and the response result of the input information of the round output by each man-machine interaction model through the second interaction interface. And the corresponding relation between the response result and each man-machine interaction model is not displayed on the second interaction interface. The annotator can sort the response results of the same input information displayed on the second interactive interface. The first server receives the sequencing result of the response result of the same input information output by each man-machine interaction model appointed in the second interaction interface; and calculating the relative evaluation information of the response quality of each man-machine interaction model according to the sequencing result of the response results of the same input information output by each man-machine interaction model.

The second interaction interface may output any round of input information and the above information of the input information of the present round, and output the response results of the input information of the present round output by each man-machine interaction model, where the response results output by different man-machine interaction models respectively correspond to the non-passing display areas. In the second interactive interface, the labeling personnel can change the ordering of the response results by dragging the position of the display area of the response results. In addition, the sequence of the response results can be changed by inputting the sequence value of the response results, and the position of the display area of each response result in the second interactive interface is automatically adjusted along with the change of the sequence value of the response results.

Optionally, a sequencing algorithm can be used to automatically sequence the response results of the same input information output by each man-machine interaction model, so that the efficiency of data processing can be greatly improved. Specifically, the first server uses a pre-trained response result ordering algorithm to order response results of the same input information output by each man-machine interaction model. Further, the first server calculates relative evaluation information of the response results output by the man-machine interaction models according to the ordering results of the response results of the same input information output by the man-machine interaction models. The ranking algorithm may be implemented by any existing method for ranking text based on text quality, which is not specifically limited herein.

When the relative evaluation information of the response quality of each man-machine interaction model is calculated according to the sequencing result of the response results output by each man-machine interaction model, the first server can calculate the frequency distribution of the response results output by the man-machine interaction model in each ranking according to the sequencing result of the response results output by each man-machine interaction model, and the higher the frequency of ranking is as the relative evaluation information of each man-machine interaction model, the better the response quality of the man-machine interaction model is, and the stronger the multi-round interaction capability of the man-machine interaction model is. The relative quality of any human-computer interaction model in a plurality of human-computer interaction models to be compared can be intuitively displayed by outputting the frequency distribution of the response results output by each human-computer interaction model and arranged at each ranking.

Optionally, according to the sequencing result of the response results of the same input information output by each man-machine interaction model, for any two man-machine interaction models, the condition of win, lose and peace between the two man-machine interaction models can be calculated and used as the relative evaluation information of each man-machine interaction model. For the response results of the two man-machine interaction models A and B aiming at the same input information, if the response result of the man-machine interaction model A is arranged in front of the response result of the man-machine interaction model B in the ordering result, the man-machine interaction model A is superior. And if the response result of the man-machine interaction model A in the sequencing result is ranked behind the response result of the man-machine interaction model B, the man-machine interaction model A is negative. If the response result of the man-machine interaction model A is parallel to the response result of the man-machine interaction model B in the sequencing result, the two man-machine interaction models are flat. Based on the multi-round input information, according to the sequencing result of the response results of the two man-machine interaction models aiming at the same input information, the number of times of wins, negatives and peaces of any man-machine interaction model can be calculated, or the wining rate of any man-machine interaction model can be calculated. The relative quality of the man-machine interaction model in a plurality of man-machine interaction models to be compared can be intuitively displayed by outputting the times of winning, negating and peacing of any man-machine interaction model or the winning rate of any man-machine interaction model.

Optionally, according to the sequencing result of the response result of each human-computer interaction model to the same input information, calculating the erlo rating (Elo rating) of each human-computer interaction model, and based on multiple rounds of input information, calculating the erlo rating (Elo rating) of each human-computer interaction model for multiple times and taking an average value as the relative evaluation information of the response quality of each human-computer interaction model. The relative quality of each human-computer interaction model in a plurality of human-computer interaction models to be compared can be intuitively displayed by outputting the average value of the erlo class fractions of each human-computer interaction model.

And S409, outputting evaluation information of the multi-round interaction capability of the man-machine interaction model.

The first server may determine whether the human-computer interaction model meets the online condition according to the evaluation information of the multiple-round interaction capability of the human-computer interaction model after obtaining the evaluation information of the multiple-round interaction capability of the human-computer interaction model; and outputting online prompt information of the man-machine interaction model and/or evaluation information of multi-round interaction capability of the man-machine interaction model, wherein the online prompt information indicates whether the man-machine interaction model meets online conditions. The online condition comprises a first threshold value of evaluation information of multi-round interaction capability of the human-computer interaction model, if the evaluation information of the multi-round interaction capability of the human-computer interaction model is larger than or equal to the first threshold value, the human-computer interaction model meets the online condition, otherwise, the human-computer interaction model does not meet the online condition. The first threshold in the online condition can be configured by a user in a customized way according to the needs of the specific application scene.

According to the scheme of the embodiment, through obtaining the multi-round interaction information of at least one pre-marked session, according to the multi-round interaction information of any pre-marked session, the following interaction processing is sequentially carried out on the input information of each round in the session: taking the context information of the current session as the context information of the input information of the current round, wherein the context information of the current session comprises the history interaction information of the current session; inputting the input information of the round and the context information of the current session into a man-machine interaction model, and outputting a response result of the input information of the round through the man-machine interaction model; the context information of the current session is updated, and the response result of the input information of the current session is replaced by the pre-marked response information of the input information of the current session, so that the response result of the input information of each session is obtained by the human-computer interaction model, and when the pre-marked multi-round interaction information of any one session is applied to evaluating a plurality of different human-computer interaction models, the historical context information of the same input information of each human-computer interaction model in the multi-round interaction of the current session is completely consistent, and the unified pre-marked multi-round interaction information can be used to fairly and objectively evaluate the multi-round interaction capability of different human-computer interaction models, so that the quality of multi-round interaction in the human-computer interaction based on the human-computer interaction model is improved.

Fig. 5 is an interaction flow chart of a data processing method of man-machine interaction according to an exemplary embodiment of the present application. In this embodiment, taking a man-machine interaction model for realizing man-machine interaction as a language model as an example, a flow of evaluating multiple rounds of interaction abilities of multiple language models is exemplarily described. As shown in fig. 5, the interaction flow between the first server and the end-side device is as follows:

in step S501, the end-side device sends a multi-round interactive capability assessment request for multiple language models to the first server.

The language models can be pre-trained, and can be specifically applied to fields of Natural Language Processing (NLP), computer vision and the like, and can be specifically applied to tasks in the field of crossing of NLP and computer vision such as visual question-answering (VQA), image description (IC), visual implication (VE), expression and understanding (REC), and the like, as well as tasks in the field of natural language processing such as emotion classification tasks based on texts and text summarization tasks, and can be applied to various application scenes such as digital assistants, intelligent robots, searching, online education, office software, electronic commerce, intelligent design and the like.

Step S502, a first server receives multiple rounds of interactive capability assessment requests for multiple language models sent by a terminal side device.

In step S503, the first server obtains the multi-round interaction information of at least one pre-labeled session and the multi-round interaction capability covered by each session, where the multi-round interaction information of one session includes multi-round input information and pre-labeled response information of each round of input information.

In this step, the implementation manner of the first server for obtaining the multi-round interaction information of the pre-labeled at least one session is consistent with the implementation manner of the foregoing step S201, and details of the related content of the foregoing embodiment are not described herein.

In this embodiment, the first server may further obtain a multi-round interaction capability covered by each pre-labeled session, where the multi-round interaction capability at least includes the following categories: context dependent, context independent, context consistent. Where context-dependent refers to the fact that the context information in multiple rounds of interaction may be helpful (positively influenced) to the response to the current input information. Context-free means that the response of the context information to the current input information in multiple rounds of interaction has a negative impact. Context consistent means that the response of the current input information and the response of the context should remain consistent in multiple rounds of interaction. The multi-round interaction capability covered by each session can be obtained by manual labeling by a labeling person. Examples of multi-round interaction information covering different kinds of multi-round interaction capabilities are referred to in table 2, and are not described here again.

Step S504, the first server determines the upper information of each round of input information in the current session according to the pre-labeled multi-round interaction information of any one time session, inputs each round of input information and the upper information of the input information in the current session into each language model, and outputs the response result of each round of input information through each language model.

The step is similar to the specific implementation manner of the step S202, and the man-machine interaction model to be evaluated in the step S504 is a plurality of language models specified by the user, and the specific implementation manner refers to the relevant content in the foregoing embodiment, which is not described herein again.

Step S505, the first server determines evaluation information of multiple language models in various multiple interactive capability dimensions according to the input information of each round, the above information of the input information of each round, the response result of the input information of each round output by each language model in at least one session, and the multiple interactive capability covered by each session.

In the step, the first server determines the interaction quality information of the man-machine interaction model in each session according to the input information of each round in at least one session, the above information of the input information and the response result. And determining evaluation information of the multi-round interaction capability of the man-machine interaction model according to the interaction quality information of the man-machine interaction model in each session and the multi-round interaction capability covered by each session.

The first server determines, according to the input information of each round in at least one session, the above information of the input information, and the response result, an interaction quality information of the man-machine interaction model in each session, which is consistent with the specific implementation manner of the step S507, and the specific implementation manner refers to the related content in the foregoing embodiment, which is not repeated herein.

Further, the first server can calculate the average value of the interaction quality information of the man-machine interaction model in the session covering the same type of multi-round interaction capability according to the multi-round interaction capability covered by at least one session and the interaction quality information of the man-machine interaction model in each session, so as to obtain the evaluation information of the man-machine interaction model in various multi-round interaction capability dimensions. The specific implementation manner refers to the relevant content in the foregoing step S508, and will not be described herein.

And step S506, the first server outputs evaluation information of the multiple language models in various multi-round interaction capability dimensions to the terminal side equipment.

And step S507, the terminal side equipment receives evaluation information of a plurality of language models in various multi-round interaction capability dimensions, which is sent by the first server.

And step S508, the end side equipment outputs evaluation information of the multiple language models in various multi-round interaction capability dimensions.

In an optional embodiment, the terminal device may weight, sum and/or average the evaluation information of each language model in each multi-round interaction capability dimension according to the weight coefficients corresponding to each multi-round interaction capability, obtain the comprehensive evaluation information of the multi-round interaction capability of each language model, and output the comprehensive evaluation information of the multi-round interaction capability of each language model. Further, the terminal device may select one of the language models as the target model according to the comprehensive evaluation information of the multi-round interaction capability of each of the language models, and output information of the target model. Optionally, the terminal side device may further select one of the language models as an optimized version according to the comprehensive evaluation information of the multiple interactive capabilities of each of the language models, and update the optimized version of the language model.

In an optional embodiment, the terminal device may compare the evaluation information of the multiple language models in various multi-round interaction capability dimensions, and output a comparison result. The comparison result is used for guiding the user to select the language model with better multi-round interaction capability.

In this embodiment, for a plurality of language models, by acquiring the multi-round interaction information of at least one pre-labeled session, the multi-round interaction information of one session includes multi-round input information, pre-labeled response information of each round of input information, and multi-round interaction capability covered by the present session; determining the upper information of each round of input information in the current session according to the pre-marked multi-round interaction information of any one session, inputting each round of input information and the upper information of the input information in the current session into each language model, and outputting the response result of each round of input information through each language model; according to the input information of each round, the above information of the input information of each round and the response result of the input information of each round output by each language model in at least one session and the multi-round interaction capability covered by each session, the evaluation information of the multiple language models in various multi-round interaction capability dimensions is determined, so that the multi-round interaction capability of different language models can be evaluated fairly and objectively, the multi-round interaction capability of the language models can be evaluated comprehensively and accurately from the dimensions of the multiple different multi-round interaction capabilities, the multi-round interaction capability evaluation quality of the language models is improved, and the man-machine interaction quality based on the language models can be improved.

Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application. As shown in fig. 6, the server includes: a memory 601 and a processor 602. Memory 601 is used to store computer-executable instructions and may be configured to store various other data to support operations on a server. The processor 602 is communicatively connected to the memory 601, and is configured to execute the computer-executable instructions stored in the memory 601 to implement the technical scheme of the first server in any of the above method embodiments, and the specific functions and the technical effects that can be implemented are similar, and are not repeated herein.

Optionally, as shown in fig. 6, the cloud server further includes: firewall 603, load balancer 604, communication component 605, power component 606, and other components. Only some components are schematically shown in fig. 6, which does not mean that the cloud server only includes the components shown in fig. 6.

The embodiment of the application also provides an end side device, which comprises: memory and a processor. The memory is used to store computer-executable instructions and may be configured to store various other data to support operations on the end-side device. The processor is in communication connection with the memory, and is configured to execute computer-executed instructions stored in the memory, so as to implement the technical scheme executed by the end-side device in any of the above method embodiments, and specific functions and technical effects that can be implemented are similar, and are not repeated herein.

The embodiment of the application further provides a computer readable storage medium, in which computer executable instructions are stored, where the computer executable instructions are used to implement the technical scheme executed by the first server in any of the above method embodiments when executed by the processor, and specific functions and technical effects that can be implemented are not described herein.

The embodiment of the application further provides a computer readable storage medium, in which computer executable instructions are stored, and when the computer executable instructions are executed by a processor, the computer executable instructions are used to implement the technical scheme executed by the end-side device in any of the above method embodiments, and specific functions and technical effects that can be implemented are not repeated herein.

The embodiment of the application also provides a computer program product, which comprises: the computer program is stored in a readable storage medium, and the at least one processor of the first server may read the computer program from the readable storage medium, where execution of the computer program by the at least one processor causes the first server to execute the technical solution executed by the first server in any one of the method embodiments, and specific functions and technical effects that can be achieved are not repeated herein.

The embodiment of the application also provides a computer program product, which comprises: the computer program is stored in the readable storage medium, and the at least one processor of the end-side device may read the computer program from the readable storage medium, where execution of the computer program by the at least one processor causes the end-side device to execute the technical solution executed by the end-side device in any of the foregoing method embodiments, and specific functions and technical effects that can be achieved are not repeated herein.

The embodiment of the application provides a chip, which comprises: the processing module and the communication interface, where the processing module can execute the technical solution of the first server or the end-side device in the foregoing method embodiment. Optionally, the chip further includes a storage module (e.g. a memory), where the storage module is configured to store the instructions, and the processing module is configured to execute the instructions stored in the storage module, and execution of the instructions stored in the storage module causes the processing module to execute the technical solution executed by the first server or the end-side device in any one of the foregoing method embodiments.

The memory may be an object store (Object Storage Service, OSS). The memory may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The communication component is configured to facilitate wired or wireless communication between the device in which the communication component is located and other devices. The device where the communication component is located may access a wireless network based on a communication standard, such as a mobile hotspot (WiFi), a mobile communication network of a second generation mobile communication system (2G), a third generation mobile communication system (3G), a fourth generation mobile communication system (4G)/Long Term Evolution (LTE), a fifth generation mobile communication system (5G), or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies. The power supply component provides power for various components of equipment where the power supply component is located. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the devices in which the power components are located.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, compact disk read-only memory (CD-ROM), optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media. Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should be noted that, the user information (including but not limited to user equipment information, user attribute information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with related laws and regulations and standards, and provide corresponding operation entries for the user to select authorization or rejection.

In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations appearing in a particular order are included, but it should be clearly understood that the operations may be performed out of order or performed in parallel in the order in which they appear herein, merely for distinguishing between the various operations, and the sequence number itself does not represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first" and "second" herein are used to distinguish different messages, devices, modules, etc., and do not represent a sequence, and are not limited to the "first" and the "second" being different types. The meaning of "a plurality of" is two or more, unless specifically defined otherwise.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A data processing method for man-machine interaction, comprising:

acquiring the multi-round interaction information of at least one pre-labeled session, wherein the multi-round interaction information of one session comprises multi-round input information and pre-labeled response information of each round of input information;

determining the upper information of each round of input information in a conversation according to the multi-round input information of any one time of conversation which is marked in advance and the response information which corresponds to each round of input information, inputting the upper information of each round of input information and the input information in the conversation into a man-machine interaction model, and outputting the response result of each round of input information through the man-machine interaction model;

Determining evaluation information of the multi-round interaction capability of the man-machine interaction model according to the input information of each round, the above information of the input information of each round and the response result in the at least one session;

and outputting evaluation information of the multi-round interaction capability of the man-machine interaction model.

2. The method according to claim 1, wherein determining the context information of each round of input information in the session according to the multi-round input information of any one session and the pre-labeled response information of the input information, and inputting each round of input information and the context information of each round of input information in the session into the man-machine interaction model, and outputting the response result of each round of input information through the man-machine interaction model comprises:

according to the multi-round input information of any one pre-labeled session and the pre-labeled response information of each round of input information, the following interactive processing is sequentially carried out on each round of input information in the session:

taking the context information of the session as the context information of the input information of the current round, wherein the context information of the session comprises the history interaction information of the session;

inputting the input information of the round and the context information of the session into a man-machine interaction model, and outputting a response result of the input information of the round through the man-machine interaction model;

And updating the context information of the session, and replacing the response result of the input information of the round in the updated context information of the session by using the pre-marked response information of the input information of the round.

3. The method according to claim 1, wherein determining the context information of each round of input information in the session according to the pre-labeled multi-round input information of any one time session and the pre-labeled response information of each round of input information comprises:

and according to the multi-round input information of any pre-labeled session and the pre-labeled response information of the input information of each round, regarding the input information of any round in the session, taking the input information before the round and the pre-labeled response information of the input information as the above information of the input information of the round.

4. The method according to claim 1, wherein determining the evaluation information of the multi-round interaction capability of the man-machine interaction model according to the input information of each round, the above information of the input information of each round and the response result in the at least one session comprises:

according to the input information of each round in the at least one session, the above information of the input information of each round and the response result, determining the interaction quality information of the man-machine interaction model in each session;

And determining evaluation information of the multi-round interaction capability of the man-machine interaction model according to the interaction quality information of the man-machine interaction model in each session.

5. The method according to claim 4, wherein after the obtaining the multi-round interaction information of the pre-labeled at least one session, the method comprises:

marking multi-round interaction capability covered by each session, wherein the multi-round interaction capability at least comprises the following categories: context dependent, context independent, context consistent.

6. The method according to claim 5, wherein determining evaluation information of multiple rounds of interaction ability of the human-computer interaction model according to the interaction quality information of the human-computer interaction model in each session comprises:

according to the multi-round interaction capability covered by the at least one session and the interaction quality information of the man-machine interaction model in each session, calculating to obtain evaluation information of the man-machine interaction model in various multi-round interaction capability dimensions according to the interaction quality information of the man-machine interaction model in the session covering the same multi-round interaction capability.

7. The method of claim 6, wherein determining the human-machine interaction model after evaluating information of various multi-round interaction capability dimensions further comprises:

Weighting and synthesizing the evaluation information of the human-computer interaction model in the dimensions of various multi-round interaction capabilities according to the weight coefficients corresponding to the various multi-round interaction capabilities to obtain comprehensive evaluation information of the multi-round interaction capabilities of the human-computer interaction model;

determining whether the man-machine interaction model meets an online condition according to comprehensive evaluation information of the multi-round interaction capability of the man-machine interaction model;

and outputting the on-line prompt information of the man-machine interaction model and/or the evaluation information of the man-machine interaction model, wherein the on-line prompt information indicates whether the man-machine interaction model meets the on-line condition or not.

8. The method as recited in claim 7, further comprising:

a weight configuration interface for displaying various multi-round interaction capabilities;

and acquiring weight coefficients corresponding to various multi-round interaction capabilities configured on the weight configuration interface.

9. The method according to claim 4, wherein determining the interaction quality information of the man-machine interaction model in each session according to each round of input information in the at least one session, the above information of each round of input information, and the response result comprises:

determining evaluation information of the response result of each round of input information according to each round of input information, the above information of each round of input information and the response result in the at least one session;

And according to the weight coefficient corresponding to each round in one session, weighting and averaging the evaluation information of the response result of each round of input information in the same session to obtain the interaction quality information of the man-machine interaction model in the current session.

10. The method according to claim 1, wherein determining the evaluation information of the multi-round interaction capability of the man-machine interaction model according to the input information of each round, the above information of the input information of each round and the response result in the at least one session comprises:

outputting the input information of the round, the above information of the input information of the round and the response results of the input information of the round output by a plurality of man-machine interaction models through an interaction interface for any round of input information in any session, wherein the interaction interface does not display the corresponding relation between the response results and the man-machine interaction models;

receiving a sequencing result of a response result of the same input information output by each human-computer interaction model appointed in the interaction interface;

and calculating the relative evaluation information of the response quality of each human-computer interaction model according to the sequencing result of the response result of the same input information output by each human-computer interaction model.

11. The method according to any one of claims 1-10, further comprising:

selecting one of the man-machine interaction models as a target model according to evaluation information of multi-round interaction capability of the man-machine interaction models, and outputting information of the target model to end-side equipment;

or alternatively, the process may be performed,

and selecting one version as an optimized version according to the evaluation information of the multi-round interaction capability of the man-machine interaction models of a plurality of different versions, and updating the optimized version of the man-machine interaction model.

12. The data processing method of man-machine interaction is characterized by being applied to a server and comprising the following steps:

receiving multiple rounds of interactive capability assessment requests sent by the terminal side equipment for multiple language models;

acquiring multi-round interaction information of at least one pre-labeled session and multi-round interaction capability covered by each session, wherein the multi-round interaction information comprises multi-round input information and pre-labeled response information of each round of input information;

determining the upper information of each round of input information in a conversation according to the multi-round input information of any one-time conversation which is marked in advance and the response information of each round of input information, inputting each round of input information and the upper information of the input information in the conversation into each language model, and outputting the response result of each round of input information through each language model;

Determining evaluation information of the multiple language models in various multiple interactive capability dimensions according to the input information of each round, the above information of the input information of each round, the response result of the input information of each round output by each language model in the at least one session and the multiple interactive capability covered by each session;

and outputting evaluation information of the multiple language models in various multi-round interaction capability dimensions to the end-side equipment.

13. The data processing method of man-machine interaction is characterized by being applied to a server and comprising the following steps:

acquiring an evaluation model to be tested provided by end-side equipment;

responding to a multi-round interaction capability assessment request sent by end side equipment to the to-be-assessed model, acquiring pre-labeled multi-round interaction information of at least one session, wherein the multi-round interaction information of one session comprises multi-round input information and pre-labeled response information of the input information;

determining the upper information of each round of input information in a conversation according to the multi-round input information of any one time of conversation which is marked in advance and the response information of the input information which is marked in advance, inputting the upper information of each round of input information and the input information in the conversation into the evaluation model to be tested, and outputting the response result of each round of input information through the evaluation model to be tested;

Determining evaluation information of the multi-round interaction capability of the to-be-evaluated model according to the input information of each round, the above information of the input information of each round and the response result in the at least one session;

and outputting evaluation information of the multi-round interaction capability of the to-be-evaluated model to the end-side equipment.

14. A server, comprising: a processor, and a memory communicatively coupled to the processor;

the memory stores computer-executable instructions;

the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-13.