CN113761156A

CN113761156A - Data processing method, device and medium for man-machine interaction conversation and electronic equipment

Info

Publication number: CN113761156A
Application number: CN202110599432.9A
Authority: CN
Inventors: 李泽康; 张金超; 费政聪
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-12-07

Abstract

The application belongs to the technical field of human-computer interaction dialogue in artificial intelligence, and specifically discloses a data processing method and device for a human-computer interaction dialogue, a readable medium, and electronic equipment. A query text to be replied to and historical dialogue data are acquired, and predicted text semantic features and real text semantic features are extracted from them; dialogue quality evaluation information of the human-computer interaction dialogue is then generated according to the predicted text semantic features and the real text semantic features. Because the dialogue quality evaluation information generated by the application takes the specific context into account, it can objectively reflect the state of the human-computer dialogue and is practical to apply. It enhances user experience in the dialogue, makes it easier for technical personnel to optimize and improve the dialogue, and effectively promotes the industrial development of human-computer interaction.

Description

Data processing method, device and medium for man-machine interaction conversation and electronic equipment

Technical Field

The application belongs to the technical field of artificial intelligence, and particularly relates to a data processing method of a human-computer interaction session, a data processing device of the human-computer interaction session, a computer readable medium and electronic equipment.

Background

In a human-computer interaction dialogue, a user conducts a multi-turn chat with an artificial intelligence device. The chat content is not limited to any specific task, so no standard answer exists. An intelligent chat robot that can communicate with the user smoothly and in context demonstrates high dialogue capability.

However, the development of chat robots is still at an exploratory stage. It cannot currently be determined whether a chat robot gives the user a good chat experience; more directly, no evaluation method is yet available for automatically evaluating the quality of human-computer interaction.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present application and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.

Disclosure of Invention

The application aims to provide a data processing method for a human-computer interaction dialogue, a data processing device for a human-computer interaction dialogue, a computer readable medium and an electronic device, so as to solve, at least to a certain extent, technical problems in the related art such as the inability to evaluate the quality of a human-computer interaction dialogue objectively and truthfully.

Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.

According to an aspect of an embodiment of the present application, there is provided a data processing method for a human-computer interaction session, including:

acquiring a query text to be replied and historical dialogue data, wherein the historical dialogue data comprises a plurality of rounds of dialogue data of a man-machine interaction dialogue generated before the query text;

extracting semantic features of a predicted text of the current conversation turn according to the query text and the historical conversation data, and generating a reply text for replying the query text according to the semantic features of the predicted text;

extracting real text semantic features of the current conversation turn according to the reply text, the query text and the historical conversation data;

and generating dialogue quality evaluation information of the man-machine interaction dialogue according to the predicted text semantic features and the real text semantic features.

According to an aspect of an embodiment of the present application, there is provided a data processing apparatus for a human-computer interaction session, the apparatus including:

the data acquisition module is used for acquiring a query text to be replied and historical dialogue data, and the historical dialogue data comprises multiple rounds of dialogue data of human-computer interaction dialogue generated before the query text;

the feature extraction module is connected with the data acquisition module and is used for extracting semantic features of a predicted text of the current conversation turn according to the query text and the historical conversation data and generating a reply text for replying to the query text according to the semantic features of the predicted text; the feature extraction module is further used for extracting real text semantic features of the current conversation turn according to the reply text, the query text and the historical conversation data;

and the evaluation information generation module is connected with the feature extraction module and is used for generating the dialogue quality evaluation information of the man-machine interaction dialogue according to the predicted text semantic features and the real text semantic features.

In an embodiment of the present application, based on the above technical solution, the feature extraction module includes:

the contextual feature extraction unit is configured to perform feature extraction on the query text and the historical dialogue data to obtain current contextual features in the current dialogue turn;

a prediction unit configured to predict a predicted context feature in a next conversation turn from the current context feature;

a semantic feature extraction unit configured to determine a predicted text semantic feature of a current conversation turn according to a feature difference between the predicted contextual feature and the current contextual feature.

In an embodiment of the application, based on the above technical solution, the contextual feature extraction unit is further configured to perform feature extraction on the reply text, the query text, and the historical dialogue data to obtain a real contextual feature in a next dialogue turn;

the semantic feature extraction unit is further configured to determine real text semantic features for a current turn of the dialog based on feature differences between the real contextual features and the current contextual features.
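The flow described by these units can be sketched as follows. This is only an illustrative sketch, not the implementation of the application: the function name `evaluate_turn` and the callables `encode`, `predict_next`, `generate_reply` and `score` are hypothetical placeholders for the coding model, the prediction model, the reply model and the evaluation formula, and the "feature difference" is assumed here to be an element-wise subtraction.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def evaluate_turn(
    history: Sequence[str],
    query: str,
    encode: Callable[[Sequence[str]], Vector],        # hypothetical coding model
    predict_next: Callable[[Vector], Vector],         # hypothetical prediction model
    generate_reply: Callable[[str, Vector], str],     # hypothetical reply model
    score: Callable[[Vector, Vector], float],         # hypothetical evaluation formula
) -> Tuple[str, float]:
    """Return the reply text and a quality score for the current dialogue turn."""
    # Current contextual feature: encode the history together with the query.
    current = encode(list(history) + [query])

    # Predicted contextual feature of the next turn, then the predicted text
    # semantic feature as the difference from the current contextual feature.
    predicted_next = predict_next(current)
    predicted_semantic = [p - c for p, c in zip(predicted_next, current)]

    # The reply is generated from the query and the predicted semantic feature.
    reply = generate_reply(query, predicted_semantic)

    # Real contextual feature: encode history, query and the actual reply,
    # then take the difference from the current contextual feature.
    real_next = encode(list(history) + [query, reply])
    real_semantic = [r - c for r, c in zip(real_next, current)]

    return reply, score(predicted_semantic, real_semantic)
```

Passing the models in as callables keeps the sketch independent of any particular encoder or generator; any components with matching signatures could be substituted.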

In an embodiment of the application, based on the above technical solution, the semantic feature extraction unit is further configured to combine the reply text, the query text, and the historical dialogue data to form real dialogue data; inputting the real dialogue data into a coding model to obtain real context characteristics; the coding model uses a bidirectional self-attention mechanism to calculate context-dependent vector representations of all contextual features in the real dialogue data, and then takes the mean of the context-dependent vector representations of all contextual features as the real contextual features of the real dialogue data.

In an embodiment of the present application, based on the above technical solution, the feature extraction module further includes an encoding unit, where the encoding unit is configured to calculate context-related vector representations of all contextual features in the historical dialogue data by using a bidirectional self-attention mechanism, and then take a mean value of the context-related vector representations of all contextual features as a current contextual feature of the historical dialogue data;

the prediction unit is further configured to acquire all contextual features of historical dialogue data and input all contextual features of the historical dialogue data into a machine learning model for training, so as to obtain a model of predicted contextual features; and inputting the current contextual features into the model of the predicted contextual features to obtain predicted contextual features.
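As a rough illustration of how contextual features of historical dialogue data might be turned into training examples, the sketch below pairs the contextual feature of each turn with that of the following turn. The application does not specify the machine learning model; the "mean shift" predictor used here is a deliberately trivial stand-in chosen only to keep the example self-contained, and the names `make_training_pairs` and `fit_mean_shift` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def make_training_pairs(turn_features: Sequence[Vector]) -> List[Tuple[Vector, Vector]]:
    """Pair each turn's contextual feature with the feature of the next turn."""
    return [(turn_features[i], turn_features[i + 1])
            for i in range(len(turn_features) - 1)]


def fit_mean_shift(pairs: Sequence[Tuple[Vector, Vector]]) -> Callable[[Vector], Vector]:
    """Stand-in model: learn the average change between consecutive turns."""
    if not pairs:
        raise ValueError("at least one training pair is required")
    dim = len(pairs[0][0])
    shift = [sum(nxt[i] - cur[i] for cur, nxt in pairs) / len(pairs)
             for i in range(dim)]

    def predict(current: Vector) -> Vector:
        # Predicted contextual feature = current feature + average shift.
        return [c + s for c, s in zip(current, shift)]

    return predict
```

In practice a learned neural model would take the place of `fit_mean_shift`; only the construction of (current, next) training pairs is meant to carry over.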

In an embodiment of the application, based on the above technical solution, the prediction unit is further configured to obtain historical dialogue data, and input all question and answer data of the historical dialogue data into a machine learning model for training, so as to obtain a model of a prediction reply text; and inputting the query data to be replied and the semantic features of the predicted text into a model of the predicted reply text to obtain a reply text.

In an embodiment of the application, based on the above technical solution, the evaluation information generation module further includes a calculation unit, and the calculation unit is configured to substitute the predicted text semantic features and the real text semantic features into a two-vector included angle cosine formula to obtain a vector included angle between the predicted text semantic features and the real text semantic features;

the computing unit is further configured to divide a minimum value of the absolute value of the predicted text semantic feature and the absolute value of the real text semantic feature by a maximum value of the absolute value of the predicted text semantic feature and the absolute value of the real text semantic feature to obtain a difference value between the predicted text semantic feature and the real text semantic feature;

the calculation unit is further configured to multiply a vector included angle between the predicted text semantic features and the real text semantic features by a difference value to obtain dialogue quality evaluation information.
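The calculation performed by the calculation unit can be sketched as below. Two interpretive assumptions are made and should be checked against the original text: the "absolute value" of a semantic feature is read as the Euclidean norm of the feature vector, and the "vector included angle" is read as the cosine value produced by the cosine formula rather than the angle itself. The function name `dialogue_quality_score` and the handling of zero vectors are also assumptions of this sketch.

```python
import math
from typing import Sequence


def dialogue_quality_score(predicted: Sequence[float], real: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors, scaled by min-norm / max-norm."""
    if len(predicted) != len(real):
        raise ValueError("feature vectors must have the same dimension")

    dot = sum(p * r for p, r in zip(predicted, real))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_r = math.sqrt(sum(r * r for r in real))

    # Assumption: a zero vector carries no semantic information, so score 0.
    if norm_p == 0.0 or norm_r == 0.0:
        return 0.0

    cosine = dot / (norm_p * norm_r)                     # direction agreement
    ratio = min(norm_p, norm_r) / max(norm_p, norm_r)    # magnitude agreement
    return cosine * ratio
```

Under these assumptions, two vectors pointing the same way but differing in length by a factor of two score 0.5, identical vectors score 1, and orthogonal vectors score 0.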

According to an aspect of the embodiments of the present application, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor implements a data processing method of a human-computer interaction dialog as in the above technical solutions.

According to an aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the data processing method of the man-machine interaction dialog as in the above technical solution via executing the executable instructions.

According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the data processing method of the man-machine interaction session in the technical scheme.

In the technical solution provided by the embodiments of the application, the dialogue quality evaluation information of the human-computer interaction dialogue is generated from the predicted text semantic features and the real text semantic features, both of which correspond to changes in context. The dialogue quality evaluation information therefore takes the specific context into account, reflects the state of the human-computer dialogue more objectively, and is more practical. This enhances user experience in the dialogue, makes it easier for technical personnel to optimize and improve the dialogue according to the dialogue quality evaluation information, and effectively promotes the industrial development of human-computer interaction.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application. It should be apparent that the drawings in the following description show merely some embodiments of the present application, and that other drawings may be obtained from them by those skilled in the art without inventive effort.

Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the solution of the present application applies.

Fig. 2 schematically shows a flow chart of steps of a data processing method of a human-computer interaction dialog in an embodiment of the present application.

Fig. 3 schematically shows a flowchart of steps of a method for extracting predicted text semantic features in one embodiment of the present application.

Fig. 4 schematically shows a flowchart of steps of a method for extracting predicted contextual features in one embodiment of the present application.

Fig. 5 is a flow chart schematically illustrating steps of a method for extracting reply texts in one embodiment of the present application.

Fig. 6 schematically shows a flowchart of steps of a method for extracting semantic features of a real text in one embodiment of the present application.

Fig. 7 schematically shows a flowchart of steps of a method for extracting real contextual features in an embodiment of the present application.

Fig. 8 is a flow chart schematically illustrating steps of a data processing method of a man-machine interaction dialog in a specific application embodiment of the present application.

Fig. 9 schematically shows a block diagram of a data processing apparatus of a human-computer interaction dialog in an embodiment of the present application.

Fig. 10 schematically illustrates a block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present application.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other subjects. It specializes in studying how a computer simulates or implements human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.

The solution provided by the embodiments of the application relates to human-computer interaction dialogue technology in artificial intelligence, which uses machine learning from among the artificial intelligence software technologies. The embodiments are described below by way of a specific human-computer interaction dialogue technology.

In a human-computer interaction dialogue, a user chats with an artificial intelligence device, and the chat content is not limited to any specific task, so there is no specific standard answer. There are many kinds of artificial intelligence devices, such as smart speakers and intelligent chat robots. An artificial intelligence device that can communicate with the user smoothly and in context demonstrates high dialogue capability. Smooth communication means direct and effective interaction: the artificial intelligence device responds immediately to a question asked by the user, without freezing, failing to reply, or replying with content irrelevant to the question. Smooth communication of this kind is comparatively easy for an artificial intelligence device to achieve.

Smooth communication in context is harder: although artificial intelligence devices are highly intelligent, they still cannot fully understand the user's complicated language system. Depending on the context, the same words may carry different meanings. Context refers to the linguistic environment in which a conversation takes place. There are several kinds of context. One is the situational context: the factors abstracted from the actual scene that affect speech activity, including the two parties to the human-computer interaction, the occasion (time and place), the degree of formality, the medium of communication, and the topic or register. Speech always occurs in a certain situation, and the actual circumstances (such as the people, events, time and place involved) also help determine the meaning of a linguistic form. Another is the cultural context, which has two aspects. The first is cultural custom: the way of life that people inherit across generations and learn through social life, a collective habit of language, behavior and psychology that is normative and binding on the members of the group. The second is social norms: the various rules and restrictions a society places on verbal communication activities.

For example, a user says to an artificial intelligence device, "There is not a cloud in the sky today; the weather today is really good." Here, "the weather today is really good" is praise for the weather and shows that the user's current mood is positive and optimistic; the corresponding context is one of good mood on a sunny day. As another example, the user says, "It has been raining all morning; the weather today is really good." Here, "the weather today is really good" is a complaint about the weather, and the corresponding context indicates that the user's current state of mind is negative. If the artificial intelligence device does not recognize the user's context and gives the same answer in both cases, the user's human-computer interaction experience suffers and the effectiveness of the interaction is impaired. Therefore, the quality of a human-computer interaction dialogue is difficult to evaluate objectively unless the context is taken into account.

Taking the smart speaker as an example, the user's chat experience with the smart speaker directly affects the overall evaluation of the product. However, the development of smart speakers is still at an exploratory stage: the product side cannot determine what kind of chat experience users consider good, and therefore cannot objectively evaluate the interactions of the smart speaker. At present, chat robots are usually evaluated on factors such as behavioral data (for example, the number of conversation turns and the duration), or only on the chat itself (for example, whether responses are continuous and whether they are timely). These evaluation dimensions are too coarse, and an overall index of the quality of the human-computer interaction dialogue is lacking, so existing evaluation of human-computer interaction dialogue is not objective and does not help improve the dialogue. The application therefore provides a solution capable of evaluating the quality of a human-computer interaction dialogue. The solution of the present application is further explained below.

As shown in Fig. 1, the system architecture 100 may include a terminal device 110, a network 120, and a server 130. The terminal device 110 may include various electronic devices such as a smart phone, a tablet computer, a notebook computer, and a desktop computer. The server 130 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The network 120 may be a communication medium of various connection types capable of providing a communication link between the terminal device 110 and the server 130, such as a wired communication link or a wireless communication link. The artificial intelligence device may be provided in the terminal device 110 or in the server 130, for example in the form of virtual chat software.

The user may input the query text through the terminal device 110, and the server 130 receives the query text from the terminal device 110 through the network 120. The server 130 acquires the query text to be replied to and historical dialogue data, where the historical dialogue data comprises multiple rounds of dialogue data of the human-computer interaction dialogue generated before the query text; the historical dialogue data may be stored in the server 130. The server 130 extracts the predicted text semantic features of the current dialogue turn according to the query text and the historical dialogue data, and generates a reply text for replying to the query text according to the predicted text semantic features. The server 130 then extracts the real text semantic features of the current dialogue turn according to the reply text, the query text and the historical dialogue data, and generates dialogue quality evaluation information of the human-computer interaction dialogue according to the predicted text semantic features and the real text semantic features. In short, after the server 130 receives the query text from the terminal device 110 through the network 120, it can extract the predicted text semantic features and the real text semantic features from the query text and the stored historical dialogue data, and then use the two sets of semantic features to generate the dialogue quality evaluation information.

The system architecture in the embodiments of the present application may have any number of terminal devices, networks, and servers, according to implementation needs. For example, the server 130 may be a server group composed of a plurality of server devices. In addition, the technical solution provided by the embodiment of the present application is mainly applied to an artificial intelligence device, and the artificial intelligence device may be embedded in the terminal device 110 or the server 130. Therefore, the technical solution provided in the embodiment of the present application may be applied to the terminal device 110, or may be applied to the server 130, or may be implemented by both the terminal device 110 and the server 130, which is not particularly limited in this application.

The system architecture to which the technical solution of the present application applies has been described above; the method corresponding to the technical solution is described in detail below.

Fig. 2 schematically shows a flow chart of steps of a data processing method of a human-computer interaction dialog in an embodiment of the present application. The method may be applied to the server 130, may be applied to the terminal device 110, or may be executed by both the terminal device 110 and the server 130. The application discloses a data processing method of a man-machine interaction dialogue, which specifically comprises the steps S210-S240.

Step S210: acquiring a query text to be replied to and historical dialogue data, where the historical dialogue data comprises multiple rounds of dialogue data of the human-computer interaction dialogue generated before the query text.

In a specific application, the query text to be replied to may be obtained in various ways. For example, when the user conducts the human-computer interaction dialogue in text, the text input by the user can be received directly and used as the query text to be replied to; when the user conducts the dialogue by voice, the received voice information can be converted into text data and used as the query text to be replied to; and when the user interacts by inputting pictures, the pictures can be recognized and converted into text data by recognition software and used as the query text to be replied to.

Likewise, the historical dialogue data may be obtained in various ways. For example, when the historical dialogue data is stored locally on the server 130 or the terminal device 110 that executes the method, the user identifier associated with the query text to be replied to may be obtained, and the multiple rounds of dialogue data of the human-computer interaction dialogue generated for that user identifier before the query text may then be read locally from the server 130 or the terminal device 110 as the historical dialogue data. Alternatively, when the historical dialogue data is stored in a cloud on the network to which the server 130 or the terminal device 110 is communicatively connected, the user identifier associated with the query text to be replied to may be obtained, and the multiple rounds of dialogue data generated for that user identifier before the query text may then be requested from the cloud as the historical dialogue data.

In addition, for any query text to be replied, the historical dialogue data may specifically include historical input data before the query text to be replied, and reply data of human-computer interaction corresponding to the historical input data. The number of the historical input data can be set according to a specific application scene or context, and can be one or more.
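One possible way to organize such historical dialogue data is sketched below, purely for illustration. The record type `DialogueTurn`, the in-memory `store` keyed by user identifier, and the function `get_history` with its `max_turns` parameter are all hypothetical; the application does not prescribe a storage format.

```python
from typing import Dict, List, NamedTuple


class DialogueTurn(NamedTuple):
    """One round of dialogue: a historical input and the corresponding reply."""
    user_input: str
    reply: str


def get_history(
    store: Dict[str, List[DialogueTurn]],
    user_id: str,
    max_turns: int = 3,
) -> List[DialogueTurn]:
    """Return up to `max_turns` of the most recent turns for the given user."""
    if max_turns <= 0:
        return []
    turns = store.get(user_id, [])   # unknown users simply have no history
    return turns[-max_turns:]
```

The `max_turns` parameter corresponds to the configurable number of historical inputs mentioned above; a local store and a cloud store could both sit behind the same lookup function.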

Any method for obtaining the query text to be replied and the historical dialogue data can be used in the present invention, and this embodiment does not limit this.

Step S220 includes extracting semantic features of the predicted text of the current dialog turn according to the query text and the historical dialog data, and generating a reply text for replying to the query text according to the semantic features of the predicted text.

In an embodiment of the present application, a method for extracting the predicted text semantic features of the current dialogue turn according to the query text and the historical dialogue data is specifically disclosed, which includes steps S310 to S320. Fig. 3 schematically illustrates a flowchart of the steps of this method in an embodiment of the present application.

Step S310: performing feature extraction on the query text and the historical dialogue data to obtain the current contextual feature in the current dialogue turn, and predicting the predicted contextual feature in the next dialogue turn according to the current contextual feature.

Step S310 includes two parts, the first part is to obtain the current context feature, as in step S410, and the second part is to obtain the predicted context feature, as in step S420, wherein the predicted context feature needs to be obtained based on the current context feature. The extraction of the predicted contextual features of step S310 requires steps S410-S420, and fig. 4 schematically illustrates a flowchart of the steps of the method for extracting the predicted contextual features according to an embodiment of the present application. The specific method is as follows.

In one embodiment of the present application, a method for performing feature extraction on query text and historical dialogue data to obtain current contextual features in a current dialogue turn includes:

step S410: converting the query text and the historical dialogue data into a feature vector; and inputting the historical dialogue data into the coding model in a feature vector mode to obtain the current contextual feature.

The coding model calculates context-related vector representations of all the contextual features in the historical dialogue data by using a bidirectional self-attention mechanism, and then takes the mean value of the context-related vector representations of all the contextual features as the current contextual features of the historical dialogue data;

wherein the coding model uses a Transformer model. The Transformer model may include a plurality of stack units, each stack unit including a plurality of feed-forward layers and a multi-head self-attention layer. The Transformer model is an Encoder-Decoder (encoding-decoding) model based entirely on the attention mechanism.

The application uses the attention mechanism of the Transformer model to determine the context of the human-computer interaction dialog. The core idea of the attention mechanism is to calculate the correlation between each word in a sentence and all the words in that sentence; these correlations reflect, to some extent, the relevance and importance of the different words in the sentence. By using the correlations to adjust the importance (weight) of each word, a new representation of each word can be obtained. The new representation contains not only the word itself but also the relationships between the word and the other words, so it is a more global representation than a simple word vector and can be used to represent the current context of the dialog. In the method, the historical dialogue data and the query text are converted into feature vectors and input into the coding model; the Transformer model processes the multiple rounds of human-computer interaction dialogue data in the historical dialogue data and, through its attention mechanism, outputs the vector corresponding to the current contextual features of the historical dialogue data. The current context can thus be represented by the current contextual features, and by effectively taking the context into account, the reply given in the human-computer interaction can better meet the needs of the user.
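The attention-and-averaging idea described above can be illustrated with a minimal sketch. It is not the Transformer coding model of the application: it assumes a single attention head with identity query/key/value projections, and the function names (`self_attention`, `mean_pool`, `current_context_feature`) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Bidirectional self-attention with identity projections (an assumption).

    Every position attends to every position, earlier and later alike.
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in vectors:
        # Correlation of this position with all positions (scaled dot product).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        weights = softmax(scores)
        # New representation: weighted sum of all position vectors.
        outputs.append(
            [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
        )
    return outputs


def mean_pool(vectors):
    """Average the context-dependent vectors into a single context feature."""
    count = len(vectors)
    return [sum(vec[d] for vec in vectors) / count for d in range(len(vectors[0]))]


def current_context_feature(feature_vectors):
    """Attention followed by mean pooling, as a stand-in for the coding model."""
    return mean_pool(self_attention(feature_vectors))


# Toy feature vectors for three dialogue inputs (made-up values).
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c_k = current_context_feature(history)
```

A trained Transformer would additionally learn projection matrices, use several heads, and stack multiple units; the sketch only shows how pairwise correlations reweight the inputs before the mean is taken.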

For example, the dialog content of the historical dialog data is as follows. The user asks: "How is the weather today?" The artificial intelligence device answers: "It is sunny today, with a temperature of 25 to 30 degrees." The user continues to ask: "Where is a good place to go out and play?" The artificial intelligence device answers: "You can go to the hill or to the lakeside to enjoy the scenery." The user continues to ask: "What is fun on the hill?" The artificial intelligence device answers: "The hill has a viewing platform from which the city can be seen." The Transformer model applies the attention mechanism to the historical dialogue data, calculating the correlation between each word in a sentence and all the words in that sentence, which reflects to a certain extent the relevance and importance of the different words in the sentence. It finally recognizes that the current context of the user is a series of questions and answers around "play", so "play" corresponds to the current contextual feature. The artificial intelligence device can then give multi-level answers and recommendations based on the current contextual feature "play", so that the user obtains more information and the user experience is improved. For example, once the artificial intelligence device knows that the current contextual feature is "play", it may also recommend tickets for attractions in various places, travel guides shared by other users, and other information related to "play". The user can thus obtain more of the desired information, which greatly improves the user experience.

After the current context features are obtained, the predicted context features need to be extracted, and in one embodiment of the present application, a specific method for predicting the predicted context features in the next conversation turn according to the current context features includes:

step S420: acquiring all contextual features of historical dialogue data, and inputting all contextual features of the historical dialogue data into a machine learning model for training to obtain a model for predicting the contextual features; the current context features are input into a model of the predicted context features to derive predicted context features.

The historical dialogue data comprises all dialogue contents of the human-computer interaction dialogue between the current user and the artificial intelligence device. The dialogue contents are composed of a plurality of dialogues, which may be in the same context or in different contexts. All the contextual features of the historical dialogue data are acquired in the same manner as in step S410 and are then input into a machine learning model for training, wherein the machine learning model may be a model constructed based on a convolutional neural network, a recurrent neural network, or the like. Through the machine learning model, the context switching habits of the current user can be learned, so that a model of the predicted contextual features is obtained; the predicted contextual features can then be derived by inputting the current contextual features into the model of the predicted contextual features. The predicted contextual feature is the next contextual feature that corresponds to the current contextual feature in the model of the predicted contextual features.

For example, 200 dialogues between the current user and the artificial intelligence device are obtained from the historical dialogue data, and all the contextual features in the 200 dialogues are obtained using the Transformer model. Suppose the first 20 dialogues of the historical dialogue data are dialogues in which the current user asks the artificial intelligence device about the weather; the context corresponding to these 20 dialogues is then "weather". If the 21st to 60th dialogues are dialogues in which the current user asks the artificial intelligence device for travel recommendations, the context corresponding to the 21st to 60th dialogues is "place of play". If the 61st to 100th dialogues are dialogues in which the current user asks the artificial intelligence device about travel ticket information and air ticket information, the context corresponding to the 61st to 100th dialogues is "amount of money spent playing". If the 101st to 150th dialogues are dialogues in which the current user asks the artificial intelligence device about holidays, the context corresponding to the 101st to 150th dialogues is "holiday"; and so on, wherein the contexts are derived using the Transformer model. Therefore, when the machine learning model, for example a convolutional neural network, has been trained on enough historical dialogue data of the current user, the context change pattern of the current user can be obtained, for example that the current user turns to places of play after chatting about the weather. In this case, the current contextual features can be input into the model of the predicted contextual features to derive the predicted contextual features. For example, the current contextual feature is "weather", and the machine learning model finds that, in all the historical dialogue data of the current user, the context that follows "weather" is "place of play"; the predicted contextual feature may then be determined as "place of play". In the Transformer model and the machine learning model, the above text data is represented as vectors.
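The context-switching example can be mimicked with a deliberately simple stand-in. The application trains a machine learning model (for example a convolutional or recurrent neural network) on context vectors; the sketch below instead counts transitions between context labels, which is a swapped-in technique used only for illustration. The function names and the label sequence are hypothetical, and only the 150 dialogues enumerated in the example are included.

```python
from collections import Counter, defaultdict


def fit_transitions(context_sequence):
    """Count how often each context is directly followed by a different one."""
    transitions = defaultdict(Counter)
    for current, following in zip(context_sequence, context_sequence[1:]):
        if following != current:  # only changes of context are counted
            transitions[current][following] += 1
    return transitions


def predict_next_context(transitions, current):
    """Return the most frequent follower of `current`, or None if unseen."""
    followers = transitions.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Context labels per dialogue, following the example: 20 weather dialogues,
# 40 on places of play, 40 on spending, 50 on holidays.
history_contexts = (
    ["weather"] * 20
    + ["place of play"] * 40
    + ["amount of money spent playing"] * 40
    + ["holiday"] * 50
)

model = fit_transitions(history_contexts)
predicted = predict_next_context(model, "weather")
```

With this history, the context that follows "weather" is "place of play", matching the example. A learned model operating on vectors would generalize to unseen contexts, which a lookup table of labels cannot do.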

In step S320, the predicted text semantic features of the current dialog turn are determined according to the feature difference between the predicted contextual features and the current contextual features.

The predicted text semantic features are the feature difference between the predicted contextual features and the current contextual features, and this feature difference can be understood as the context change between the predicted contextual features and the current contextual features. From this context change, a simple human-computer interaction dialog logic of the current user can be predicted, for example that, as mentioned in step S420, the context following "weather" is "place of play" when a certain user is talking. The purpose of extracting the predicted text semantic features is to obtain, to a certain extent, the context change of the user through prediction, so that it can be compared with the real context change corresponding to the subsequent real text semantic features and the final human-computer interaction dialog quality evaluation can be carried out. Since the predicted contextual features and the current contextual features are both expressed as vectors, the predicted text semantic features are the vector difference between the predicted contextual features and the current contextual features.

The predicted text semantic features are obtained through the above steps and are subsequently compared with the real text semantic features to obtain the final human-computer interaction dialogue quality evaluation. To obtain the real text semantic features, the reply text for replying to the query text needs to be obtained first; the reply text is obtained as follows.
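As a small illustration of the vector difference described above, the sketch below subtracts a current context vector from a predicted context vector. The function name `feature_difference` and the numeric values are hypothetical; the same subtraction would apply to the real contextual features.

```python
def feature_difference(target_context, current_context):
    """Element-wise difference: the context change from current to target."""
    if len(target_context) != len(current_context):
        raise ValueError("context vectors must have the same dimension")
    return [t - c for t, c in zip(target_context, current_context)]


# Made-up three-dimensional context vectors.
c_k = [0.2, 0.7, 0.1]          # current contextual feature
c_pred_next = [0.6, 0.4, 0.3]  # predicted contextual feature of the next turn

i_pred = feature_difference(c_pred_next, c_k)  # predicted text semantic feature
```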

In one embodiment of the present application, a method for generating a reply text for replying to a query text according to semantic features of a predicted text is specifically disclosed, and the method includes steps S510-S520. Fig. 5 schematically shows a flowchart of steps of an extraction method of a reply text in an embodiment of the present application, and the specific steps are as follows.

Step S510: acquiring historical dialogue data, inputting all question and answer data of the historical dialogue data into a machine learning model for training to obtain a model for predicting a reply text;

The historical dialogue data comprises all dialogue contents of the human-computer interaction dialogue between the current user and the artificial intelligence device, and the dialogue contents are composed of a plurality of dialogues. All dialogue contents of the historical dialogue data are acquired and then input into a machine learning model for training, wherein the machine learning model may be a model constructed based on a convolutional neural network, a recurrent neural network, or the like. Through the machine learning model, the dialogue content between the current user and the artificial intelligence device can be learned, so that the model of the predicted reply text is obtained. Of course, the machine learning model requires a large amount of historical dialogue data for this learning.

Step S520: inputting the query data to be replied and the predicted text semantic features into the model of the predicted reply text to obtain the reply text.

And after obtaining the model of the predicted reply text, inputting the query data to be replied and the semantic features of the predicted text into the model of the predicted reply text to obtain the reply text.

For example, the query text input by the current user is "What is there to do for fun nearby?". In the historical dialogue data learned by the machine learning model, the answer to this query is in most cases "You can enjoy the scenery at the west hill or the east lake", so the corresponding reply text is "You can enjoy the scenery at the west hill or the east lake".

The reply text replying the query text is obtained through the steps, but the semantic features of the real text are not extracted yet, and the extraction method of the semantic features of the real text is as follows.

Step S230 includes extracting the real text semantic features of the current conversation turn according to the reply text, the query text and the historical conversation data;

in an embodiment of the present application, specifically, the step of extracting the semantic features of the real text of the current dialog turn according to the reply text, the query text and the historical dialog data is disclosed, which includes steps S610-S620, and fig. 6 schematically illustrates a flow chart of steps of the extraction method of the semantic features of the real text in an embodiment of the present application, and specifically includes the following steps.

Step S610: extracting features of the reply text, the query text and the historical dialogue data to obtain real contextual features in the next dialogue turn;

in an embodiment of the present application, a method for extracting features from a reply text, a query text, and historical dialogue data to obtain real contextual features in a next dialogue turn is specifically disclosed, which includes steps S710 to S720, and fig. 7 schematically illustrates a flowchart of steps of a method for extracting real contextual features in an embodiment of the present application, and specifically includes the following steps.

Step S710: the reply text, the query text, and the historical dialog data are combined to form real dialog data.

When the artificial intelligence device replies to the query text of the current user, the reply text is obtained. This question-and-answer information is combined with the historical dialogue data to form the real dialogue data, which is the updated dialogue data. For example, if there were previously 200 pieces of historical dialogue data, and the query text of the current user and the resulting reply text correspond to 10 pieces of dialogue, the real dialogue data consists of 210 pieces of dialogue data: the 200 pieces of historical dialogue data plus the 10 pieces of new dialogue information.

Step S720: inputting the real dialogue data into the coding model to obtain the real contextual features.

Wherein the coding model calculates context-dependent vector representations of all contextual features in the real dialogue data using a bidirectional self-attention mechanism, and then takes the mean of the context-dependent vector representations of all contextual features as the real contextual features of the real dialogue data.

The coding model and the method used herein are the same as those in step S410, and the real context features can be obtained by the coding model. The real contextual features represent the context of the real dialog data formed when the reply text, the query text and the historical dialog data are combined. The true contextual features correspond to the predicted contextual features, which are derived based on the prediction, and the true contextual features are derived based on the true reply text, both of which can thus be used to represent the context of the current user. After the true contextual features are derived, the true text semantic features can be derived by the following method.

Step S620: determining the real text semantic features of the current dialog turn according to the feature difference between the real contextual features and the current contextual features.

The real text semantic features are feature differences between the real contextual features and the current contextual features, and the feature differences can be understood as context changes between the real contextual features and the current contextual features. Context changes made by the current user in reply text can be determined using context changes between the real context features and the current context features. The real text semantic features are vector differences of the corresponding real contextual features and the current contextual features.

For example, continuing with the example in step S420, the query text of the current user is "Where can I go to play?", and the artificial intelligence device replies "There are activities at the shopping mall, which is a better place to play". After this information is combined with the historical dialogue data, the real dialogue data is formed, and a real contextual feature is obtained from the real dialogue data, for example "the place of play is a shopping mall". The real text semantic feature is then the change from the current contextual feature "weather" to the real contextual feature "the place of play is a shopping mall", whereas the predicted text semantic feature mentioned in step S420 corresponds to the change from the current contextual feature "weather" to the predicted contextual feature "the place of play is an outdoor environment". There is therefore a certain gap between the real text semantic feature and the predicted text semantic feature, which indicates that the reply text does not conform to the context habits of the current user and that the content of the reply text may not satisfy the current user, so the corresponding human-computer interaction dialog quality evaluation information will have a lower value.

After the semantic features of the real text and the semantic features of the predicted text are obtained, the two data are needed to be used for generating the dialogue quality evaluation information, and the specific method is as follows.

Step S240 includes generating dialogue quality evaluation information of the man-machine interaction dialogue according to the semantic features of the predicted text and the semantic features of the real text.

In an embodiment of the application, based on the above technical solution, generating the dialog quality evaluation information of the human-computer interaction dialog according to the predicted text semantic features and the real text semantic features includes:

substituting the predicted text semantic features and the real text semantic features into the formula for the cosine of the included angle between two vectors to obtain the cosine of the vector included angle between the predicted text semantic features and the real text semantic features;

dividing the smaller of the absolute value (vector norm) of the predicted text semantic features and the absolute value of the real text semantic features by the larger of the two to obtain a difference value between the predicted text semantic features and the real text semantic features;

and multiplying the cosine of the vector included angle between the predicted text semantic features and the real text semantic features by the difference value to obtain the dialogue quality evaluation information.

The specific calculation formula may be as follows:

$$s_k = \cos\langle I_k', I_k\rangle \cdot \frac{\min\left(\lVert I_k'\rVert, \lVert I_k\rVert\right)}{\max\left(\lVert I_k'\rVert, \lVert I_k\rVert\right)}, \qquad \cos\langle I_k', I_k\rangle = \frac{I_k' \cdot I_k}{\lVert I_k'\rVert \, \lVert I_k\rVert}$$

wherein the semantic feature of the predicted text is I_k'; the semantic feature of the real text is I_k; and the dialogue quality evaluation information is s_k.

An exponential operation is then performed on s_k to obtain the dialogue quality evaluation information Flow Score. The Flow Score is a specific dialogue quality evaluation value, and the dialogue quality can be evaluated according to the magnitude of this value.

In this operation, M represents the number of words of the dialogue; s_k lies in the range [-1, 1], and (s_k + 1)/2 lies in the range [0, 1].

The dialogue quality evaluation information of the man-machine interaction dialogue can be directly calculated through the formula.
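The three calculation steps of step S240 (cosine of the included angle, ratio of the smaller norm to the larger norm, and their product) can be sketched as follows. The function names are hypothetical, the handling of zero vectors is an assumption, and the subsequent exponential operation that yields the Flow Score is intentionally not reproduced here.

```python
import math


def norm(vector):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vector))


def dialogue_similarity(predicted, real):
    """s_k: cosine of the included angle times min(norm) / max(norm)."""
    norm_pred, norm_real = norm(predicted), norm(real)
    if norm_pred == 0.0 or norm_real == 0.0:
        # Assumption: the angle is undefined for a zero vector, so reject it.
        raise ValueError("semantic features must be non-zero vectors")
    dot = sum(p * r for p, r in zip(predicted, real))
    cosine = dot / (norm_pred * norm_real)
    length_ratio = min(norm_pred, norm_real) / max(norm_pred, norm_real)
    return cosine * length_ratio


def normalized(s_k):
    """Map s_k from [-1, 1] to [0, 1] via (s_k + 1) / 2."""
    return (s_k + 1.0) / 2.0


same = dialogue_similarity([1.0, 0.0], [1.0, 0.0])       # same direction and length
opposite = dialogue_similarity([1.0, 0.0], [-1.0, 0.0])  # opposite direction
shorter = dialogue_similarity([1.0, 0.0], [2.0, 0.0])    # same direction, half length
```

Identical context changes give 1, opposite changes give -1, and a change in the same direction but of half the magnitude gives 0.5, so both the direction and the size of the context change affect the value.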

The content of the present application will be further explained by a specific application embodiment, and fig. 8 schematically shows a flow chart of steps of a data processing method of a man-machine interaction dialog in a specific application embodiment of the present application. Specifically comprising steps S810-S870.

Step S810: inputting the historical dialogue information of the human-computer interaction dialogue between the artificial intelligence device and the user in practical application, together with the query text input by the user, into a preprocessing module, which converts the historical dialogue information and the query text input by the user into feature vectors;

Step S820: encoding the vectorized historical dialogue information and query text input by the user through a Transformer model, and outputting the current context representation C_k of the historical dialogue information;

Step S830: inputting the current context representation C_k into a first machine learning model trained in advance to predict the predicted context representation C'_{k+1};

Step S840: computing the predicted text semantic representation I_k' from the current context representation C_k and the predicted context representation C'_{k+1}: I_k' = C'_{k+1} - C_k;

Step S850: inputting the predicted text semantic representation I_k' together with the query text input by the user into a second machine learning model to obtain the reply text for the query text input by the user;

Step S860: inputting the reply text into the preprocessing module and the Transformer model, and outputting the real context representation C_{k+1};

Step S870: taking the difference between the real context representation C_{k+1} and the current context representation C_k to obtain the real semantic representation I_k = C_{k+1} - C_k, and calculating the dialogue quality evaluation information Flow Score from I_k and the predicted text semantic representation I_k' using the formula corresponding to step S240. The larger the value of the dialogue quality evaluation information, the higher the human-computer interaction dialogue quality and the better the evaluation; the smaller the value, the lower the human-computer interaction dialogue quality and the worse the evaluation.
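The data flow of steps S820 to S870 can be traced with a rough sketch under strong simplifying assumptions: a mean over turn vectors stands in for the Transformer encoder, the next-context predictor is passed in as a plain function, reply generation (S850) is skipped by supplying the reply vector directly, and only the per-turn value s_k is returned rather than the Flow Score. All names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Stand-in encoder (assumption): average the turn vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def subtract(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def score(i_pred, i_real):
    """Cosine of the included angle times the min/max ratio of the norms."""
    norm_pred = math.sqrt(sum(x * x for x in i_pred))
    norm_real = math.sqrt(sum(x * x for x in i_real))
    if norm_pred == 0.0 or norm_real == 0.0:
        raise ValueError("semantic representations must be non-zero vectors")
    dot = sum(x * y for x, y in zip(i_pred, i_real))
    ratio = min(norm_pred, norm_real) / max(norm_pred, norm_real)
    return dot / (norm_pred * norm_real) * ratio


def evaluate_turn(turn_vectors, reply_vector, predict_next_context):
    c_k = mean_vector(turn_vectors)                      # S820: current context C_k
    c_pred = predict_next_context(c_k)                   # S830: predicted C'_{k+1}
    i_pred = subtract(c_pred, c_k)                       # S840: I_k' = C'_{k+1} - C_k
    # S850 (reply generation) is outside this sketch; reply_vector is given.
    c_real = mean_vector(turn_vectors + [reply_vector])  # S860: real C_{k+1}
    i_real = subtract(c_real, c_k)                       # S870: I_k = C_{k+1} - C_k
    return score(i_pred, i_real)


# Toy predictor: expects the context to move one unit along the first axis.
def shift_first_axis(context):
    return [context[0] + 1.0, context[1]]


turns = [[0.0, 0.0], [2.0, 0.0]]
matching = evaluate_turn(turns, [4.0, 0.0], shift_first_axis)
contrary = evaluate_turn(turns, [-2.0, 0.0], shift_first_axis)
```

In the first call the reply moves the context exactly as predicted, so the value is 1; in the second call the reply moves it in the opposite direction, so the value is -1.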

The application also discloses a data processing device 900 of the human-computer interaction dialog, and fig. 9 schematically shows a structural block diagram of the data processing device of the human-computer interaction dialog in one embodiment of the application. The device specifically includes:

the data acquisition module 910 is configured to acquire a query text to be replied and historical dialogue data, where the historical dialogue data includes multiple rounds of dialogue data of a human-computer interaction dialogue generated before the query text;

the feature extraction module 920, which is connected to the data acquisition module 910 and is configured to extract the predicted text semantic features of the current dialog turn according to the query text and the historical dialog data, and to generate a reply text for replying to the query text according to the predicted text semantic features; the feature extraction module 920 is further configured to extract the real text semantic features of the current dialog turn according to the reply text, the query text, and the historical dialog data;

and an evaluation information generation module 930, wherein the evaluation information generation module 930 is connected to the feature extraction module 920 and is used for generating dialogue quality evaluation information of the human-computer interaction dialogue according to the semantic features of the predicted text and the semantic features of the real text.

Based on the above scheme, the feature extraction module 920 specifically includes: the context feature extraction unit is configured to perform feature extraction on the query text and the historical dialogue data to obtain current context features in the current dialogue turn; a prediction unit configured to predict a predicted context feature in a next conversation turn according to a current context feature; and a semantic feature extraction unit configured to determine a semantic feature of the predicted text of the current conversation turn according to a feature difference between the predicted contextual feature and the current contextual feature.

Based on the scheme, the context feature extraction unit is further configured to perform feature extraction on the reply text, the query text and the historical dialogue data to obtain real context features in the next dialogue turn; the semantic feature extraction unit is further configured to determine a real text semantic feature of the current dialog turn based on a feature difference between the real contextual feature and the current contextual feature.

Based on the above scheme, the feature extraction module 920 further includes an encoding unit configured to calculate context-related vector representations of all the contextual features in the historical dialogue data by using a bidirectional self-attention mechanism, and then take the mean of the context-related vector representations of all the contextual features as the current contextual feature of the historical dialogue data; the prediction unit is further configured to acquire all contextual features of the historical dialogue data and input all contextual features of the historical dialogue data into a machine learning model for training, so as to obtain a model of the predicted contextual features; the current context features are input into a model of the predicted context features to derive predicted context features.

Based on the scheme, the prediction unit is further configured to acquire historical dialogue data, input all question answering data of the historical dialogue data into a machine learning model for training, and obtain a model for predicting the reply text; and inputting the query data to be replied and the semantic features of the predicted text into a model of the predicted reply text to obtain the reply text.

Based on the scheme, the semantic feature extraction unit is further configured to combine the reply text, the query text and the historical dialogue data to form real dialogue data; inputting the real dialogue data into a coding model to obtain real context characteristics; the coding model calculates context-dependent vector representations of all contextual features in the real dialog data using a bidirectional self-attention mechanism, and then takes the mean of the context-dependent vector representations of all contextual features as the real contextual features of the real dialog data.

Based on the above scheme, the evaluation information generation module 930 includes a calculation unit configured to substitute the predicted text semantic features and the real text semantic features into the formula for the cosine of the included angle between two vectors to obtain the cosine of the vector included angle between the predicted text semantic features and the real text semantic features; the calculation unit is further configured to divide the smaller of the absolute value (vector norm) of the predicted text semantic features and the absolute value of the real text semantic features by the larger of the two to obtain a difference value between the predicted text semantic features and the real text semantic features; the calculation unit is further configured to multiply the cosine of the vector included angle by the difference value to obtain the dialogue quality evaluation information.

The specific details of the data processing device of the human-computer interaction dialog provided in each embodiment of the present application have been described in detail in the corresponding method embodiment, and are not described herein again.

According to an aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor; and a memory for storing executable instructions for the processor; wherein the processor is configured to execute the data processing method of the human-computer interaction dialog as in the above technical solution via executing the executable instructions.

Fig. 10 schematically shows a block diagram of a computer system of an electronic device for implementing an embodiment of the present application.

It should be noted that the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. The random access memory 1003 also stores various programs and data necessary for system operation. The central processing unit 1001, the read-only memory 1002, and the random access memory 1003 are connected to each other via a bus 1004. An input/output interface 1005 (I/O interface) is also connected to the bus 1004.

The following components are connected to the input/output interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a local area network card or a modem. The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 is also connected to the input/output interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 1010 as necessary, so that a computer program read from it can be installed into the storage section 1008 as necessary.

In particular, according to embodiments of the present application, the processes described in the various method flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When the computer program is executed by the central processing unit 1001, the various functions defined in the system of the present application are performed.

The dialogue quality evaluation information of the human-computer interaction dialogue is generated from the predicted text semantic features and the real text semantic features, both of which correspond to how the context changes. The dialogue quality evaluation information generated by the present application is therefore tied to the specific context, reflects the state of the human-computer interaction dialogue more objectively, and is more practical. This enhances the user experience in the human-computer interaction dialogue and makes it easier for technicians to optimize and improve the dialogue according to the dialogue quality evaluation information. For example, an artificial intelligence device whose human-computer interaction dialogues receive a low quality evaluation can be retired or retrained by machine learning, so that poorly rated human-computer interaction dialogues are eliminated or repaired and the industrial development of human-computer interaction is effectively promoted.

It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions that enable a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present application.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A data processing method of a human-computer interaction dialogue is characterized by comprising the following steps:

2. The data processing method of human-computer interaction dialogue according to claim 1, wherein extracting predicted text semantic features of a current dialogue turn according to the query text and the historical dialogue data comprises:

extracting features of the query text and the historical dialogue data to obtain current context features in the current dialogue turn, and predicting the predicted context features in the next dialogue turn according to the current context features;

and determining the semantic features of the predicted text of the current conversation turn according to the feature difference between the predicted contextual features and the current contextual features.

3. The data processing method of human-computer interaction dialog of claim 1, wherein extracting real text semantic features of a current dialog turn from the reply text, the query text and the historical dialog data comprises:

extracting features of the reply text, the query text and the historical dialogue data to obtain real contextual features in the next dialogue turn;

and determining the real text semantic features of the current conversation turn according to the feature difference between the real contextual features and the current contextual features.
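For illustration only, the feature-difference steps of claims 2 and 3 can be sketched as follows. The sketch assumes that each contextual feature is a fixed-length vector of floats and that the "feature difference" is an element-wise subtraction; the function name `feature_difference` and the sample values are hypothetical and do not come from the application.

```python
from typing import List

Vector = List[float]


def feature_difference(next_context: Vector, current_context: Vector) -> Vector:
    """Element-wise difference between a next-turn and a current-turn context vector.

    Assumption: the "feature difference" is plain vector subtraction.
    """
    if len(next_context) != len(current_context):
        raise ValueError("context vectors must have the same dimension")
    return [n - c for n, c in zip(next_context, current_context)]


# Hypothetical 3-dimensional contextual features for one dialogue turn.
current_ctx = [0.5, 1.0, -1.0]          # current contextual features
predicted_next_ctx = [1.5, 1.0, 0.0]    # predicted contextual features (next turn)
real_next_ctx = [1.0, 2.0, -1.0]        # real contextual features (next turn)

# Predicted text semantic features: predicted next context minus current context.
predicted_semantic = feature_difference(predicted_next_ctx, current_ctx)
# Real text semantic features: real next context minus current context.
real_semantic = feature_difference(real_next_ctx, current_ctx)
```

Under this assumption both semantic features describe how the context moves from the current turn to the next one, which is what allows them to be compared directly.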

4. The data processing method of human-computer interaction dialogue according to claim 3, wherein the performing feature extraction on the reply text, the query text and the historical dialogue data to obtain real contextual features in a next dialogue turn comprises:

combining the reply text, the query text and the historical dialogue data to form real dialogue data;

inputting the real dialogue data into a coding model to obtain the real contextual features; the coding model uses a bidirectional self-attention mechanism to calculate context-dependent vector representations of all contextual features in the real dialogue data, and then takes the mean of the context-dependent vector representations of all contextual features as the real contextual features of the real dialogue data.
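As a minimal sketch of the encoding step in claim 4, the following code applies unmasked (bidirectional) scaled dot-product self-attention to a list of token vectors and then averages the results. It is a simplification under stated assumptions: a single attention head, no learned query/key/value projections (the token vectors themselves are used for all three), and no stacked layers. The function names are hypothetical, and the coding model of the application may differ in all of these respects.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Unmasked scaled dot-product self-attention.

    Assumption: queries, keys and values are the token vectors themselves
    (no learned projections). No causal mask is applied, so every position
    attends to positions on both sides of it.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors, dimension by dimension."""
    count = len(vectors)
    return [sum(v[d] for v in vectors) / count for d in range(len(vectors[0]))]


def encode_context(tokens: List[Vector]) -> Vector:
    """Context-dependent representations followed by mean pooling."""
    return mean_pool(self_attention(tokens))


# Hypothetical 2-dimensional token vectors for a short dialogue.
context_feature = encode_context([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The mean pooling yields one fixed-length vector regardless of how many tokens the dialogue data contains, so contextual features from different turns can be subtracted from one another.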

5. The method for processing data of human-computer interaction dialog according to claim 2, wherein performing feature extraction on the query text and the historical dialog data to obtain current contextual features in a current dialog turn, and predicting predicted contextual features in a next dialog turn based on the current contextual features comprises:

converting the query text and the historical dialogue data into feature vectors;

inputting the historical dialogue data into a coding model in a feature vector form to obtain current contextual features, wherein the coding model calculates context-related vector representations of all the contextual features in the historical dialogue data by using a bidirectional self-attention mechanism, and then takes the mean value of the context-related vector representations of all the contextual features as the current contextual features of the historical dialogue data;

acquiring all contextual features of historical dialogue data, and inputting all contextual features of the historical dialogue data into a machine learning model for training to obtain a model for predicting the contextual features;

and inputting the current contextual features into the model for predicting the contextual features to obtain the predicted contextual features.
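The training and prediction steps of claim 5 can be illustrated with a deliberately simple stand-in learner. The claim does not specify the machine learning model, so this sketch swaps in a constant-offset model: it assumes that training pairs are the contextual features of consecutive turns and fits `next = current + offset` by least squares, whose solution is the mean turn-to-turn change. The names `fit_offset_model` and `predict_next_context` are hypothetical.

```python
from typing import List

Vector = List[float]


def fit_offset_model(history: List[Vector]) -> Vector:
    """Fit next = current + offset on consecutive contextual features.

    Assumption: `history` holds one contextual feature vector per turn,
    in dialogue order. The least-squares offset is the mean observed change.
    """
    if len(history) < 2:
        raise ValueError("need at least two turns to form a training pair")
    dim = len(history[0])
    pairs = list(zip(history[:-1], history[1:]))
    return [sum(nxt[d] - cur[d] for cur, nxt in pairs) / len(pairs) for d in range(dim)]


def predict_next_context(current_context: Vector, offset: Vector) -> Vector:
    """Apply the fitted offset to the current contextual features."""
    return [c + o for c, o in zip(current_context, offset)]


# Hypothetical contextual features of three historical turns.
history_features = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
offset = fit_offset_model(history_features)
predicted_ctx = predict_next_context(history_features[-1], offset)
```

A practical system would more plausibly use a sequence model over the whole history; the constant-offset model is used here only because it makes the train-then-predict flow easy to follow.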

6. The data processing method of human-computer interaction dialogue according to claim 2, wherein generating a reply text for replying to the query text according to the predicted text semantic features comprises:

acquiring historical dialogue data, inputting all question and answer data of the historical dialogue data into a machine learning model for training to obtain a model for predicting a reply text;

and inputting the query text to be replied to and the predicted text semantic features into the model for predicting a reply text to obtain the reply text.

7. The data processing method of human-computer interaction dialog of claim 1, wherein generating dialog quality evaluation information of the human-computer interaction dialog based on the predicted text semantic features and the real text semantic features comprises:

and multiplying the vector included angle between the predicted text semantic features and the real text semantic features by a difference value to obtain the dialogue quality evaluation information.
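Claim 7 does not define how the included angle or the "difference value" is measured, so the following is only one possible reading, shown as a sketch. It assumes that the angle term is the cosine of the included angle between the two feature vectors and that the difference value is the ratio of the shorter vector length to the longer one. Both choices, and the function name `dialogue_quality_score`, are assumptions and are not stated in the application.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    return math.sqrt(dot(a, a))


def dialogue_quality_score(predicted: Sequence[float], real: Sequence[float]) -> float:
    """Angle term multiplied by a length-difference term.

    Assumptions (not from the application):
      - angle term: cosine of the included angle, in [-1, 1]
      - difference value: min(length) / max(length), in (0, 1]
    """
    if len(predicted) != len(real):
        raise ValueError("feature vectors must have the same dimension")
    norm_p, norm_r = norm(predicted), norm(real)
    if norm_p == 0.0 or norm_r == 0.0:
        # Assumption: a zero vector has no direction, so score it as 0.
        return 0.0
    cosine = dot(predicted, real) / (norm_p * norm_r)
    length_ratio = min(norm_p, norm_r) / max(norm_p, norm_r)
    return cosine * length_ratio


# Hypothetical predicted and real text semantic features.
score = dialogue_quality_score([1.0, 0.0], [2.0, 0.0])
```

Under this reading the score is highest when the real reply moves the context in the predicted direction and by the predicted amount, and it falls as either the direction or the magnitude diverges.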

8. A data processing apparatus for human-computer interaction dialog, comprising:

9. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the data processing method of a human-computer interaction dialog of any one of claims 1 to 7.

10. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the data processing method of a human-computer interaction dialog of any of claims 1 to 7 via execution of the executable instructions.