CN113220856A - Multi-round dialogue system based on Chinese pre-training model - Google Patents
- Publication number: CN113220856A (application CN202110588492.0A)
- Authority: CN (China)
- Prior art keywords: question, module, model, training, user
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
- G06F16/338—Presentation of query results
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
- G06Q30/015—Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
- G06Q30/016—After-sales
Abstract
The invention relates to a multi-round dialogue system based on a Chinese pre-training model, comprising: a data processing module; a retrieval module for searching the question-answer data set using vector inner-product similarity; a generation module for adapting the Chinese pre-trained NEZHA model so that it can be used for generation tasks; a compression module for performing knowledge distillation on the generation module's NEZHA model using a replaceable (layer-replacement) strategy; and a dialogue management module for managing high-frequency user questions.
Description
Technical Field
The invention belongs to the field of intelligent dialogue and relates to a multi-round dialogue system based on a Chinese pre-training model.
Background
With the rapid development of deep learning and network technology, data-driven models are becoming increasingly popular. Building a human-like dialogue agent is considered one of the most challenging tasks in artificial intelligence. A task-oriented dialogue system can be regarded as a sequential decision process: it relies on a large amount of information to continue the conversation, such as dialogue context, intent, external knowledge, common sense, emotion, and the participants' backgrounds and personas. All of these can affect the responses in a conversation, and these uncertainties make the task extremely difficult.
In addition, as computing power has improved, large amounts of real conversation data are generated in daily life, and large pre-trained neural network models (such as NEZHA) have developed rapidly, so that on several natural-language-understanding tasks they even exceed human performance. However, task-oriented multi-turn dialogue systems built directly on Chinese pre-trained neural networks remain relatively rare, and their dialogue quality and diversity still need improvement.
A dialogue system for a specific scenario (such as an e-commerce platform) typically focuses on dialogue data from that domain; although it can handle high-frequency user questions, it is limited by that data and lacks generalization and semantic-understanding ability for long-tail questions. Although end-to-end models have become a research hot spot, practical dialogue systems still rely on the traditional pipelined architecture, especially in the cold-start stage of a new domain.
Retrieval-based and generation-based dialogue systems have different implementation principles, each with its own advantages and disadvantages. While a retrieval-based dialogue can provide more fluent and relevant responses, a generative dialogue can model more complex contextual semantics (e.g., user emotions). The present invention therefore explores a combination of retrieval and generation strategies to seek better dialogue-system performance.
References:
[1] ScaNN vector search tool: https://github.com/google-research/google-research/tree/master/scann
[2] Wei J, Ren X, Li X, et al. NEZHA: Neural contextualized representation for Chinese language understanding. arXiv preprint arXiv:1909.00204, 2019.
[3] Guo R, Sun P, Lindgren E, et al. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning, 2020: 3887–3896.
[4] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 6000–6010.
Disclosure of the Invention
The invention aims to provide a multi-round dialogue system based on a Chinese pre-training model, which adopts the following technical scheme:
a multi-turn dialogue system based on a Chinese pre-training model comprises:
the data processing module is used for segmenting the dialogue data, removing stop words, replacing English punctuation, constructing a question-answer data set and building a sentence-vector index database; the sentence-vector index database is built as follows:
firstly, segmenting the user questions with the Jieba Chinese word-segmentation tool;
secondly, loading open-source Chinese word vectors and mapping each segmented user question to a sequence of word vectors;
thirdly, converting the word vectors into a sentence vector by weighted averaging, and building the user-question sentence-vector index database;
the retrieval module retrieves from the question-answer data set using vector inner-product similarity, as follows:
firstly, segmenting the user's input question with the Jieba Chinese word-segmentation tool;
secondly, feeding the segmented question into a FastText text-classification model for intent recognition, to judge whether the question is chit-chat;
thirdly, if the question is chit-chat, appending the questions of the last two dialogue rounds to it to form the user question;
fourthly, converting the user question into a sentence vector and computing inner-product similarity against the sentence-vector index database built by the data processing module, using the ScaNN vector-retrieval tool;
fifthly, obtaining the candidate answers corresponding to the questions with the highest similarity scores;
sixthly, feeding the question and the candidate answers into the Chinese pre-trained language model NEZHA for re-ranking to obtain the highest-scoring answer;
the generation module adapts the Chinese pre-trained NEZHA model so that it can be used for generation tasks, as follows:
firstly, loading the question-answer data set produced by the data processing module into the generation module, and loading the pre-trained NEZHA weights to train the generation model on the question-answer data;
secondly, marking the roles in the question-answer data (the user's tokens as all 0, the customer-service tokens as all 1) and using these marks as segment embeddings;
thirdly, obtaining the length m of the question-answer sequence from the segment embeddings in the second step, and constructing an m-by-m self-attention matrix;
fourthly, setting the upper-triangular part of the self-attention matrix to -∞ and all other elements to 0, as the input attention matrix of the question-answer pair;
fifthly, feeding the embedded sequence into the 12-layer Transformer network for training;
sixthly, generating 10 candidate replies with topK random decoding;
the compression module performs knowledge distillation on the generation module's NEZHA model using a replaceable (layer-replacement) strategy, as follows:
firstly, inputting the question-answer data set produced by the data processing module, loading the NEZHA weights fine-tuned by the generation module, and marking them as the ancestor layers;
secondly, setting the probability r of replacing an ancestor layer with an inheritor layer to 0.5, compressing the network to half its original number of layers;
thirdly, continuing training on the question-answer pairs constructed by the data processing module, and then generating candidate answers with the compressed inheritor layers;
and the dialogue management module is used for managing high-frequency user questions.
The invention provides an e-commerce customer-service dialogue system based on a Chinese pre-training model that combines retrieval and generation, assisted by task templates. It combines a current advanced pre-training model with a dialogue system and, to improve user satisfaction, applies knowledge distillation to the generation model, optimizing the inference efficiency of the pre-trained model so that the system performs well in both reply-generation quality and running efficiency. The invention opens a new direction for combining multi-turn dialogue systems with advanced Chinese pre-trained language models and contributes to improving the quality and efficiency of reply generation in dialogue systems.
Drawings
FIG. 1 is a framework diagram of the multi-turn dialogue system;
FIG. 2 is a schematic diagram of the retrieval module;
FIG. 3 is a diagram of the Transformer encoder, the basic component of the pre-training model;
FIG. 4 is a schematic diagram of the generation module.
Detailed Description
The present invention will now be described in further detail with reference to experimental procedures and results.
The invention designs a multi-round dialogue system based on a Chinese pre-training model. The system comprises a data processing module, a retrieval module, a generation module, a compression module and a dialogue management module. The invention combines a current advanced pre-training model with the dialogue system and, to improve user satisfaction, applies knowledge distillation to the generation model, optimizing the inference efficiency of the pre-trained model so that the system performs well in both reply quality and running efficiency. Fig. 1 presents the overall framework of the proposed dialogue system. The technical scheme is divided into the following five parts:
(1) Data processing module
The method first segments the dialogue data, removes stop words, replaces English punctuation, constructs a question-answer data set and builds a sentence-vector index database. The sentence-vector index database is built as follows:
firstly, segmenting the user questions with the Jieba Chinese word-segmentation tool;
secondly, loading open-source Chinese word vectors and mapping each segmented user question to a sequence of word vectors;
thirdly, converting the word vectors into a sentence vector by weighted averaging, and building the user-question sentence-vector index database.
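The three steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the tiny word-vector table, the 300-dimension default and the uniform fallback weights are assumptions; in practice the tokens would come from `jieba.lcut` and the vectors from an open-source Chinese embedding file.

```python
import numpy as np

def sentence_vector(tokens, word_vectors, dim=300, weights=None):
    """Weighted average of the word vectors of `tokens` into one sentence vector.
    Tokens missing from the vector table are skipped; if no token is known,
    a zero vector is returned."""
    vecs, ws = [], []
    for i, tok in enumerate(tokens):
        if tok in word_vectors:
            vecs.append(word_vectors[tok])
            ws.append(1.0 if weights is None else weights[i])
    if not vecs:
        return np.zeros(dim)
    ws = np.asarray(ws)
    return (np.asarray(vecs) * ws[:, None]).sum(axis=0) / ws.sum()
```

Each resulting sentence vector would then be stored in the index database keyed by its question.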
(2) Retrieval module
The retrieval module retrieves from the question-answer data using vector inner-product similarity; its main flow is shown in Fig. 2:
firstly, segmenting the user's input question with the Jieba Chinese word-segmentation tool;
secondly, feeding the segmented question into a FastText text-classification model for intent recognition, to judge whether the question is chit-chat;
thirdly, if the question is chit-chat, appending the questions of the last two dialogue rounds to it to form the user question;
fourthly, converting the user question into a sentence vector and computing inner-product similarity against the sentence-vector index database built by the data processing module, using the ScaNN [1, 3] vector-retrieval tool;
fifthly, obtaining the 10 candidate answers corresponding to the questions with the highest similarity scores;
sixthly, feeding the question and the 10 candidate answers into the Chinese pre-trained language model NEZHA [2] for re-ranking, and taking the highest-scoring answer.
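ScaNN performs approximate maximum-inner-product search at scale; as an illustration of the fourth and fifth steps, a brute-force NumPy stand-in computes the same inner-product scores (the function name and array shapes are our assumptions, not the patent's):

```python
import numpy as np

def retrieve_top_k(query_vec, index_matrix, k=10):
    """Brute-force stand-in for maximum-inner-product search:
    score every indexed question vector by its dot product with the
    query and return the indices and scores of the top-k matches.
    `index_matrix` has one sentence vector per row."""
    scores = index_matrix @ query_vec
    top = np.argsort(-scores)[:k]
    return top, scores[top]
```

The returned indices map back to stored questions, whose answers form the candidate set passed to the NEZHA re-ranker.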
(3) Generation module
For the generation module, we adapt the Chinese pre-trained NEZHA model so that it can be used for generation tasks. FIG. 3 shows the basic component of the NEZHA encoder, the Transformer [4] module. As shown in FIG. 4, NEZHA is composed of 12 Transformer layers. The specific process is as follows:
firstly, the question-answer data set produced by the data processing module is loaded into the generation module, and the pre-trained NEZHA weights are loaded to train the generation model on the question-answer data;
secondly, the roles in the question-answer data are marked (the user's tokens as all 0, the customer-service tokens as all 1) and used as segment embeddings (Segment Embedding);
thirdly, the length m of the question-answer sequence is obtained from the segment embeddings in the second step, and an m-by-m self-attention matrix is constructed;
fourthly, the upper-triangular part of the self-attention matrix is set to -∞ and all other elements to 0, forming the input attention matrix of the question-answer pair;
fifthly, the embedded sequence is fed into the 12-layer Transformer network for training;
sixthly, 10 candidate replies are generated using topK random decoding.
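The mask construction and the decoding step above can be sketched as follows. The mask follows the upper-triangular scheme described in the text; `top_k_sample` is a minimal, assumed reading of "topK random decoding" (sample one token from the k highest-scoring logits), not the patent's exact decoder.

```python
import numpy as np

def causal_attention_mask(m):
    """m-by-m matrix with -inf above the diagonal and 0 elsewhere, added
    to the attention logits so each position attends only to itself and
    earlier tokens."""
    mask = np.zeros((m, m))
    mask[np.triu_indices(m, k=1)] = -np.inf
    return mask

def top_k_sample(logits, k=10, rng=None):
    """Keep the k highest logits, renormalize with a softmax, and sample
    one token id at random."""
    rng = rng or np.random.default_rng()
    top = np.argsort(-logits)[:k]
    p = np.exp(logits[top] - logits[top].max())
    p /= p.sum()
    return int(rng.choice(top, p=p))
```

Generating 10 candidate replies would amount to running the decoder loop 10 times with `top_k_sample` at each step.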
(4) Compression module
We use a replaceable (layer-replacement) strategy to distill knowledge from the generation module's NEZHA model. The specific process is as follows:
firstly, the question-answer data set produced by the data processing module is input, the NEZHA weights fine-tuned by the generation module are loaded, and these layers are marked as the ancestor layers;
secondly, the probability r of replacing an ancestor layer with a successor (inheritor) layer is set to 0.5, compressing the network to half its original number of layers;
thirdly, training continues on the question-answer pairs constructed by the data processing module, and candidate answers are then generated with the compressed inheritor layers.
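The replacement strategy resembles a Theseus-style compression scheme. A hypothetical sketch of one training-time forward pass follows, with layers modelled as plain callables and the pairing of two ancestor layers per inheritor layer assumed from the "half the original depth" statement:

```python
import random

def forward_with_replacement(x, ancestor_layers, inheritor_layers, r=0.5, rng=None):
    """During compression training, each pair of ancestor (teacher) layers
    is replaced by its single inheritor (student) layer with probability r;
    otherwise the two ancestor layers run unchanged. After training only
    the inheritor layers are kept, halving the depth."""
    rng = rng or random.Random()
    assert len(ancestor_layers) == 2 * len(inheritor_layers)
    for i, suc in enumerate(inheritor_layers):
        if rng.random() < r:
            x = suc(x)
        else:
            x = ancestor_layers[2 * i](x)
            x = ancestor_layers[2 * i + 1](x)
    return x
```

With r = 0.5 each inheritor layer sees the ancestor's intermediate representations half of the time, which is what transfers the teacher's behaviour.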
(5) Dialogue management module
The question-answer data set constructed by the data processing module contains many high-frequency user questions (such as returns, order modification and price protection), which are answered by corresponding predefined flows. As shown in Fig. 1, we collate these questions and add a task dialogue module. The specific process is as follows:
firstly, after the system preprocesses the user's input question, it is first passed to the task module for task matching; if a task is matched, the system directly returns the reply from the corresponding predefined template;
secondly, if no predefined task template is matched, the subsequent retrieval module is executed;
thirdly, if the semantic-matching score of a candidate answer obtained by the retrieval module exceeds the set threshold of 0.5, retrieval succeeds and the highest-scoring candidate answer is returned;
fourthly, if the retrieval module's best candidate score is below the 0.5 threshold, the generation module is executed, the generated candidates are re-ranked together with the retrieval candidates, and the highest-scoring reply is returned.
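The four steps above might be wired together as follows; all four callables and the candidate format `(reply, score)` are illustrative assumptions, not interfaces defined by the patent:

```python
def answer(question, task_match, retrieve, generate, rerank, threshold=0.5):
    """Template matching first; then retrieval; fall through to generation
    and joint re-ranking only when the best retrieval score is at or below
    the threshold."""
    template_reply = task_match(question)      # None if no task matched
    if template_reply is not None:
        return template_reply
    candidates = retrieve(question)            # list of (reply, score)
    best_reply, best_score = max(candidates, key=lambda c: c[1])
    if best_score > threshold:
        return best_reply
    generated = generate(question)             # list of (reply, score)
    return rerank(question, candidates + generated)
```

This ordering keeps the cheap, deterministic paths (templates, retrieval) in front of the expensive generative model.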
From the original data set we select dialogues whose sessions have more than two rounds, and take three rounds for training the model. In addition, we filter out sessions whose replies contain fewer than four characters, since these tend to be generic responses. The data partitioning is shown in Table 1.
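A minimal sketch of this filtering, assuming each session is a list of (question, reply) pairs and that "response words" means characters of the reply:

```python
def filter_sessions(sessions, min_rounds=3, min_reply_chars=4):
    """Keep sessions with at least `min_rounds` dialogue rounds whose first
    `min_rounds` replies all have at least `min_reply_chars` characters
    (short replies tend to be generic); truncate each kept session to
    `min_rounds` rounds for training."""
    kept = []
    for s in sessions:
        if len(s) < min_rounds:
            continue
        if any(len(reply) < min_reply_chars for _, reply in s[:min_rounds]):
            continue
        kept.append(s[:min_rounds])
    return kept
```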
TABLE 1 Experimental database partitioning and basic cases thereof
As shown in Table 2, we performed detailed experiments on the above components using the five metrics above, and compared the performance of current mainstream models on the constructed data set. In the table, c_a abbreviates copy with attention, i.e. seq2seq with attention and copy mechanisms; l2r denotes the aforementioned attention-mask scheme that turns NEZHA into a generative model; and t_l denotes knowledge distillation of the generative model.
TABLE 2 comparison of model for each module
Compared with traditional models, our model achieves good results both on individual components and overall, which also demonstrates the effectiveness of the designed system framework.
While the invention has been described with reference to the drawings, it is not limited to the specific embodiments described above, which are illustrative rather than limiting; many modifications may be made by those skilled in the art without departing from the spirit of the invention, and these fall within the scope of the appended claims.
Claims (1)
1. A multi-turn dialogue system based on a Chinese pre-training model, comprising:
the data processing module, used for segmenting the dialogue data, removing stop words, replacing English punctuation, constructing a question-answer data set and building a sentence-vector index database, where the sentence-vector index database is built as follows:
firstly, segmenting the user questions with the Jieba Chinese word-segmentation tool;
secondly, loading open-source Chinese word vectors and mapping each segmented user question to a sequence of word vectors;
thirdly, converting the word vectors into a sentence vector by weighted averaging, and building the user-question sentence-vector index database;
the retrieval module, which retrieves from the question-answer data set using vector inner-product similarity, as follows:
firstly, segmenting the user's input question with the Jieba Chinese word-segmentation tool;
secondly, feeding the segmented question into a FastText text-classification model for intent recognition, to judge whether the question is chit-chat;
thirdly, if the question is chit-chat, appending the questions of the last two dialogue rounds to it to form the user question;
fourthly, converting the user question into a sentence vector and computing inner-product similarity against the sentence-vector index database built by the data processing module, using the ScaNN vector-retrieval tool;
fifthly, obtaining the candidate answers corresponding to the questions with the highest similarity scores;
sixthly, feeding the question and the candidate answers into the Chinese pre-trained language model NEZHA for re-ranking to obtain the highest-scoring answer;
the generation module improves a Chinese pre-training model NEZHA pre-training model to enable the Chinese pre-training model NEZHA pre-training model to be used for generating tasks, and comprises the following specific processes,
loading a question and answer data set obtained by a data processing module into a generating module, and loading a pre-training weight of a Chinese pre-training model NEZHA for training a generating model aiming at the question and answer data;
secondly, recording different roles in the question and answer data, such as the words of the user as all 0, and the words of the customer service as all 1, and embedding the words as paragraphs;
thirdly, according to the paragraph embedding in the second step, obtaining the length m of the question-answer data, and then constructing a self-attention matrix with m rows and m columns;
setting the upper triangular part of the self-attention matrix in the third step as- ∞, and setting other position elements of the matrix as 0 as the input attention matrix of question-answer pairs;
fifthly, inputting the embedded sequence into a 12-layer transform network for training;
sixthly, generating candidate replies by using topK random decoding;
the compression module utilizes an alternative strategy to carry out knowledge distillation on the NEZHA model of the generation module, and the specific process is as follows,
inputting a question and answer data set obtained by a data processing module, loading the NEZHA model weight finely adjusted by a generating module, and marking as a ancestor layer;
secondly, setting the probability rr of replacing the ancestor layer by the inheritor layer to be 0.5, and compressing the layer number of the ancestor layer to be half of the original layer number;
thirdly, continuing training by using the question-answer pairs constructed by the data processing module, and then generating candidate answers by using the compressed inheritor layer;
and the dialogue management module is used for managing high-frequency user problems.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110588492.0A CN113220856A (en) | 2021-05-28 | 2021-05-28 | Multi-round dialogue system based on Chinese pre-training model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110588492.0A CN113220856A (en) | 2021-05-28 | 2021-05-28 | Multi-round dialogue system based on Chinese pre-training model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113220856A true CN113220856A (en) | 2021-08-06 |
Family
ID=77098976
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110588492.0A Pending CN113220856A (en) | 2021-05-28 | 2021-05-28 | Multi-round dialogue system based on Chinese pre-training model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113220856A (en) |
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108399169A (en) * | 2017-02-06 | 2018-08-14 | 阿里巴巴集团控股有限公司 | Dialog process methods, devices and systems based on question answering system and mobile device |
CN107247868A (en) * | 2017-05-18 | 2017-10-13 | 深思考人工智能机器人科技(北京)有限公司 | A kind of artificial intelligence aids in interrogation system |
CN108280218A (en) * | 2018-02-07 | 2018-07-13 | 逸途(北京)科技有限公司 | A kind of flow system based on retrieval and production mixing question and answer |
CN109858020A (en) * | 2018-12-29 | 2019-06-07 | 航天信息股份有限公司 | A kind of method and system obtaining taxation informatization problem answers based on grapheme |
CN110046221A (en) * | 2019-03-01 | 2019-07-23 | 平安科技(深圳)有限公司 | A kind of machine dialogue method, device, computer equipment and storage medium |
CN109933661A (en) * | 2019-04-03 | 2019-06-25 | 上海乐言信息科技有限公司 | It is a kind of that the semi-supervised question and answer of model are generated to inductive method and system based on depth |
CN110362651A (en) * | 2019-06-11 | 2019-10-22 | 华南师范大学 | Dialogue method, system, device and the storage medium that retrieval and generation combine |
CN110347792A (en) * | 2019-06-25 | 2019-10-18 | 腾讯科技(深圳)有限公司 | Talk with generation method and device, storage medium, electronic equipment |
CN110334347A (en) * | 2019-06-27 | 2019-10-15 | 腾讯科技(深圳)有限公司 | Information processing method, relevant device and storage medium based on natural language recognition |
CN110516035A (en) * | 2019-07-05 | 2019-11-29 | 同济大学 | A kind of man-machine interaction method and system of mixing module |
CN110309287A (en) * | 2019-07-08 | 2019-10-08 | 北京邮电大学 | The retrieval type of modeling dialog round information chats dialogue scoring method |
CN110413761A (en) * | 2019-08-06 | 2019-11-05 | 浩鲸云计算科技股份有限公司 | A kind of method that the territoriality in knowledge based library is individually talked with |
CN111177359A (en) * | 2020-04-10 | 2020-05-19 | 支付宝(杭州)信息技术有限公司 | Multi-turn dialogue method and device |
CN112131367A (en) * | 2020-09-24 | 2020-12-25 | 民生科技有限责任公司 | Self-auditing man-machine conversation method, system and readable storage medium |
CN112541063A (en) * | 2020-12-08 | 2021-03-23 | 山东师范大学 | Man-machine conversation method and system based on self-learning conversation model |
CN112364150A (en) * | 2021-01-12 | 2021-02-12 | 南京云创大数据科技股份有限公司 | Intelligent question and answer method and system combining retrieval and generation |
Non-Patent Citations (2)
Title |
---|
Liao Shenglan: "Research on Semantic Parsing Methods for Natural Language Interaction of Service Robots", China Doctoral Dissertations Full-text Database, Information Science and Technology volume *
Su Jianlin: "Build Your Own DialoGPT: an LM-based Generative Multi-turn Dialogue Model", Scientific Spaces (科学空间) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113641789A (en) * | 2021-08-11 | 2021-11-12 | 福州大学 | Viewpoint retrieval method and system based on hierarchical fusion of multi-head attention network and convolutional network |
CN113641789B (en) * | 2021-08-11 | 2023-08-04 | 福州大学 | Viewpoint retrieval method and system based on hierarchical fusion multi-head attention network and convolution network |
CN114090757A (en) * | 2022-01-14 | 2022-02-25 | 阿里巴巴达摩院(杭州)科技有限公司 | Data processing method of dialogue system, electronic device and readable storage medium |
CN114417892A (en) * | 2022-01-27 | 2022-04-29 | 北京中科深智科技有限公司 | Generation model of small sample multi-turn conversation for E-commerce live broadcast scene |
CN114417892B (en) * | 2022-01-27 | 2022-08-02 | 北京中科深智科技有限公司 | Generation model of small sample multi-turn conversation for E-commerce live broadcast scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210806 |