CN117151228A

CN117151228A - Intelligent customer service system based on large model and knowledge base generation

Info

Publication number: CN117151228A
Application number: CN202311422521.1A
Authority: CN
Inventors: 王海龙; 姜华; 王兵
Original assignee: Shenzhen Dahe Chuangzhi Technology Co ltd; Shenzhen Dashu Xinke Technology Co ltd
Current assignee: Shenzhen Dahe Chuangzhi Technology Co ltd; Shenzhen Dashu Xinke Technology Co ltd
Priority date: 2023-10-31
Filing date: 2023-10-31
Publication date: 2023-12-01
Anticipated expiration: 2043-10-31
Also published as: CN117151228B

Abstract

The invention relates to the technical field of artificial intelligence, in particular to an intelligent customer service system generated based on a large model and a knowledge base. The system comprises: the system comprises a knowledge base construction unit, a question-answer large model construction unit and a client; the knowledge base construction unit is used for acquiring user questions and customer service response pairs from the training data and constructing a knowledge base; the question-answer large model construction unit is used for constructing a question-answer large model and comprises the following steps: a feature extraction subunit, a correlation capture subunit, a context coding subunit, an output prediction subunit and a parameter updating unit; the client is used for providing the client with input user questions, submitting the user questions to a question-answering model, generating customer service answers according to the input user questions, and returning the customer service answers to the client. The invention improves the quality and efficiency of customer service, improves the accuracy of automatic customer service answers, and reduces the operation cost.

Description

Intelligent customer service system based on large model and knowledge base generation

Technical Field

The invention belongs to the technical field of artificial intelligence, and particularly relates to an intelligent customer service system generated based on a large model and a knowledge base.

Background

With the continuous development of technology, the field of artificial intelligence has made great progress. Artificial intelligence has been applied in various fields, one of which is an intelligent customer service system. Conventional customer service systems often rely on manual operations, requiring significant human resources and time to address user consultation and problems. With the brand-new corner of the intelligent customer service system, users can enjoy more efficient and convenient customer service experience.

However, some intelligent customer service systems still have problems and limitations that require continuous improvement and innovation. In conventional intelligent customer service systems, a rule-based approach is typically employed to answer the user's questions. This approach requires the manual writing of a large number of rules and templates to handle the various possible user queries. Such systems present limitations in facing complex, diverse user problems, require constant maintenance and updating of rule bases, and are costly.

Another problem is that conventional intelligent customer service systems have limited ability to understand user intent and provide accurate answers. While some systems use natural language processing techniques and machine learning algorithms to improve performance, in real world situations, the problems posed by users tend to be diverse and require more advanced approaches to address. Conventional systems lack deep semantic understanding capabilities, resulting in their inability to truly understand the user's questions, and can only provide standard answers based on surface information.

In addition, conventional intelligent customer service systems often lack knowledge base support. The knowledge base is a database that stores a large number of question and answer pairs that help the system answer the user's questions better. However, knowledge base construction and maintenance processes for many existing systems are relatively difficult, resulting in limited quality and utility of the knowledge base.

Disclosure of Invention

The invention mainly aims to provide the intelligent customer service system based on the large model and the knowledge base, which improves the quality and efficiency of customer service, improves the accuracy of automatic customer service answers and reduces the operation cost.

In order to solve the problems, the technical scheme of the invention is realized as follows: an intelligent customer service system generated based on a large model and a knowledge base, the system comprising: the system comprises a knowledge base construction unit, a question-answer large model construction unit and a client; the knowledge base construction unit is used for acquiring user question and customer service answer pairs from training data, respectively encoding the user question and the customer service answer in the user question and customer service answer pairs into word embedding representations, and constructing a knowledge base based on the word embedding representations of the user question and the word embedding representations of the corresponding customer service answers; the question-answer large model construction unit is used for constructing a question-answer large model and comprises the following steps: a feature extraction subunit, a correlation capture subunit, a context coding subunit, an output prediction subunit and a parameter updating unit; the feature extraction subunit is used for extracting feature representations of word embedding representations of user questions and word embedding representations of customer service answers in the knowledge base; the association capturing subunit is used for introducing an attention mechanism and calculating an attention score matrix between the word embedding representation of the user questions and the word embedding representation of the corresponding customer service answers in the knowledge base; a context coding subunit, configured to code context information of the word embedding representation of the user question and the word embedding representation of the customer service answer according to the attention score moment, and obtain a hidden state of the word embedding representation of the user question and a hidden state of the word embedding representation of the corresponding customer service answer according to the context information; the output prediction subunit is used for predicting the starting position and the ending position of the embedded representation of the customer service answer word based on the hidden state of the embedded representation of the word of the user question and the hidden state of the embedded representation of the word of the corresponding customer service answer, and generating the word embedded representation of the predicted customer service answer; the parameter updating unit is used for measuring the difference between the word embedded representation of the predicted customer service answer and the word embedded representation of the customer service answer in the knowledge base by using cross entropy loss according to the starting position and the ending position of the word embedded representation of the predicted customer service answer, calculating total loss according to the difference, and updating parameters of the feature extraction subunit, the association capturing subunit, the context coding subunit and the output prediction subunit with the aim of minimizing the total loss function to complete the construction of a question-answer large model; the client is used for providing the client with input user questions, submitting the user questions to a question-answering model, generating customer service answers according to the input user questions, and returning the customer service answers to the client.

Further, the knowledge base construction unit is set to obtain the user question and customer service answer pair from the training dataWhereinIndicating a problem with the user and,representing customer service answers; mapping user questions and customer service answers to a continuous vector space using a pre-trained Word2Vec Word embedded representation model to encode the user questions and customer service answers into Word embedded representations, respectively; wherein word embedded representation of user questionsThe method comprises the steps of carrying out a first treatment on the surface of the Wherein,is a question of the userSubject No.Word embedding vectors for individual words; word embedding representation of customer service answers asThe method comprises the steps of carrying out a first treatment on the surface of the Wherein,is the first in customer service answersWord embedding vectors for individual words;taking the subscript as a positive integer, taking the lower boundary of the value range as 1, and taking the upper boundary of the value range as the number of user questions and customer service response pairs;for subscript, the value is a positive integer, the lower boundary of the value range is 1, and the upper boundary of the value range is word embedded representation of the user problemThe number of words in (a);the subscript is a positive integer, the lower boundary of the value range is 1, and the upper boundary of the value range is the word embedding representation of customer service answers, which is expressed asThe number of words in (a) is provided.

Further, the feature extraction subunit extracts a feature representation of the word embedded representation of the user question and the word embedded representation of the customer service answer in the knowledge base using a modified residual neural network expressed as

；

Wherein,embedding a representation for a word of a user question;a weight matrix for a first residual block of the improved residual neural network;a bias term for the first residual block;outputting a characteristic representation of the representation for the first residual block, embedding a characteristic representation of the representation for the word of the customer service answer;for correcting the linear function, the method is a nonlinear activation function, and the negative value of each element is changed to zero;a weight matrix for the second residual block for linear transformation;bias term for the second residual block;a feature representation of the word embedded representation of the customer service answer to the user question;to embed user-question words into a representationWord embedded representation of user questions extracted via residual neural networkAdding the resulting word embedded representation of the updated user question.

Further, the association capture subunit calculates a concentration score between the word embedded representation of the user question and the word embedded representation of the corresponding customer service answer in the knowledge base using the formula

；

Wherein,the method comprises the steps of carrying out a first treatment on the surface of the Wherein a attention score matrix is used.

Further, the encoding subunit encodes the word embedded representation of the user question and the context information of the word embedded representation of the customer service answer according to the attention score matrix using the following formula:

；

Wherein,embedding context information for the representation for the word of the user question;words answered for customer service are embedded with representation context information.

Further, the coding subunit uses the following formula to calculate, according to the context information, a hidden state of the word embedding representation of the user problem:

；

wherein,updating gates in time steps for problemsIs a probability value between 0 and 1, indicating the probability of preserving the problematic memory status of the previous time step;is a Sigmoid function;for a weight matrix used to calculate problem memory gates;embedding a hidden state of the representation for the word of the user question of the previous time step;resetting the gate at time steps for problemsIs a probability value between 0 and 1, representing the probability of preserving the hidden state of the word embedding representation of the user question for the previous time step;resetting the weight matrix of the gate for the calculation of the problem;to be in a time stepA hidden state of the word embedding representation of the candidate user question;as a hyperbolic tangent function;embedding a weight matrix of the hidden states of the representation for the words used to calculate the candidate user questions;to be in a time stepHidden state of word embedded representation of user questions of (a) is updated according to the questions and word embedded representation of candidate user questions Hidden state the new word of the user question is calculated to embed the hidden state of the representation.

Further, the coding subunit calculates, according to the context information, a hidden state of the word embedded representation of the customer service answer by using the following formula:

；

wherein, among them,in time steps for answering update doorIs a probability value between 0 and 1, indicating the probability of retaining the answer memory state of the previous time step;is a Sigmoid function;for a weight matrix used to calculate the answer memory gate;embedding a hidden state of the representation for the word of the customer service answer of the previous time step;in time steps for resetting the gate in replyIs a probability value between 0 and 1, representing the probability of preserving the hidden state of the word embedding representation of the customer service answer of the previous time step;resetting the weight matrix of the gates for the calculation of the answers;to be in a time stepA hidden state of the word embedding representation of the candidate customer service answer of (a);as a hyperbolic tangent function;embedding a weight matrix of the hidden state of the representation for the word used to calculate the candidate customer service answer;to be in a time stepThe hidden state of the word embedded representation of the customer service answer is calculated from the hidden states of the word embedded representation of the answer update gate and the candidate customer service answer.

Further, the output prediction subunit predicts the starting position of the customer service answer word embedding representation based on the hidden state of the word embedding representation of the user question and the hidden state of the word embedding representation of the corresponding customer service answer using the following formula:

；

wherein,is a reset gate;to update the gate for controlling whether to pass the memory state to the next time step;embedding probability distribution of the end position of the representation for the customer service answer words;is in a temporary hidden state;embedding probability distribution of the represented starting position for the customer service answer words; and selecting the initial position of the embedded representation of the customer service answer word with the highest probability as the initial position of the embedded representation of the predicted customer service answer word.

Further, the output prediction subunit predicts the end position of the customer service answer word embedding representation based on the hidden state of the word embedding representation of the user question and the hidden state of the word embedding representation of the corresponding customer service answer using the following formula:

。

the intelligent customer service system based on the large model and the knowledge base has the following beneficial effects: in the feature extraction subunit, the invention adopts an improved residual neural network, and extracts the feature representation of the user questions and customer service answers in the knowledge base by introducing a nonlinear activation function and residual connection. The technology enables the system to better capture the association information between the questions and the answers, and improves the accuracy of question understanding. Compared with the traditional feature extraction method, the method is more efficient and can handle more complex problems. The association capture subunit uses an attention mechanism to calculate an attention score matrix between the user questions and customer service answers. This allows the system to better understand the relationship between the question and the answer, improving the relevance and quality of the answer. Conventional customer service systems often fail to capture such complex associated information, resulting in inaccurate answers. The association capture technique of the present invention changes this situation and provides higher quality question answers. In the context encoding subunit, the system encodes context information for user questions and customer service answers based on the attention score matrix. This helps the system to better understand the context of questions and answers, providing more accurate answers. The context-coding technique improves the traditional static answer generation method so that the answer is more coherent and fluent. And the output prediction subunit predicts the initial position of the embedded representation of the customer service answer word by using a gating mechanism and a hyperbolic tangent function. The technology can improve the generation accuracy of the answers and ensure that the generated answers are matched with the questions. Conventional answer prediction methods often do not take into account the association between questions and answers, and irrelevant answers are easily generated. The present invention significantly ameliorates this problem through the use of gating mechanisms and nonlinear activation functions. The parameter updating unit uses the cross entropy loss to measure the difference of the word embedded representation of the predicted customer service answer from the word embedded representation of the customer service answer in the knowledge base. Based on the differences, the total loss is calculated and the parameters of the individual subunits are updated with the goal of minimizing the total loss function. This technique ensures continuous learning and improvement of the system, enabling it to continuously adapt to new questions and knowledge base content, providing higher quality customer service.

Drawings

FIG. 1 is a schematic diagram of a system architecture of an intelligent customer service system generated based on a large model and a knowledge base according to an embodiment of the present invention;

fig. 2 is a schematic structural diagram of a question-answer large model building unit of an intelligent customer service system based on a large model and knowledge base generation according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

The following will describe in detail.

Example 1: referring to fig. 1 and 2, an intelligent customer service system generated based on a large model and a knowledge base, the system comprising: the system comprises a knowledge base construction unit, a question-answer large model construction unit and a client; the knowledge base construction unit is used for acquiring user question and customer service answer pairs from training data, respectively encoding the user question and the customer service answer in the user question and customer service answer pairs into word embedding representations, and constructing a knowledge base based on the word embedding representations of the user question and the word embedding representations of the corresponding customer service answers; the question-answer large model construction unit is used for constructing a question-answer large model and comprises the following steps: a feature extraction subunit, a correlation capture subunit, a context coding subunit, an output prediction subunit and a parameter updating unit; the feature extraction subunit is used for extracting feature representations of word embedding representations of user questions and word embedding representations of customer service answers in the knowledge base; the association capturing subunit is used for introducing an attention mechanism and calculating an attention score matrix between the word embedding representation of the user questions and the word embedding representation of the corresponding customer service answers in the knowledge base; a context coding subunit, configured to code context information of the word embedding representation of the user question and the word embedding representation of the customer service answer according to the attention score moment, and obtain a hidden state of the word embedding representation of the user question and a hidden state of the word embedding representation of the corresponding customer service answer according to the context information; the output prediction subunit is used for predicting the starting position and the ending position of the embedded representation of the customer service answer word based on the hidden state of the embedded representation of the word of the user question and the hidden state of the embedded representation of the word of the corresponding customer service answer, and generating the word embedded representation of the predicted customer service answer; the parameter updating unit is used for measuring the difference between the word embedded representation of the predicted customer service answer and the word embedded representation of the customer service answer in the knowledge base by using cross entropy loss according to the starting position and the ending position of the word embedded representation of the predicted customer service answer, calculating total loss according to the difference, and updating parameters of the feature extraction subunit, the association capturing subunit, the context coding subunit and the output prediction subunit with the aim of minimizing the total loss function to complete the construction of a question-answer large model; the client is used for providing the client with input user questions, submitting the user questions to a question-answering model, generating customer service answers according to the input user questions, and returning the customer service answers to the client.

Specifically, the feature extraction subunit is responsible for extracting word embedded representations of user questions and customer service answers from the knowledge base and converting them into feature representations. This process typically uses a pre-trained Word vector model, such as Word2Vec or BERT, to map words to a vector space, and then combine these vectors into feature representations of questions and answers. The main function of the feature extraction subunit is to convert the text data into a computer-processable numerical representation. These feature representations will be used in subsequent processing to calculate attention, encode context information, and generate answers. The association capture subunit introduces an attention mechanism that calculates an attention score matrix between the word embedded representations of the user questions and the word embedded representations of the corresponding customer service answers in the knowledge base. This matrix reflects the degree of association between the questions and answers, enabling the model to focus more on relevant information. The role of the association capture subunit is to enhance the model's understanding of the association between questions and answers. It allows the model to more specifically select appropriate information when generating the answer, thereby improving the quality and consistency of the answer. The context encoding subunit encodes context information for user questions and customer service answers based on the attention score matrix and the feature representation. This typically involves capturing context information in a text sequence using a model such as a Recurrent Neural Network (RNN) or a long short-term memory network (LSTM). The role of the context encoding subunit is to translate the feature representations of questions and answers into higher-level hidden states that contain semantic information and context dependencies of the text. This information will be used in the subsequent generation process to generate the answer. The output prediction subunit predicts a starting position and an ending position of the word embedding representation of the customer service answer based on the hidden state of the word embedding representation of the user question and the hidden state of the word embedding representation of the corresponding customer service answer, thereby generating an answer. The role of the output prediction subunit is to map the context-encoded hidden state to the word-embedded representation of the answer, thereby generating a complete answer. It determines the starting and ending positions of the answer, ensuring the accuracy and integrity of the answer.

The knowledge base construction unit is capable of extracting user questions and corresponding customer service answers from the training data and encoding them into word embedded representations. The key to this step is to build an ordered knowledge base so that subsequent question-answering large models can use this knowledge base to answer the user's questions. Compared with the prior art, the method allows the system to dynamically learn and update the knowledge base instead of the static predefined base, thereby improving the flexibility and accuracy of the customer service system.

Example 2: setting the knowledge base construction unit to obtain user questions and customer service answer pairs from training data asWhereinIndicating a problem with the user and,representing customer service answers; mapping user questions and customer service answers to a continuous vector space using a pre-trained Word2Vec Word embedded representation model to encode the user questions and customer service answers into Word embedded representations, respectively; wherein word embedded representation of user questionsThe method comprises the steps of carrying out a first treatment on the surface of the Wherein,is the first of the user questionsWord embedding vectors for individual words; word embedding representation of customer service answers asThe method comprises the steps of carrying out a first treatment on the surface of the Wherein,is the first in customer service answersWord embedding vectors for individual words;takes the subscript as a positive integer, and takes the lower boundary of the value range 1, the upper boundary of the value range is the number of user questions and customer service answer pairs;for subscript, the value is a positive integer, the lower boundary of the value range is 1, and the upper boundary of the value range is word embedded representation of the user problemThe number of words in (a);the subscript is a positive integer, the lower boundary of the value range is 1, and the upper boundary of the value range is the word embedding representation of customer service answers, which is expressed asThe number of words in (a) is provided.

Specifically, the basic principle of the word embedding model (wordeededingModel) is to map words to a continuous vector space such that semantically close words are closer together in the vector space. Word2Vec is one of the commonly used Word embedding models, which has two main algorithms: CBOW (ContinuousBagofWords) and Skip-gram. The following is the basic principle of Word2 Vec: the CBOW model attempts to predict the target word from the context word. It sums and averages the word embedding vectors of the context words and then trains through the neural network so that this sum-averaged vector best predicts the target word. The Skip-gram model is contrary to CBOW, which attempts to predict context words from target words. The Skip-gram model is trained through a neural network so that the target word can best predict the context words around it. Both models use neural networks, trained through large amounts of text data, to learn semantic relationships between words, and thereby convert words into continuous vector representations.

The word embedding model maps words to a continuous vector space that captures semantic relationships between words. This allows the model to better understand the meaning of the words and thus more accurately understand the semantics of the text. The word embedding model maps a high-dimensional vocabulary space to a low-dimensional continuous vector space, which is helpful for reducing the dimension of a natural language processing task and improving the calculation efficiency. The word embedding model learns the context information of the words through training, so that the context of the words can be better considered in a text understanding task, and the performance of the model is improved.

Example 3: the feature extraction subunit extracts a feature representation of the word embedded representation of the user question and the word embedded representation of the customer service answer in the knowledge base using an improved residual neural network expressed as

；

Wherein,embedding a representation for a word of a user question;a weight matrix for a first residual block of the improved residual neural network;a bias term for the first residual block;outputting a characteristic representation of the representation for the first residual block, embedding a characteristic representation of the representation for the word of the customer service answer;for correcting the linear function, the method is a nonlinear activation function, and the negative value of each element is changed to zero; A weight matrix for the second residual block for linear transformation;bias term for the second residual block;a feature representation of the word embedded representation of the customer service answer to the user question;to embed user-question words into a representationWord embedded representation of user questions extracted via residual neural networkAdding the resulting word embedded representation of the updated user question.

In particular, the improved residual neural network is based on the concept of residual blocks. The residual block contains multiple nerve layers, but with a "skip connection" between them, allowing information to be transferred between different layers. This jump connection solves the problem of gradient extinction in conventional deep neural networks, allowing deeper networks to be trained. Inside each residual block, a nonlinear transformation is introduced using a nonlinear activation function (typically a ReLU). These nonlinear transformations enable the network to capture complex patterns and features in the input data, improving expressive power. The improved residual neural network progressively extracts a characteristic representation of the input data through a plurality of residual blocks. Each residual block captures features of different levels of abstraction of the data at different levels. This helps identify key patterns and information in the data. The introduction of nonlinear activation functions (typically ReLU) enables the network to model more complex data relationships. This is very important for understanding the problem of having complex structures, as data in the real world often contains a variety of non-linear relationships. The use of skip connections helps to avoid the problem of gradient extinction, making the network more easy to train. Gradients can be effectively transferred back to the earlier layers of the network so that the deep network can also obtain effective gradient signals.

The calculation of the first residual block is shown. First, the user problemWord embedded representation of (a)And weight matrixMultiplying and then adding the bias termFinally, a ReLU activation function is applied. The purpose of this step is to introduce a nonlinear transformation and extract a low-level feature representation of the problem. The ReLU activation function goes negative to zero so that the network can capture nonlinear relationships in the problem, thereby better characterizing the semantic information of the problem.This equation represents the calculation of the second residual block. It is similar to the first residual block, the output of the first residual block is firstAnd weight matrixMultiplying and then adding the bias termFinally, the ReLU activation function is applied again. The effect of this step is to continue to introduce nonlinear transformations and further extract advanced feature representations of the problem. Through multi-layered nonlinear transformations, the network can progressively capture more complex semantic information in the problem.This formula represents embedding words of a user question into a representationAnd go through the second residualFeature representation for block extractionAdding to obtain a word embedded representation of the updated user question. The effect of this step is to combine the original word-embedded representation with the extracted feature information, thereby enhancing the problem's characterizability. Feature information The abstract features of the problem are included, semantic information of the problem can be enriched by adding the abstract features to the original representation, and understanding of the model on the problem is improved.

Example 4: an association capture subunit that calculates a concentration score between the word embedded representation of the user question and the word embedded representation of the corresponding customer service answer in the knowledge base using the following formula

；

Example 5: the coding subunit, according to the attention score matrix, uses the following formula to code the context information of the word embedded representation of the user question and the word embedded representation of the customer service answer as:

；

Example 6: the coding subunit uses the following formula to calculate the hidden state of the word embedded representation of the user problem according to the context information:

；

wherein,updating gates in time steps for problemsIs a probability value between 0 and 1, indicating the probability of preserving the problematic memory status of the previous time step;is a Sigmoid function;for a weight matrix used to calculate problem memory gates;embedding a hidden state of the representation for the word of the user question of the previous time step; Resetting the gate at time steps for problemsIs a probability value between 0 and 1, representing the probability of preserving the hidden state of the word embedding representation of the user question for the previous time step;resetting the weight matrix of the gate for the calculation of the problem;to be in a time stepA hidden state of the word embedding representation of the candidate user question;as a hyperbolic tangent function;embedding a weight matrix of the hidden states of the representation for the words used to calculate the candidate user questions;to be in a time stepThe hidden state of the word embedded representation of the user question is a hidden state of the word embedded representation of the new user question calculated from the hidden states of the word embedded representation of the question update gate and the candidate user questions.

In particular, problem update doorOutput based on Sigmoid function for controlling the time stepWhether or not the previous time step should be reservedAnd to what extent should be preserved. This is achieved by calculating a probability value between 0 and 1. In particular, the method comprises the steps of,is a weight matrix for calculating problem update gates,is a word embedded representation of the user's question,is the hidden state of the word embedded representation of the user question of the previous time step, Is context information. The problem update gate determines the problem memory state of the previous time step through the calculated probability valueHow much information is to be retained for updating the hidden state of the current time step. If it isApproaching 0 means that the model selectively ignores the problem memory state of the previous time step, i.e. forgets a part of the old information. If it isApproaching 1 means that the model is more prone to preserving the problem memory state of the previous time step, i.e. preserving important information for use at the current time step. The output of the problem update gate is used to update the hidden state of the user problem at the current time stepBy hiding the state of the previous time stepHidden status with candidate user questionsLinear combination is performed. This operation allows the model to rationally fuse old information with new information, thereby generating new hidden states.

Problem reset doorBased on the output of the Sigmoid function for deciding at a time stepWhether or not the previous time step should be reservedUser problems of (a)Hidden state of word embedded representation of (a)And to what extent should be preserved. In particular the number of the elements,is a weight matrix for computing problem reset gates, Is a word embedded representation of the user's question,is the hidden state of the word embedded representation of the user question of the previous time step,is context information. The problem reset gate determines the hidden state of the word embedded representation of the user problem for the previous time step by the calculated probability valueHow much information is to be retained for computing the hidden status of candidate user questions. If it isApproaching 0 means that the model selectively ignores the hidden state of the word embedded representation of the user's question for the previous time step, i.e. forgets unnecessary old information. If it isClose to 1 means that the model is more prone to preserve the hidden state of the word embedded representation of the user question of the previous time step, i.e. preserve useful old information in order to calculate the hidden state of the candidate user question. The output of the problem reset gate affects the hidden state of the computed candidate user problemBy concealing the state of the previous time stepAnd the elements of the hidden state of the word embedded representation of the candidate user question, thereby adjusting the impact of the old information on the new information. Problem reset doorHelping the model to keep and forget old information in time step by controlling the degree of the old information The word embedding representation of the user question that determines how to handle the previous time step. This mechanism allows the model to selectively retain or forget old information in order to better capture important features and semantic information in the sequence. Such mechanisms are widely used in Recurrent Neural Networks (RNNs) and gated loop units (GRUs) to handle natural language processing and sequence modeling tasks.

Hidden state of candidate user questionsBy use in the context of gated loop units (GRUs)The function of the function is that,nonlinear transformations are introduced to enable models to capture complex relationships and features, which are important for handling natural language processing tasks. Linear transformation and gating mechanism in a formulaHidden state for fusing previous time stepsWord embedded representation of user questions with current time step. This operation can be seen as a fusion of the old information and the new information,to generate new candidate hidden states. Hidden status of end user questionsBy updating the door and candidate hidden states using questionsCalculated by the method.Including old and new information that considers the problem reset gate, which affects the final hidden state, capturing context information and semantic features in the sequence modeling task. Hidden state of candidate user questions The method plays an important role in GRU, and is beneficial to better capturing semantic content and context information of a user problem by a model by introducing nonlinear transformation, fusing information and influencing a final hidden state, so that the performance of a sequence modeling task is improved.

Hidden status of user questionsIn the context of a gated loop unit (GRU) there are the following roles: by calculation ofAnd，hidden state for taking previous time stepAnd hidden status of candidate user questions for the current time stepThe weights are fused together. This allows the model to rationally fuse old informationAnd new information to generate a new hidden state. Problem update doorThe value of (2) determines the hidden state of the previous time stepAnd candidate hidden statesThe relative importance of the two. If it isNear 1, the model is more prone to retaining information from the previous time step. If it isNear 0, the model is more prone to use candidate hidden states. Hidden status of end user questionsComprising time stepsHas fused the hidden state of the previous time step and the candidate hidden state. This updated hidden state is used to represent the semantic content and context information of the user problem in the sequence. Hidden status of user questions Plays a key role in a gating loop unit (GRU), and helps a model to better capture semantic content and context information of a user problem by fusing old information and new information, considering the influence of a problem update gate, and updating a final hidden state. This hidden state is important in sequence modeling tasks, for example, in text generation and understanding tasks in natural language processing.

Example 7: the coding subunit uses the following formula to calculate the hidden state of the word embedded representation of the customer service answer according to the context information:

；

wherein, among them,in time steps for answering update doorIs a probability value between 0 and 1, indicating the probability of retaining the answer memory state of the previous time step;is a Sigmoid function;for a weight matrix used to calculate the answer memory gate;embedding a hidden state of the representation for the word of the customer service answer of the previous time step;in time steps for resetting the gate in replyIs a probability value between 0 and 1, representing the probability of preserving the hidden state of the word embedding representation of the customer service answer of the previous time step;resetting the weight matrix of the gates for the calculation of the answers; To be in a time stepA hidden state of the word embedding representation of the candidate customer service answer of (a);as a hyperbolic tangent function;embedding a weight matrix of the hidden state of the representation for the word used to calculate the candidate customer service answer;to be in a time stepThe hidden state of the word embedded representation of the customer service answer is calculated from the hidden states of the word embedded representation of the answer update gate and the candidate customer service answer.

In particular, whenNear 1: the representation model is more prone to preserve the answer memory state of the previous time stepAnd reduces the acceptance of new information for the current time step. This means that the information of the previous time step will be continuously taken into account at the current time step, which is applicable in situations where continuity information is needed in the scene. When (when)Near 0: the representation model is more prone to ignore the answer memory state of the previous time stepAnd attach more importance to the new information of the current time step. This means that the information of the previous time step will be forgotten and the model updates the state mainly based on the information of the current time step, which is suitable for situations requiring instant information.

When (when)Near 1: the representation model is more prone to preserving the hidden state of the word embedded representation of customer service answers for the previous time step And reduces the acceptance of new information for the current time step. This means that the hidden state of the word embedded representation of the previous time step will be continuously taken into account at the current time step, as is the case in situations where continuity information is needed in the scene. When (when)Near 0: hidden states of word embedded representations representing customer service answers that the model is more prone to ignore previous time stepsAnd attach more importance to the new information of the current time step. This means that the hidden state of the word-embedded representation of the previous time step will be forgotten and the model updates the state mainly based on the information of the current time step, which is applicable to situations where instant messages are needed.

Is a non-linear characteristic of (2): hyperbolic tangent functionNon-linearities are introduced that help the model capture more complex features in computing candidate hidden states. This is very important for processing data with rich semantics and structure, such as language, as it allows the model to better represent and understand complex information.Candidate hidden states are generated: candidate hidden statesIs based on a gating mechanismControlled.Word embedding table for deciding whether to retain customer service answer of previous time stepHidden state of the display. This allows the model to selectively retain or forget information at different time steps depending on the specific task and input situation to generate the appropriate hidden state.

Example 8: the output prediction subunit predicts the initial position of the customer service answer word embedding representation based on the hidden state of the word embedding representation of the user question and the hidden state of the word embedding representation of the corresponding customer service answer using the following formula:

；

In particular, the method comprises the steps of,is through Sigmoid functionProcessed values, which are based on a weight matrixHidden state of word embedded representation with user questionsAnd ending position probability distribution of customer service answer word embedded representation of last time stepIs calculated by a linear combination of (a) and (b). Specifically calculated as。The model is controlled to retain the hidden state of the word embedding representation of the customer service answer for the previous time step. If it isApproaching 1 indicates that the information of the previous time step was retained, otherwise, a portion of the hidden state is reset. This helps the model determine at time steps Which information should be retained for generating customer service answers at the current time.Also by Sigmoid functionProcessed values, which are based on a weight matrixHidden state of word embedded representation with user questionsAnd ending position probability distribution of customer service answer word embedded representation of last time stepIs calculated by a linear combination of (a) and (b). Specifically calculated as。Controlling the model in time stepsHow to combine new information with the hidden state of the last time step. If it isClose to 1, indicating that the hidden state of the last time step is fully reserved; if it is close to 0, it means that the hidden state is updated completely with new information. This helps the model decide how to pass and update information to accommodate the current generation task.

Computing of (a) is based on gating mechanismsThis gating mechanism determines whether the model is to use the hidden state of the previous time stepTo update the hidden state of the current time step. Hyperbolic tangent functionActing on linear combinations to compress them to interval [ -1,1]Between which non-linearities are introduced. This helps the model capture complex features.The main function of (a) is to generate a temporary hidden state which is used in a subsequent step to calculate the probability distribution of the starting position of the embedded representation of the customer service answer words . By gating mechanismThe model can flexibly choose whether to retain the hidden state information of the last time step. If it isClose to 1, indicating that most of the previous time step information is retained; if close to 0, this means that no reservation is made, but the hidden state is updated with new information. Hyperbolic tangent functionThe nonlinearity is introduced, which is helpful for the model to capture complex modes and features, so that the prediction capability of the model on the customer service answer starting position is improved.

Example 9: the output prediction subunit predicts the end position of the customer service answer word embedding representation based on the hidden state of the word embedding representation of the user question and the hidden state of the word embedding representation of the corresponding customer service answer using the following formula:

。

after the creation of the question-answer large model is completed. The performance optimization of the model is achieved by minimizing the loss function. In this case, the loss function is typically a cross entropy loss function, used to measure the difference between the word embedded representation of the predicted customer service response of the model and the word embedded representation of the customer service response in the knowledge base.

；

Wherein,indicating the number of samples to be taken,andthe probability distribution of the start and end positions of the real customer service answer respectively, Andis the probability distribution of model predictions. Based on the loss function, back-propagation algorithms can be used to calculate gradients of model parameters that tell us how to adjust the parameters to reduce the loss. Gradient descent or its variant algorithms are typically used to update the parameters of the model to reduce losses. The update rules are typically as follows:

；

wherein,is a parameter that has been updated and is then used,is a current parameter that is to be used,is the rate of learning to be performed,is a loss function related to parametersIs a gradient of (a). The above steps are typically performed over the entire training dataset for multiple iterations to continuously reduce the loss and optimize the model.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; while the invention has been described in detail with reference to the foregoing embodiments, it will be appreciated by those skilled in the art that variations may be made in the techniques described in the foregoing embodiments, or equivalents may be substituted for elements thereof; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An intelligent customer service system based on large model and knowledge base generation, the system comprising: the system comprises a knowledge base construction unit, a question-answer large model construction unit and a client; the knowledge base construction unit is used for acquiring user question and customer service answer pairs from training data, respectively encoding the user question and the customer service answer in the user question and customer service answer pairs into word embedding representations, and constructing a knowledge base based on the word embedding representations of the user question and the word embedding representations of the corresponding customer service answers; the question-answer large model construction unit is used for constructing a question-answer large model and comprises the following steps: a feature extraction subunit, a correlation capture subunit, a context coding subunit, an output prediction subunit and a parameter updating unit; the feature extraction subunit is used for extracting feature representations of word embedding representations of user questions and word embedding representations of customer service answers in the knowledge base; the association capturing subunit is used for introducing an attention mechanism and calculating an attention score matrix between the word embedding representation of the user questions and the word embedding representation of the corresponding customer service answers in the knowledge base; a context coding subunit, configured to code context information of the word embedding representation of the user question and the word embedding representation of the customer service answer according to the attention score moment, and obtain a hidden state of the word embedding representation of the user question and a hidden state of the word embedding representation of the corresponding customer service answer according to the context information; the output prediction subunit is used for predicting the starting position and the ending position of the embedded representation of the customer service answer word based on the hidden state of the embedded representation of the word of the user question and the hidden state of the embedded representation of the word of the corresponding customer service answer, and generating the word embedded representation of the predicted customer service answer; the parameter updating unit is used for measuring the difference between the word embedded representation of the predicted customer service answer and the word embedded representation of the customer service answer in the knowledge base by using cross entropy loss according to the starting position and the ending position of the word embedded representation of the predicted customer service answer, calculating total loss according to the difference, and updating parameters of the feature extraction subunit, the association capturing subunit, the context coding subunit and the output prediction subunit with the aim of minimizing the total loss function to complete the construction of a question-answer large model; the client is used for providing the client with input user questions, submitting the user questions to a question-answering model, generating customer service answers according to the input user questions, and returning the customer service answers to the client.

2. The intelligent customer service system based on large model and knowledge base generation according to claim 1, wherein the knowledge base construction unit is set to obtain user questions and customer service answer pairs from training data asWherein->Indicating a problem with the user and,representing customer service answers; mapping user questions and customer service answers to a continuous vector space using a pre-trained Word2Vec Word embedded representation model to encode the user questions and customer service answers into Word embedded representations, respectively; wherein the word embedded representation of the user question +.>The method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Is the%>Word embedding vectors for individual words; word embedding of customer service answers is denoted +.>The method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Is the +.>Word embedding vectors for individual words; />Taking the subscript as a positive integer, taking the lower boundary of the value range as 1, and taking the upper boundary of the value range as the number of user questions and customer service response pairs; />For subscript, the value is positive integer, the lower boundary of the value range is 1, and the upper boundary of the value range is word embedded representation of user problem +.>The number of words in (a); />The subscript is a positive integer, the lower boundary of the value range is 1, and the upper boundary of the value range is the word embedding of customer service answers, which is expressed as +. >The number of words in (a) is provided.

3. The intelligent customer service system based on large model and knowledge base generation of claim 2, wherein the feature extraction subunit extracts feature representations of the word embedded representation of the user questions and the word embedded representation of the customer service answers in the knowledge base using a modified residual neural network expressed as

；

Wherein,embedding a representation for a word of a user question;/>a weight matrix for a first residual block of the improved residual neural network; />A bias term for the first residual block; />Outputting a characteristic representation of the representation for the first residual block, embedding a characteristic representation of the representation for the word of the customer service answer; />To correct the linear function, a nonlinear activation function, the negative value of each element is changed to zero; />A weight matrix for the second residual block for linear transformation; />Bias term for the second residual block; />A feature representation of the word embedded representation of the customer service answer to the user question; />To embed user question words in the representation +.>Word embedded representation of user questions extracted via residual neural network +.>Adding, the resulting word embedded representation of the updated user question, >Will be a new->。

4. A large model and knowledge base generation based intelligent customer service system as in claim 3, wherein the association capture subunit calculates a concentration score between the word embedded representation of the user question in the knowledge base and the word embedded representation of the corresponding customer service answer using the formula

；

5. The intelligent customer service system based on large model and knowledge base generation according to claim 4, wherein the coding subunit, based on the attention score matrix, uses the following formula to code the context information of the word embedded representation of the user question and the word embedded representation of the customer service answer as:；

wherein,embedding context information for the representation for the word of the user question; />Words answered for customer service are embedded with representation context information.

6. The intelligent customer service system based on large model and knowledge base generation according to claim 5, wherein the coding subunit uses the following formula to calculate the hidden state of the word embedded representation of the user problem from the context information:

；

wherein,updating the gate in time steps for problems>Is a probability value between 0 and 1, indicating the probability of preserving the problematic memory status of the previous time step; / >Is a Sigmoid function; />For a weight matrix used to calculate problem memory gates; />Embedding a hidden state of the representation for the word of the user question of the previous time step; />Resetting the gate for the problem at time step +.>Is a probability value between 0 and 1, representing the probability of preserving the hidden state of the word embedding representation of the user question for the previous time step; />Resetting the weight matrix of the gate for the calculation of the problem; />For +.>A hidden state of the word embedding representation of the candidate user question; />As a hyperbolic tangent function; />Embedding a weight matrix of the hidden states of the representation for the words used to calculate the candidate user questions; />For +.>The hidden state of the word embedded representation of the user question is a hidden state of the word embedded representation of the new user question calculated from the hidden states of the word embedded representation of the question update gate and the candidate user questions.

7. The intelligent customer service system based on large model and knowledge base generation according to claim 6, wherein the coding subunit calculates the hidden state of the word embedded representation of the customer service answer from the context information using the following formula:

；

wherein,in time step +. >Is a probability value between 0 and 1, indicating that the answer memory of the previous time step is preservedProbability of state; />Is a Sigmoid function; />For a weight matrix used to calculate the answer memory gate; />Embedding a hidden state of the representation for the word of the customer service answer of the previous time step; />To answer reset gate at time step +.>Is a probability value between 0 and 1, representing the probability of preserving the hidden state of the word embedding representation of the customer service answer of the previous time step; />Resetting the weight matrix of the gates for the calculation of the answers; />For +.>A hidden state of the word embedding representation of the candidate customer service answer of (a); />As a hyperbolic tangent function; />Embedding a weight matrix of the hidden state of the representation for the word used to calculate the candidate customer service answer; />For +.>The hidden state of the word embedded representation of the customer service answer is calculated from the hidden states of the word embedded representation of the answer update gate and the candidate customer service answer.

8. The intelligent customer service system based on large model and knowledge base generation according to claim 7, wherein the output prediction subunit predicts the starting position of the customer service answer word embedding representation based on the hidden state of the word embedding representation of the user question and the hidden state of the word embedding representation of the corresponding customer service answer using the following formula:

；

Wherein,is a reset gate; />To update the gate for controlling whether to pass the memory state to the next time step;embedding probability distribution of the end position of the representation for the customer service answer words; />Is in a temporary hidden state; />Embedding probability distribution of the represented starting position for the customer service answer words; and selecting the initial position of the embedded representation of the customer service answer word with the highest probability as the initial position of the embedded representation of the predicted customer service answer word.

9. The intelligent customer service system based on large model and knowledge base generation according to claim 7, wherein the output prediction subunit predicts the end position of the customer service answer word embedding representation based on the hidden state of the word embedding representation of the user question and the hidden state of the word embedding representation of the corresponding customer service answer using the following formula:

。