CN115062003A - Cloud ERP community generation type question-answering method based on GPT2 - Google Patents

Cloud ERP community generation type question-answering method based on GPT2

Info

Publication number
CN115062003A
CN115062003A CN202210596783.9A CN202210596783A
Authority
CN
China
Prior art keywords
question
community
data
customer service
cloud erp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210596783.9A
Other languages
Chinese (zh)
Other versions
CN115062003B (en)
Inventor
廖伟智
黄明彤
阴艳超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210596783.9A priority Critical patent/CN115062003B/en
Publication of CN115062003A publication Critical patent/CN115062003A/en
Application granted granted Critical
Publication of CN115062003B publication Critical patent/CN115062003B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval of structured data, e.g. relational data
    • G06F16/21 - Design, administration or maintenance of databases
    • G06F16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/088 - Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a GPT2-based cloud ERP community generative question-answering method, implemented by desensitizing, cleaning and denoising the manual customer service data of a cloud ERP community; establishing a cloud ERP community customer service data-driven service; constructing Transformer-based unsupervised training; constructing a GPT2 model based on the Transformer decoder; and constructing a GPT2-based generative question-answering model, whose performance and effectiveness are finally demonstrated through experimental verification. The method is proposed to address the problems that the existing community question-answering mode has low retrieval efficiency and cannot provide timely, convenient, flexible and effective question-answering service to various customers, and that the massive manual customer service session data in community interaction data are not effectively utilized.

Description

Cloud ERP community generation type question answering method based on GPT2
Technical Field
The invention relates to the field of intelligent question answering, in particular to a cloud ERP community generation type question-answering method based on GPT2.
Background
Enterprise Resource Planning (ERP) integrates information technology with advanced management ideas and is an important means of improving enterprise benefit and efficiency by integrating and optimizing enterprise elements and resources. With the development of cloud computing, the cloud ERP platform formed by combining the ERP concept with the cloud service model can better match its resources to customer requirements, provide users with a series of more valuable, more flexible, cheaper and more convenient business and service supports, and has become a mainstream direction of current ERP development. Under the cloud service model, the subjects, businesses, data and relationships of cloud ERP providers, various user groups, developers, software vendors and others along the supply chain form the cloud ERP ecosphere. Building on this ecosphere a cloud ERP ecological community in which all parties communicate, open up, share and cooperate is an important direction and trend of cloud ERP development. Good operation of the cloud ERP ecological community underpins the community's ecological culture construction, users' learning and growth, and efficient service consultation. High-quality cloud ERP ecological operation services are the basis of good community operation, and the dynamic adaptation and autonomous evolution of operation services are among the keys to it. Because community services cover a wide scope, users are numerous and varied, and communication of all kinds is very frequent, users often need to consult at any time about problems in services, technologies, business and other aspects. Intelligent question answering is therefore a key link of the cloud ERP ecological operation service and an important guarantee of flexible and rapid communication and sharing among all individuals and organizations in the cloud ERP ecosphere. However, unlike ordinary life communities such as entertainment and shopping, the cloud ERP ecological community is extremely domain-specific, technical and professional, with many related subjects, a wide scope, rich content and heterogeneous problems, which brings great difficulty and challenge to the operation guarantee work of the cloud ERP ecological community.
The structural idea of question-answering methods can be traced back to the Turing test proposed by the British mathematician Turing in the 1950s. In 1966 the first relatively complete intelligent question-answering system, Eliza, was developed at the Massachusetts Institute of Technology as an early application of natural language processing. With the continuous development of artificial intelligence technology and on the basis of deep neural-network learning, intelligent question-answering systems have become increasingly effective. They currently divide mainly into task-oriented question-answering systems and chit-chat dialogue systems, corresponding respectively to limited professional fields and open question domains. To realize such question answering, researchers mainly study retrieval-based and generative question-answering methods. The retrieval-based method is the conventional approach commonly used in industry and achieves the aim of reducing manual dependence, but owing to its inherent limitation it cannot form answers to questions outside the database. The emergence of the generative method benefits from the application of deep learning to machine translation models, which are typically built on the Encoder-Decoder architecture. Such Encoder-Decoder models do not depend on a specific knowledge base or rule template but on learning and training over massive question-answer dialogue corpora, and the dialogue capability obtained under big-data drive is closer to the human way of thinking. In addition, unlike retrieving matched answers, a generative model can produce responses more flexibly and variably, although the generated content is much harder to control. Generative question-answer models are generally realized as sequence-to-sequence (Seq2Seq) models, that is, methods that generate another sequence from a given sequence, so generative question answering is a kind of conditional generation problem.
Community question answering is a key link of the cloud ERP community operation service and an important guarantee of flexible and rapid communication and sharing among all individuals and organizations in the cloud ERP ecological community. The existing community question-answering mode suffers from low retrieval efficiency and cannot provide timely, convenient, flexible and effective question-answering service to various customers. Moreover, considering the openness of the cloud ERP ecological community, the diversity of the groups involved and the variability of user requirements, the domain data across the time and space dimensions are vast, and the massive manual customer service session data in community interaction data are not effectively utilized.
Disclosure of Invention
The invention aims to overcome the problems in the background art and provides a GPT2-based cloud ERP community generative question-answering method, which can overcome the limitation of retrieval-based methods, answer user questions whose answers cannot be retrieved from the database, and improve the flexibility of cloud ERP community question answering.
The purpose of the invention is mainly realized by the following technical scheme:
the cloud ERP community generation type question-answering method based on the GPT2 comprises the following steps:
(1) desensitizing, cleaning and denoising the cloud ERP community artificial customer service data;
(2) establishing cloud ERP community customer service data driving service;
(3) constructing unsupervised training based on a Transformer;
(4) constructing a GPT2 model based on a Transformer decoder;
(5) constructing a generative question-answering model based on GPT2.
Finally, the performance and the effectiveness of the model are proved through experimental verification.
The algorithm flow of desensitization, cleaning and denoising is as follows: first, the knowledge base and the raw manual customer service session data in the cloud ERP ecological community are read; then the sensitive information categories to be desensitized are added, customer information is read and sensitive-information replacement rules are set, and on this basis the raw data are traversed and desensitized to obtain desensitized text data. Next, the noise categories to be cleaned are added, the required information is read, corresponding regular-expression cleaning functions are set, and the data are traversed and cleaned. Finally, the desensitized and denoised knowledge base triples and customer service dialogue data are obtained and stored in a database.
In step (2), the cloud ERP manual customer service session data processed in step (1) are extracted from the database, the relevant information in them is extracted, a customer service session data set is established on this basis, the data set and the extracted information are encapsulated, and finally the customer service data-driven service is initialized and started.
In step (3), given an unsupervised token corpus U = {u_1, ..., u_n}, a standard language modeling objective is used to maximize the following likelihood, as in equation (1):

L_1(U) = Σ_i log P(u_i | u_{i-k}, ..., u_{i-1}; θ)   (1)

where k is the size of the context window and the conditional probability P is modeled by a neural network with parameters θ, trained using stochastic gradient descent. A multi-layer Transformer decoder, a variant of the Transformer, is then used as the language model: it applies a multi-head self-attention operation over the input context tokens, followed by position-wise feed-forward layers, to produce an output distribution over the target tokens:

m_0 = U W_e + W_p   (2)

m_l = transformer_block(m_{l-1}), l ∈ [1, n]   (3)

where the word embeddings W_e plus the position embeddings W_p encode the input sequence before it enters the Transformer, and the n outputs each predict the next word at their position. The input is denoted m_0, the subscript 0 denoting the initial input layer; the calculation formula of m_0 shows that GPT is a unidirectional language model. After m_0 is obtained, it is passed sequentially through all decoders of the Transformer to finally obtain m_n. Finally, by the following formula:

P(u) = Softmax(m_n W_e^T)   (4)

the finally obtained m_n is fed into the softmax function to obtain the final unsupervised pre-training result.
The principle of the layer normalization of GPT2 in step (4) is shown in equation (5):

LayerNorm(x_i) = α · (x_i − μ_L) / √(σ_L + ε) + β   (5)

where μ_L represents the mean of each dimension of the input and σ_L the variance of each dimension of the input; α and β are two learnable parameters consistent with the dimension of the computed LayerNorm; ε is a constant bias term; and x_i is the input vector representation at the i-th time step.

The principle of the attention calculation of the future-information-masked multi-head attention layer is shown in equation (6):

Attention(Q, K, V) = Softmax(Q K^T / √d_k) V   (6)

where Q, K and V represent the query matrix, the key matrix and the value matrix respectively; the attention matrix is obtained by matrix multiplication, and the scaling factor √d_k (the square root of the key dimension) keeps the products in a stable range, from which the final attention values are obtained.

The calculation principle of the feed-forward neural network (FeedForward) is shown in equations (7) and (8):

FF(X) = RELU(Linear(Linear(X)))   (7)

Linear(X) = wX + b   (8)

where FF(X) represents inputting the vector X into the feed-forward neural network, RELU is the activation function, Linear represents the linear fully-connected layer (two such layers are stacked), and w and b are randomly initialized weight and bias parameters.
In the model design of step (5), the question-answer corpora of the previous turns are modeled on the same text: each session is spliced in forward order, and the answer of the current turn is predicted from the previous turns of question-answer and the current-turn question, expressed with conditional probability as equation (9):

P(A | P) = ∏_n P(x_n | x_1, x_2, ..., x_{n-1})   (9)

where A represents the current-turn answer that the model aims to predict; P represents all historical question-answer pairs and the current-turn question preceding the current answer; the prediction probability of the current question-answer is obtained by multiplying the conditional probabilities, and x_1, x_2, ..., x_{n-1} represent the first n−1 positions of the input sequence.
In conclusion, compared with the prior art, the invention has the following beneficial effects: the method can overcome the limitation of retrieval-based methods, answer user questions whose answers cannot be retrieved from the database, and improve the flexibility of cloud ERP community question answering.
Drawings
FIG. 1 is a schematic diagram of a process of desensitization, cleaning and denoising of cloud ERP customer service data;
FIG. 2 is an exemplary diagram of portions of raw data for a manual customer service session;
FIG. 3 is an exemplary diagram of partial results after desensitization cleaning and denoising of raw data;
FIG. 4 is a schematic diagram of a cloud ERP community customer service data driven service establishment flow;
FIG. 5 is a schematic diagram of a GPT2 model structure based on a Transformer decoder;
FIG. 6 is a schematic structural diagram of a cloud ERP community-generated question-answer model based on GPT 2;
FIG. 7 is a schematic diagram of the general flow of the cloud ERP community generation type question answering method based on GPT2.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Community question answering is a key link of the cloud ERP community operation service and an important guarantee of flexible and rapid communication and sharing among all individuals and organizations in the cloud ERP ecological community. The existing community question-answering mode suffers from low retrieval efficiency and cannot provide timely, convenient, flexible and effective question-answering service to various customers. Moreover, considering the openness of the cloud ERP ecological community, the diversity of the groups involved and the variability of user requirements, the domain data across the time and space dimensions are vast, and the massive manual customer service session data in community interaction data are not effectively utilized. Therefore, a GPT2-based cloud ERP community generative question-answering method is designed, which proceeds by (1) desensitizing, cleaning and denoising the cloud ERP community manual customer service data; (2) establishing the cloud ERP community customer service data-driven service; (3) constructing Transformer-based unsupervised training; (4) constructing a GPT2 model based on the Transformer decoder; and (5) constructing a GPT2-based generative question-answering model. Finally, the performance and effectiveness of the model are demonstrated through experimental verification.
The cloud ERP community manual customer service data desensitization, cleaning and denoising module focuses, according to data security requirements, on desensitizing the sensitive information in community manual customer service session data, achieving reliable protection and de-identification of private and sensitive information. The sensitive information in cloud ERP ecological community manual customer service session data mainly includes: customer name, company code, taxpayer identification number, user mailbox address, identity card number, mobile phone number, consultation date, order number, customer service job number and the like. Desensitizing such sensitive private information requires converting or removing it without affecting the semantic information. In addition, since the community data contain noise such as garbled characters, irrelevant information, repeated text and stray punctuation marks, cleaning and denoising are also necessary.
The current major modes of data desensitization are: masking desensitization, reversible desensitization, data-consistency desensitization, generalization desensitization, format-preserving desensitization and the like. Masking desensitization is the most common and is widely applied to personal privacy protection for users and enterprises; it is mainly realized by replacing all or part of the sensitive information in the data with symbols. Reversible desensitization means that the desensitized data can be restored, that is, the original data can be recovered from the desensitized data. Data-consistency desensitization means that the interrelations among data remain consistent after desensitization, and it is widely applied to scenarios such as secondary system development. These desensitization modes can also be used in combination to meet different data requirements.
The main ways of data cleaning and denoising are: manual inspection, regular expressions, statistical models, clustering and the like. Regular-expression cleaning, a commonly used method, mainly consists of manually setting string-matching rules to match and remove target strings.
The general scheme of desensitization, cleaning and denoising is shown in Fig. 1 and is mainly realized by a Python script program whose main algorithm flow is as follows: first, the raw manual customer service session data in the cloud ERP ecological community are read; then the sensitive information categories to be desensitized are added, customer information is read and sensitive-information replacement rules are set, and on this basis the raw data are traversed and desensitized to obtain desensitized text data. Next, the noise categories to be cleaned are added, the required information is read, corresponding regular-expression cleaning functions are set, and the data are traversed and cleaned. Finally, the desensitized and denoised knowledge base triples and customer service dialogue data are obtained and stored in a database.
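As an illustration of this flow, the following Python sketch implements the masking desensitization and regular-expression cleaning described above. It is a minimal sketch, not the patent's actual script: the sensitive-information categories, the regular expressions and the replacement symbol are assumptions chosen for illustration.

import re

# Assumed sensitive-information categories and replacement rules: each
# category maps to a regular expression and a masking symbol (masking
# desensitization, as described above). The patterns are illustrative.
SENSITIVE_RULES = {
    "mobile_phone": (re.compile(r"1[3-9]\d{9}"), "*"),
    "email": (re.compile(r"[\w.-]+@[\w.-]+\.\w+"), "*"),
    "id_number": (re.compile(r"\d{17}[\dXx]"), "*"),
    "consult_date": (re.compile(r"\d{4}-\d{2}-\d{2}"), "*"),
}

# Assumed noise categories to clean: front-end markup, entities, links.
NOISE_PATTERNS = [
    re.compile(r"<[^>]+>"),         # front-end markup language remnants
    re.compile(r"&[a-z]+;"),        # HTML entities / garbled codes
    re.compile(r"https?://\S+"),    # website resource information
]

def desensitize(text):
    # Replace each whole match with masking symbols of the same length.
    for pattern, symbol in SENSITIVE_RULES.values():
        text = pattern.sub(lambda m, s=symbol: s * len(m.group()), text)
    return text

def clean(text):
    # Remove noise, then collapse repeated whitespace.
    for pattern in NOISE_PATTERNS:
        text = pattern.sub("", text)
    return re.sub(r"\s+", " ", text).strip()

def process_sessions(raw_sessions):
    # Traverse the raw session data: desensitize first, then clean.
    return [[clean(desensitize(u)) for u in session] for session in raw_sessions]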
The implementation process of the data desensitization and cleaning denoising Python program is shown in Table 1:
Table 1. Implementation process of the data desensitization, cleaning and denoising program
[Table 1 is available only as an image in the original document.]
Fig. 2 shows an example of the raw manual customer service session data, in which private information such as date information, website resource information, customer service name and job number, and customer name and company is replaced with "+". Fig. 3 shows the result after desensitizing, cleaning and denoising the raw data: as shown in the figure, the front-end markup language, irrelevant information and the like in the text data are removed by the Python desensitization and denoising script designed herein, and the processed manual customer service data are stored in units of session samples (sessions) composed of multiple rounds of dialogue.
(2) Establishing the cloud ERP community customer service data-driven service. The flow is shown in Fig. 4: by establishing a manual customer service data-driven service for the real scenario, the generative question-answering model oriented to the cloud ERP community is supported. First, the cloud ERP manual customer service session data processed in step (1) are extracted from the database. Then the relevant information in them is extracted, for example: session role information, session turn information, cloud ERP professional domain information and the like. On this basis a customer service session data set is established, and the data set and the extracted information are encapsulated. Finally the customer service data-driven service is initialized and started.
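The sketch below illustrates one possible form of this data-driven service: it extracts processed sessions from a database, collects the role and turn information, and packages everything into a session data set. The table and column names are assumptions for illustration, not the patent's actual schema.

import sqlite3
from dataclasses import dataclass

@dataclass
class Session:
    turns: list   # utterance texts in forward order
    roles: list   # parallel role tags, e.g. "c" (customer) / "s" (customer service)

def load_service_dataset(db_path):
    # Extract the desensitized session data (assumed table and columns).
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT session_id, turn_index, role, text FROM service_sessions "
        "ORDER BY session_id, turn_index"
    ).fetchall()
    conn.close()

    # Package the data set together with the extracted role/turn information.
    sessions = {}
    for session_id, _turn, role, text in rows:
        s = sessions.setdefault(session_id, Session(turns=[], roles=[]))
        s.turns.append(text)
        s.roles.append(role)
    return list(sessions.values())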
(3) Constructing the Transformer-based unsupervised training. Unsupervised pre-training is a special case of semi-supervised learning whose goal is to find a good initialization point rather than to modify the supervised learning objective. Early work explored the use of this technique in image classification and regression tasks, and research shows that pre-training, acting as a regularization scheme, can achieve better generalization in deep neural networks. Given an unsupervised token corpus U = {u_1, ..., u_n}, we use a standard language modeling objective to maximize the following likelihood, as shown in equation (1):

L_1(U) = Σ_i log P(u_i | u_{i-k}, ..., u_{i-1}; θ)   (1)

where k is the size of the context window and the conditional probability P is modeled by a neural network with parameters θ. These parameters are trained using stochastic gradient descent. A multi-layer Transformer decoder, a variant of the Transformer, is then used as the language model. The model applies a multi-head self-attention operation over the input context tokens, followed by position-wise feed-forward layers, to produce an output distribution over the target tokens:

m_0 = U W_e + W_p   (2)

m_l = transformer_block(m_{l-1}), l ∈ [1, n]   (3)

where the word embeddings W_e plus the position embeddings W_p encode the input sequence before it enters the Transformer, and the n outputs each predict the next word at their position. The input is denoted m_0, the subscript 0 denoting the initial input layer; the calculation formula of m_0 shows that GPT is a unidirectional language model. After m_0 is obtained, it is passed sequentially through all decoders of the Transformer to finally obtain m_n. Finally, by the following formula:

P(u) = Softmax(m_n W_e^T)   (4)

the finally obtained m_n is fed into the softmax function to obtain the final unsupervised pre-training result.
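A minimal training-step sketch of this objective is given below, assuming PyTorch and the Hugging Face transformers library; the model dimensions and vocabulary size are illustrative assumptions. Passing labels equal to the inputs makes GPT2LMHeadModel compute the shifted next-token cross-entropy, i.e. the negative of the objective in equation (1).

import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative configuration; the actual dimensions are assumptions.
config = GPT2Config(vocab_size=13317, n_layer=12, n_head=12, n_embd=768)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # stochastic gradient descent

def pretrain_step(token_ids):
    # token_ids: LongTensor of shape [batch, seq_len] drawn from the corpus U.
    out = model(input_ids=token_ids, labels=token_ids)
    out.loss.backward()          # mean negative log-likelihood, i.e. -L1 over the batch
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()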
(4) Constructing the GPT2 model based on the Transformer decoder. GPT (Generative Pre-Training) consists mainly of two training stages: in the first stage a language model is obtained through unsupervised training on massive text data; the second stage adapts to various downstream tasks, such as text classification, text relevance, text generation and text labeling, with supervised fine-tuning performed according to these tasks. The goal of GPT2 is to train a general natural language processing model; unlike GPT, GPT2 does not alter the downstream tasks but automatically identifies the task to be completed. Unlike the BERT model, which is built from Transformer encoder modules, GPT2 is built from Transformer decoder modules. As shown in Fig. 5, the overall model structure of GPT2 is the same as that of GPT: one word (token) is output at a time, and the use of a unidirectional language model makes it more effective at generating question-answer text. Each newly generated token is spliced immediately after the previously generated sequence and becomes part of the new input. This token-processing mechanism of GPT2 is referred to as auto-regression. Later models such as Transformer-XL and XLNet are essentially based on this autoregressive mechanism, which also enhances the effect of GPT2 on text generation; GPT2 is therefore used herein as the sequence-to-sequence model for generating questions and answers.
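The auto-regression mechanism described above can be sketched as the following greedy decoding loop, in which each newly generated token is spliced after the previous sequence and becomes part of the new input. The use of a BERT-style tokenizer whose [SEP] token ends an answer is an assumption of this sketch.

import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50):
    ids = tokenizer.encode(prompt, return_tensors="pt")
    for _ in range(max_new_tokens):
        logits = model(input_ids=ids).logits                      # [1, seq_len, vocab]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # greedy choice
        ids = torch.cat([ids, next_id], dim=-1)                   # splice after the sequence
        if next_id.item() == tokenizer.sep_token_id:              # assumed end-of-answer mark
            break
    return tokenizer.decode(ids[0])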
It can be seen that GPT2 stacks 12 Transformer decoders. Unlike GPT and BERT, GPT2 places the layer normalization (LayerNorm) before the multi-head attention, with the residual connection locations unchanged; the parameters are initialized with an adjustment according to network depth, and an additional layer normalization is added after the last Transformer decoder. The masked multi-head attention layer in the Transformer decoder is used to mask future character positions to be predicted in the sequence; the multi-head attention rule is otherwise consistent with the Transformer encoder, and each attention layer holds the key and value vectors corresponding to the current-position token. GPT2 explores larger-scale models, with up to 1.5 billion parameters at maximum, and achieves the best results on various NLP tasks through pre-training, feedback and prediction alone, without fine-tuning. According to the principles and characteristics of GPT2, its strong generation capability is used here for multi-turn question-answer generation, and its strong general pre-training capability is transferred to question answering in the professional field of the cloud ERP ecological community.
The principle of layer normalization (LayerNorm) is shown in equation (5):

LayerNorm(x_i) = α · (x_i − μ_L) / √(σ_L + ε) + β   (5)

where μ_L represents the mean of each dimension of the input and σ_L the variance of each dimension of the input; α and β are two learnable parameters consistent with the dimension of the computed LayerNorm; ε is a constant bias term; and x_i is the input vector representation at the i-th time step.

The principle of the attention calculation of the future-information-masked multi-head attention layer is shown in equation (6):

Attention(Q, K, V) = Softmax(Q K^T / √d_k) V   (6)

where Q, K and V represent the query matrix, the key matrix and the value matrix respectively; the attention matrix is obtained by matrix multiplication, and the scaling factor √d_k (the square root of the key dimension) keeps the products in a stable range, from which the final attention values are obtained. The above attention calculation is repeated 8 times, representing 8 attention heads, to obtain multiple self-attention matrices. So as not to influence the positions of the answers to be predicted later during training, the predicted positions are shielded by masking (Mask), achieving the effect of shielding future information.

The calculation principle of the feed-forward neural network (FeedForward) is shown in equations (7) and (8):

FF(X) = RELU(Linear(Linear(X)))   (7)

Linear(X) = wX + b   (8)

where FF(X) represents inputting the vector X into the feed-forward neural network, RELU is the activation function, Linear represents the linear fully-connected layer (two such layers are stacked), and w and b are randomly initialized weight and bias parameters.
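The sketch below assembles equations (5) to (8) into one decoder block in PyTorch, with LayerNorm placed before the multi-head attention (pre-LN, as GPT2 does) and future positions masked. Two details depart from the literal text and are assumptions of the sketch: the √d_k scaling of equation (6) happens inside nn.MultiheadAttention, and the feed-forward sublayer follows the standard Linear-ReLU-Linear ordering. The dimensions are illustrative.

import torch
import torch.nn as nn

class GPT2Block(nn.Module):
    def __init__(self, d_model=768, n_head=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)   # equation (5), placed before attention
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(           # equations (7)-(8), two stacked linear layers
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                  # x: [batch, seq_len, d_model]
        seq_len = x.size(1)
        # Mask future information: position i may attend only to positions <= i.
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)   # equation (6)
        x = x + attn_out                   # residual connection, location unchanged
        x = x + self.ff(self.ln2(x))
        return x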
(5) Constructing the GPT2-based generative question-answering model: relying on the strong unidirectional sequence-generation and prediction capability of GPT2, GPT2 is applied to the generative question-answering task in the cloud ERP domain. The GPT2-based generative question-answering model is trained on the GPT2 architecture; its structure is shown in Fig. 6. Considering the historical information of the multi-turn question-answer context and the characteristics of long multi-turn texts, the question-answer corpora of the previous turns are modeled on the same text during model design. Each session is spliced in forward order, and the answer of the current turn is predicted from the questions and answers of the previous turns and the current-turn question. The conditional probability can be expressed as equation (9):

P(A | P) = ∏_n P(x_n | x_1, x_2, ..., x_{n-1})   (9)

where A represents the current-turn answer that the model aims to predict; P represents all historical question-answer pairs and the current-turn question preceding the current answer; the prediction probability of the current answer is obtained by multiplying the conditional probabilities, and x_1, x_2, ..., x_{n-1} represent the first n−1 positions of the input sequence.
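Equation (9) can be scored directly by summing per-token log-probabilities over the answer span of a forward-spliced session, as in the following sketch (a Hugging Face style model and tokenizer API is assumed):

import torch

def answer_log_prob(model, tokenizer, history_and_question, answer):
    # Forward splicing: [history + current question] followed by the answer A.
    prefix = tokenizer.encode(history_and_question, return_tensors="pt")
    target = tokenizer.encode(answer, return_tensors="pt", add_special_tokens=False)
    ids = torch.cat([prefix, target], dim=-1)
    with torch.no_grad():
        log_probs = model(input_ids=ids).logits.log_softmax(dim=-1)
    # Sum log P(x_n | x_1, ..., x_{n-1}) over every answer position (equation 9).
    start = prefix.size(1)
    token_lp = log_probs[0, start - 1:-1, :].gather(1, ids[0, start:, None])
    return token_lp.sum().item()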
In the figure, the question-answer sentences are first modeled by two embedding layers. The first is the token embedding layer: "[CLS]" represents the beginning of a question, each question or answer text is separated by a "[SEP]" symbol mark, and essentially each cell represents the feature vector of the current character. The second is the segment embedding layer: the customer service role is represented by s and the customer role by c, so that the texts input by customer service and customer are role-identified during modeling. The modeled vectors are then input into the GPT2 question-answer text generation model, which masks the information to be predicted and outputs the prediction word by word. Finally the current-turn answer is obtained at the prediction output layer.
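This two-layer input modeling can be sketched as follows: a token layer with "[CLS]" at the start and "[SEP]" after each question or answer text, and a segment layer tagging each token with its speaker role. A BERT-style tokenizer and the mapping of c to segment 0 and s to segment 1 are assumptions of the sketch.

def build_dialogue_input(tokenizer, turns, roles):
    # turns: utterance texts in forward order; roles: parallel "c"/"s" tags.
    input_ids = [tokenizer.cls_token_id]
    segment_ids = [0]
    for text, role in zip(turns, roles):
        ids = tokenizer.encode(text, add_special_tokens=False)
        seg = 0 if role == "c" else 1          # assumed segment-id convention
        input_ids += ids + [tokenizer.sep_token_id]
        segment_ids += [seg] * (len(ids) + 1)  # "[SEP]" inherits the speaker's segment
    return input_ids, segment_ids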
The historical session information in the example is: "Customer: Why can't the attachment be uploaded? Customer service: What does the prompt say? Customer: I checked the voucher, found the required voucher, then clicked the attachment, and nothing happens at all." The current-turn answer predicted by the GPT2 generative question-answer model herein is: "Please clear the browser cache and log in again."
Based on steps (1) to (5), the general flow of our GPT2-based cloud ERP community generative question-answering method is shown in Fig. 7. The cloud ERP community customer service data-driven service is established, and the generative question-answering model oriented to the cloud ERP ecological community is supported by the manual customer service data-driven service built from the real scenario. First, the desensitized cloud ERP manual customer service session data are extracted from the database; then the relevant information in them is extracted, for example session role information, session turn information and cloud ERP professional domain information; on this basis a customer service session data set is established, and the data set and the extracted information are encapsulated; finally the customer service data-driven service is initialized and started. The GPT2-based generative question-answering model is formed by training on massive cloud ERP manual customer service session data; it already contains large-scale weight parameters and fully extracts the features of the cloud ERP domain. The overall multi-turn dialogue question-answering flow is as follows: first, the user's questioning action triggers the "start session" mark, and the user's first-turn question text is input into the GPT2 answer generation model; the model then gives generated-answer feedback to the user; on this basis, after receiving the answer feedback, the user continues to trigger the questioning action mark and inputs the current-turn question text. After n such rounds the user ends the session. Throughout the session, the generated historical session records are fed back into the ongoing session turns through the cloud ERP community customer service data-driven service; the historical session data are then analyzed and modeled together with the current-turn question text, finally yielding a current-turn reply that contains the historical semantic information.
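A minimal sketch of this session loop, reusing the generate() function sketched earlier, is given below; the history list stands in for the per-session record kept by the customer service data-driven service, and the console input/output is an assumption for illustration.

def chat_session(model, tokenizer, max_turns=10):
    history = []                          # historical session records
    for _ in range(max_turns):
        question = input("customer> ")    # user question triggers the round
        if not question:                  # empty input ends the session
            break
        history.append(("c", question))
        context = "".join(text for _, text in history)   # forward splicing
        reply = generate(model, tokenizer, context)      # current-turn reply
        history.append(("s", reply))                     # fed back for later rounds
        print("service>", reply)
    return history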
Finally, experimental verification is carried out on a data set constructed from cloud ERP manual customer service data to prove the effectiveness of the generative question-answering model. The data set comprises 120,000 real cloud ERP ecological community manual customer service session records provided by a cloud ERP developer. For data desensitization, a Python script was written to desensitize the company names, customer names, and customer service names and job numbers in the cloud ERP manual customer service data. The desensitized real manual customer service data are used to establish the session data set. An example of the data set is shown in Table 2 below:
table 2 example of session data
[Table 2 is available only as an image in the original document.]
As can be seen from Table 2, a session comprises multiple rounds of dialogue between the customer service and the customer, and the manual customer service can combine the context to solve the customer's questions; the text data set is therefore constructed in units of sessions, with each session providing multiple question-answer pairs for training the GPT2 generative model. Statistics of the data set are listed in Table 3:
TABLE 3 cloud ERP generated question-answer dataset
[Table 3 is available only as an image in the original document.]
Evaluation indexes and experimental parameters are set as follows. The GPT2 generative question-answering module adopts the BLEU (Bilingual Evaluation Understudy) value to evaluate the effect of multi-turn question-answer generation. Herein, BLEU values are calculated between the generated multi-turn question-answer sentences and the original corpus, and the result serves as the evaluation index of the GPT2 generative question-answering module. BLEU is evaluated by precision: given a standard answer, if the answer generated by GPT2 is the candidate, the sentence length is n and m words of the candidate appear in the standard answer, then m divided by n is the 1-gram BLEU calculation. BLEU has many variants, divided into evaluation indexes by n-gram order; the common ones are BLEU-1, BLEU-2, BLEU-3 and BLEU-4, where n-gram refers to a run of n consecutive words. BLEU-1 measures word-level accuracy, while higher-order BLEU measures sentence fluency. The calculation formula is:

BLEU_n = [ Σ_{C ∈ candidates} Σ_{n-gram ∈ C} Count_clip(n-gram) ] / [ Σ_{C' ∈ candidates} Σ_{n-gram' ∈ C'} Count(n-gram') ]

where candidate denotes the answer generated by GPT2; in the numerator, the first summation runs over all candidates and the second over all n-grams in one candidate, counting the n-grams that appear in the standard answer, so the whole numerator indicates how many n-grams of a given candidate occur in the standard answer. The first two summations in the denominator have the same meaning as in the numerator, and Count(n-gram') represents the total number of n-grams in the candidate.
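The BLEU-1 to BLEU-4 evaluation described above can be reproduced with NLTK as in the following sketch; character-level tokenization is an assumption suited to Chinese replies, and the smoothing function is added to avoid zero scores on short sentences.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def bleu_1_to_4(reference, candidate):
    # reference: original-corpus reply; candidate: reply generated by GPT2.
    ref, cand = [list(reference)], list(candidate)
    smooth = SmoothingFunction().method1
    weights = [(1, 0, 0, 0), (0.5, 0.5, 0, 0),
               (1 / 3, 1 / 3, 1 / 3, 0), (0.25, 0.25, 0.25, 0.25)]
    return [sentence_bleu(ref, cand, weights=w, smoothing_function=smooth)
            for w in weights]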
For the experimental parameter settings: because the GPT2 model is very large, the experimental training environment was built by renting a GPU server; the specific experimental environment parameters are shown in Table 4:
TABLE 4 Experimental Environment for generating question-answer model
[Table 4 is available only as an image in the original document.]
The experimental parameter settings for the generated question-answering model based on GPT2 are shown in table 5:
TABLE 5 partial Experimental parameters of the generated question-answer model
[Table 5 is available only as an image in the original document.]
For the analysis of the experimental results: after the GPT2 generative question-answer model is trained, human-machine dialogue is simulated by inputting the questions of the data set into the model one by one for conversation, and BLEU values are calculated between the obtained replies and the original replies of the data set, so as to evaluate how well the generated dialogue sentences fit the original dialogue corpus. The training results are as follows:
TABLE 6 GPT2 training results of the generated question-answer experiment
[Table 6 is available only as an image in the original document.]
As can be seen from Table 6, the GPT2-based generative question-answering model has nearly ninety million parameters; 60 epochs of training were completed on a 16 GB P100 graphics card, with a highest training accuracy of 92.13% and a lowest loss of 29.34%, achieving a good model training effect.
Table 7 evaluation results of the generated question-answer model of GPT2
[Table 7 is available only as an image in the original document.]
All the finally obtained BLEU values are averaged, and the levels from 1-gram to 4-gram are calculated to obtain the corresponding BLEU values shown in Table 7. The questions and answers generated by GPT2 perform well on this data set, with the BLEU value reaching 20%, a large improvement over the 1-gram to 4-gram BLEU values of the existing seq2seq-based GRU model. Considering that generated question-answer content is hard to control and that a large amount of training data is needed to ensure model accuracy, the model trained on the relatively small cloud ERP customer service session data set was tested through simulated dialogue and achieved a good effect.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A cloud ERP community generation type question-answering method based on GPT2, characterized by comprising the following steps:
(1) desensitizing, cleaning and denoising the cloud ERP community artificial customer service data;
(2) establishing cloud ERP community customer service data driving service;
(3) constructing unsupervised training based on a Transformer;
(4) constructing a GPT2 model based on a Transformer decoder;
(5) constructing a generative question-answering model based on GPT2.
2. The cloud ERP community generation type question-answering method based on GPT2 according to claim 1, wherein the algorithm flow of desensitization, cleaning and denoising is as follows: first reading the knowledge base and the raw manual customer service session data in the cloud ERP ecological community; then adding the sensitive information categories to be desensitized, reading customer information and setting sensitive-information replacement rules, and on this basis traversing and desensitizing the raw data to obtain desensitized text data; then adding the noise categories to be cleaned, reading the required information, setting corresponding regular-expression cleaning functions and traversing and cleaning the data; and finally obtaining the desensitized and denoised knowledge base triples and customer service dialogue data and storing them in a database.
3. The cloud ERP community generation type question-answering method based on GPT2 according to claim 1, wherein in the step (2), the cloud ERP manual customer service session data processed in step (1) are extracted from the database, the relevant information in them is extracted, a customer service session data set is established on this basis, the data set and the extracted information are encapsulated, and finally the customer service data-driven service is initialized and started.
4. The cloud ERP community generation type question-answering method based on GPT2 according to claim 1, wherein in the step (3), given an unsupervised token corpus U = {u_1, ..., u_n}, a standard language modeling objective is used to maximize the following likelihood, as in equation (1):

L_1(U) = Σ_i log P(u_i | u_{i-k}, ..., u_{i-1}; θ)   (1)

where k is the size of the context window and the conditional probability P is modeled by a neural network with parameters θ, trained using stochastic gradient descent; a multi-layer Transformer decoder, a variant of the Transformer, is then used as the language model, applying a multi-head self-attention operation over the input context tokens, followed by position-wise feed-forward layers, to produce an output distribution over the target tokens:

m_0 = U W_e + W_p   (2)

m_l = transformer_block(m_{l-1}), l ∈ [1, n]   (3)

where the word embeddings W_e plus the position embeddings W_p encode the input sequence before it enters the Transformer, and the n outputs each predict the next word at their position; the input is denoted m_0, the subscript 0 denoting the initial input layer, and the calculation formula of m_0 shows that GPT is a unidirectional language model; after m_0 is obtained, it is passed sequentially through all decoders of the Transformer to finally obtain m_n; finally, by the following formula:

P(u) = Softmax(m_n W_e^T)   (4)

the finally obtained m_n is fed into the softmax function to obtain the final unsupervised pre-training result.
5. The cloud ERP community generation type question-answering method based on GPT2 according to claim 1, wherein the principle of the layer normalization of GPT2 in the step (4) is shown in equation (5):

LayerNorm(x_i) = α · (x_i − μ_L) / √(σ_L + ε) + β   (5)

where μ_L represents the mean of each dimension of the input and σ_L the variance of each dimension of the input; α and β are two learnable parameters consistent with the dimension of the computed LayerNorm; ε is a constant bias term; and x_i is the input vector representation at the i-th time step;

wherein the principle of the attention calculation of the future-information-masked multi-head attention layer is shown in equation (6):

Attention(Q, K, V) = Softmax(Q K^T / √d_k) V   (6)

where Q, K and V respectively represent the query matrix, the key matrix and the value matrix; the attention matrix is obtained by matrix multiplication, and the scaling factor √d_k keeps the products in a stable range, from which the final attention values are obtained;

the calculation principle of the feed-forward neural network (FeedForward) is shown in equations (7) and (8):

FF(X) = RELU(Linear(Linear(X)))   (7)

Linear(X) = wX + b   (8)

where FF(X) represents inputting the vector X into the feed-forward neural network, RELU is the activation function, Linear represents the linear fully-connected layer (two such layers are stacked), and w and b are randomly initialized weight and bias parameters.
6. The cloud ERP community generation type question-answering method based on GPT2 according to claim 1, wherein in the step (5), the question-answer corpora of the previous turns are modeled on the same text in the model design, each session is spliced in forward order, and the answer of the current turn is predicted through the previous turns of question-answer and the current-turn question; the conditional probability can be expressed as equation (9):

P(A | P) = ∏_n P(x_n | x_1, x_2, ..., x_{n-1})   (9)

where A represents the current-turn answer predicted by the model; P represents all historical question-answer pairs and the current-turn question preceding the current answer; the prediction probability of the current question-answer is obtained by multiplying the conditional probabilities, and x_1, x_2, ..., x_{n-1} represent the first n−1 positions of the input sequence.
CN202210596783.9A 2022-05-26 2022-05-26 Cloud ERP community generation type question-answering method based on GPT2 Active CN115062003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210596783.9A CN115062003B (en) 2022-05-26 2022-05-26 Cloud ERP community generation type question-answering method based on GPT2

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210596783.9A CN115062003B (en) 2022-05-26 2022-05-26 Cloud ERP community generation type question-answering method based on GPT2

Publications (2)

Publication Number Publication Date
CN115062003A true CN115062003A (en) 2022-09-16
CN115062003B CN115062003B (en) 2024-04-16

Family

ID=83199331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210596783.9A Active CN115062003B (en) 2022-05-26 2022-05-26 Cloud ERP community generation type question-answering method based on GPT2

Country Status (1)

Country Link
CN (1) CN115062003B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472242A (en) * 2019-08-05 2019-11-19 腾讯科技(深圳)有限公司 A kind of text handling method, device and computer readable storage medium
US20210174162A1 (en) * 2019-12-09 2021-06-10 Salesforce.Com, Inc. Spatial-Temporal Reasoning Through Pretrained Language Models for Video-Grounded Dialogues
US20210342380A1 (en) * 2020-04-29 2021-11-04 International Business Machines Corporation Generative ontology learning and natural language processing with predictive language models
US20220084510A1 (en) * 2020-09-15 2022-03-17 Microsoft Technology Licensing, Llc Synthetic data generation for training of natural language understanding models
CN112214591A (en) * 2020-10-29 2021-01-12 腾讯科技(深圳)有限公司 Conversation prediction method and device
CN112364150A (en) * 2021-01-12 2021-02-12 南京云创大数据科技股份有限公司 Intelligent question and answer method and system combining retrieval and generation
CN113536809A (en) * 2021-05-24 2021-10-22 清华大学 Semantic-based unsupervised common sense question-answering method and system
CN113657124A (en) * 2021-07-14 2021-11-16 内蒙古工业大学 Multi-modal Mongolian Chinese translation method based on circulation common attention Transformer

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JASON WESTON et al.: "NormFormer: Improved Transformer Pretraining with Extra Normalization", arXiv, 1 November 2021 (2021-11-01), pages 1-14 *
WANG LEI: "Design of a Semantics-Based Human-Computer Interaction System", China Master's Theses Full-text Database, Information Science and Technology, 15 December 2021 (2021-12-15), pages 136-91 *
HUANG MINGTONG: "Research on Knowledge-Base Retrieval and Generative Intelligent Question-Answering for the Cloud ERP Ecological Community", China Master's Theses Full-text Database, Information Science and Technology, 15 January 2023 (2023-01-15), pages 138-3942 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468298A (en) * 2023-06-12 2023-07-21 江西五十铃汽车有限公司 GPT network model-based automobile technology planning and decision-making method and system
CN116468298B (en) * 2023-06-12 2023-11-03 江西五十铃汽车有限公司 GPT network model-based automobile technology planning and decision-making method and system
CN117252251A (en) * 2023-11-20 2023-12-19 新华三技术有限公司 Private domain data generation method, device, equipment and storage medium
CN117252251B (en) * 2023-11-20 2024-03-12 新华三技术有限公司 Private domain data generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115062003B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
Wu et al. Social media opinion summarization using emotion cognition and convolutional neural networks
CN108255805B (en) Public opinion analysis method and device, storage medium and electronic equipment
Shilpa et al. Sentiment analysis using deep learning
CN111783474A (en) Comment text viewpoint information processing method and device and storage medium
Cai et al. nCoder+: a semantic tool for improving recall of nCoder coding
CN116992005B (en) Intelligent dialogue method, system and equipment based on large model and local knowledge base
CN108170848B (en) Chinese mobile intelligent customer service-oriented conversation scene classification method
CN112784532B (en) Multi-head attention memory system for short text sentiment classification
CN115062003B (en) Cloud ERP community generation type question-answering method based on GPT2
US20230169271A1 (en) System and methods for neural topic modeling using topic attention networks
CN115062145A (en) Cloud ERP community cross-domain problem classification method based on BERT-TextCNN
Liu et al. AMFF: A new attention-based multi-feature fusion method for intention recognition
US20190228297A1 (en) Artificial Intelligence Modelling Engine
Hon Artificial neural networks
Prakash et al. Chatterbot implementation using transfer learning and LSTM encoder-decoder architecture
Gogate et al. Random features and random neurons for brain-inspired big data analytics
Khadija et al. Enhancing Indonesian customer complaint analysis: LDA topic modelling with BERT embeddings
CN117350271A (en) AI content generation method and service cloud platform based on large language model
Gao The Advance of GPTs and Language Model in Cyber Security
Fecht Sequential transfer learning in NLP for text summarization
CN115374283A (en) Double-graph attention network-based aspect category emotion classification method
CN113342964B (en) Recommendation type determination method and system based on mobile service
Rajapaksha et al. Explainable Attention Pruning: A Meta-learning-based Approach
Cvejoski et al. Recurrent point review models
Wang et al. Deep and shallow features learning for short texts matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant