CN115062003A - Cloud ERP community generative question-answering method based on GPT2 - Google Patents
Cloud ERP community generative question-answering method based on GPT2
- Publication number
- CN115062003A (Application CN202210596783.9A)
- Authority
- CN
- China
- Prior art keywords
- question
- community
- data
- customer service
- cloud erp
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a GPT2-based cloud ERP community generative question-answering method, implemented by desensitizing, cleaning and denoising the manual customer service data of a cloud ERP community; establishing a cloud ERP community customer service data-driven service; constructing Transformer-based unsupervised training; constructing a GPT2 model based on the Transformer decoder; and constructing a GPT2-based generative question-answering model. The performance and effectiveness of the model are finally demonstrated through experimental verification. The method addresses the problems that the existing community question-answering mode relies on retrieval, which is inefficient and cannot provide question-answering service to diverse customers in a timely, convenient, flexible and effective way, and that the massive manual customer service session data in community interaction data goes unused.
Description
Technical Field
The invention relates to the field of intelligent question answering, and in particular to a GPT2-based cloud ERP community generative question-answering method.
Background
Enterprise Resource Planning (ERP) integrates information technology with advanced management ideas and is an important means of improving enterprise benefit and efficiency by integrating and optimizing enterprise elements and resources. With the development of cloud computing, the cloud ERP platform, formed by combining the ERP concept with the cloud service model, can better match platform resources to customer requirements, provide users with a range of more valuable, more flexible, cheaper and more convenient business and service support, and has become the mainstream direction of current ERP development. Under the cloud service model, the cloud ERP ecosystem is formed by the subjects, businesses, data and relationships of cloud ERP providers, various user groups, developers, software vendors and other parties along the supply chain. Building, on this ecosystem, an open, shared and collaborative cloud ERP ecological community in which all parties communicate is an important direction and trend of cloud ERP development. Good operation of the cloud ERP ecological community underpins the community's ecological culture, its users' learning and growth, and efficient service consultation. High-quality ecological operation services are the basis of good community operation, and the dynamic adaptation and autonomous evolution of those services are among the keys to it. Because community services cover a wide range, users are numerous and varied, and communication is very frequent, users of all kinds constantly need to consult on service, technical and business questions. Intelligent question answering is therefore a key link in cloud ERP ecological operation services and an important guarantee of flexible and rapid communication and sharing for every individual and organization in the cloud ERP ecosystem. Unlike ordinary life communities such as entertainment and shopping, however, the cloud ERP ecological community is highly domain-specific, technical and professional, with many related subjects, wide scope, rich content and miscellaneous problems, which poses great difficulty and challenge for its operation and support.
The idea behind question-answering methods traces back to the Turing test proposed by the British mathematician Alan Turing in the 1950s. In 1966, the first relatively complete intelligent question-answering system, ELIZA, was developed at the Massachusetts Institute of Technology as an early application of natural language processing. With the continued development of artificial intelligence and deep neural networks, intelligent question-answering systems have become more and more effective. They currently divide mainly into task-oriented question-answering systems and chit-chat dialogue systems, corresponding respectively to limited professional fields and open question domains. To realize such question answering, researchers mainly study retrieval-based and generative question-answering methods. The retrieval-based method is the conventional approach in industry and reduces dependence on manual labor, but because of its inherent limitation it cannot form answers to questions outside the database. The generative method benefits from the application of deep learning to machine translation, whose models are typically built on the Encoder-Decoder architecture. Such Encoder-Decoder models do not depend on a specific knowledge base or rule template; they learn from massive question-answer dialogue corpora, and the conversational ability obtained under big-data training is closer to the human way of thinking. Moreover, unlike retrieving matched answers, a generative model produces responses more flexibly and variably, although the generated content is much harder to control. Generative question-answering models are generally realized with sequence-to-sequence (Seq2Seq) models, which generate one sequence from a given sequence, so generative question answering is a conditional generation problem subject to many conditions.
Community question answering is a key link in cloud ERP community operation services and an important guarantee of flexible and rapid communication and sharing for every individual and organization in the cloud ERP ecological community. Existing community question-answering relies on retrieval, which is inefficient and cannot provide timely, convenient, flexible and effective question-answering service to all kinds of customers. Moreover, given the openness of the cloud ERP ecological community, the diversity of the groups involved and the variability of user requirements, domain data are abundant across time and space, yet the massive manual customer service session data in the community interaction data are not effectively utilized.
Disclosure of Invention
The invention aims to overcome the problems in the background art and provides a GPT2-based cloud ERP community generative question-answering method, which overcomes the limitation of retrieval-based methods, answers questions for which no answer can be retrieved from the database, and improves the flexibility of cloud ERP community question answering.
The purpose of the invention is mainly realized by the following technical scheme:
The GPT2-based cloud ERP community generative question-answering method comprises the following steps:
(1) desensitizing, cleaning and denoising the cloud ERP community manual customer service data;
(2) establishing the cloud ERP community customer service data-driven service;
(3) constructing Transformer-based unsupervised training;
(4) constructing a GPT2 model based on the Transformer decoder;
(5) constructing a GPT2-based generative question-answering model.
Finally, the performance and effectiveness of the model are demonstrated through experimental verification.
The algorithmic flow of desensitization, cleaning and denoising is as follows: first read the knowledge base and the raw manual customer service session data of the cloud ERP ecological community; then add the categories of sensitive information to be desensitized, read the customer information and set the sensitive-information replacement rules, and on this basis traverse and desensitize the raw data to obtain desensitized text; then add the categories of noise to be cleaned, read the required information, set the corresponding regular-expression cleaning functions, and traverse the data for cleaning; finally obtain the desensitized and denoised knowledge-base triples and customer service dialogue data and store them in a database.
In step (2), the cloud ERP manual customer service session data processed in step (1) are extracted from the database; the relevant information in them is then extracted; on this basis a customer service session data set is established and packaged together with the extracted information; finally the customer service data-driven service is initialized and started.
In step (3), given an unsupervised corpus of tokens $U = \{u_1, \ldots, u_n\}$, a standard language modeling objective is used to maximize the following likelihood, as in equation (1):

$$L_1(U) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \theta) \tag{1}$$

where $k$ is the size of the context window and the conditional probability $P$ is modeled by a neural network with parameters $\theta$, trained using stochastic gradient descent. A multi-layer Transformer decoder, a variant of the Transformer, is then used as the language model; it applies a multi-headed self-attention operation over the input context tokens, followed by position-wise feed-forward layers, to produce an output distribution over target tokens:

$$m_0 = U W_e + W_p \tag{2}$$

$$m_l = \mathrm{transformer\_block}(m_{l-1}), \quad l \in [1, n] \tag{3}$$

$$P(u) = \mathrm{softmax}(m_n W_e^T) \tag{4}$$

Here the token sequence is embedded with the word embedding matrix $W_e$ plus the position embedding $W_p$ and input to the Transformer, and each of the $n$ outputs predicts the next token at its position. The input is denoted $m_0$, with 0 denoting the initial input layer; the formula for $m_0$ shows that GPT is a unidirectional language model. After $m_0$ is obtained, it is passed sequentially through all the decoder blocks of the Transformer, finally yielding $m_n$, which is fed to the softmax function of equation (4) to obtain the final unsupervised pre-training result.
The principle of the layer normalization used by GPT2 in step (4) is shown in equation (5):

$$\mathrm{LayerNorm}(x_i) = \alpha \odot \frac{x_i - \mu_L}{\sqrt{\sigma_L^2 + \epsilon}} + \beta \tag{5}$$

where $\mu_L$ is the mean of each input dimension, $\sigma_L^2$ is the variance of each input dimension, $\alpha$ and $\beta$ are two learnable parameters with the same dimension as the LayerNorm output, $\epsilon$ is a constant bias term, and $x_i$ is the input vector representation at the $i$-th time step.

The attention calculation of the future-information-masking multi-head attention layer is shown in equation (6):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) V \tag{6}$$

where $Q$, $K$ and $V$ are the query matrix, key matrix and value matrix respectively; the attention matrix is obtained by matrix multiplication, and scaling by $\sqrt{d_k}$ keeps the products approximately standard-normally distributed, yielding the final attention values.

The calculation principle of the feed-forward neural network (FeedForward) is shown in equations (7) and (8):

$$FF(X) = \mathrm{ReLU}(\mathrm{Linear}(\mathrm{Linear}(X))) \tag{7}$$

$$\mathrm{Linear}(X) = wX + b \tag{8}$$

where $FF(X)$ denotes inputting the vector $X$ into the feed-forward network, ReLU is the activation function, Linear denotes the linear fully-connected layer of which two are stacked, and $w$ and $b$ are randomly initialized weight and bias parameters.
In the model design of step (5), the question-answer corpora of previous turns are modeled on the same text: each session is spliced in forward order, and the answer of the current turn is predicted from the previous rounds of question-answers plus the current question, expressed with conditional probability as equation (9):

$$P(A \mid P) = \prod_{i=n}^{N} P(x_i \mid x_1, x_2, \ldots, x_{i-1}) \tag{9}$$

where $A$ is the current answer to be predicted, $P$ is all of the historical question-answer pairs plus the current question preceding that answer, and $x_1, x_2, \ldots, x_{n-1}$ are the first $n-1$ positions of the input sequence; the prediction probability of the current answer is obtained by multiplying the conditional probabilities.
In conclusion, compared with the prior art, the invention has the following beneficial effects: the method overcomes the limitation of retrieval-based methods, answers questions for which no answer can be retrieved from the database, and improves the flexibility of cloud ERP community question answering.
Drawings
FIG. 1 is a schematic diagram of a process of desensitization, cleaning and denoising of cloud ERP customer service data;
FIG. 2 is an exemplary diagram of portions of raw data for a manual customer service session;
FIG. 3 is an exemplary diagram of partial results after desensitization cleaning and denoising of raw data;
FIG. 4 is a schematic diagram of a cloud ERP community customer service data driven service establishment flow;
FIG. 5 is a schematic diagram of a GPT2 model structure based on a Transformer decoder;
FIG. 6 is a schematic structural diagram of a cloud ERP community-generated question-answer model based on GPT 2;
FIG. 7 is a schematic diagram of the general flow of the GPT2-based cloud ERP community generative question-answering method.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Community question answering is a key link in cloud ERP community operation services and an important guarantee of flexible and rapid communication and sharing for every individual and organization in the cloud ERP ecological community. Existing community question-answering relies on retrieval, which is inefficient and cannot provide timely, convenient, flexible and effective question-answering service to all kinds of customers. Moreover, given the openness of the cloud ERP ecological community, the diversity of the groups involved and the variability of user requirements, domain data are abundant across time and space, yet the massive manual customer service session data in the community interaction data are not effectively utilized. Therefore a GPT2-based cloud ERP community generative question-answering method is designed: (1) desensitize, clean and denoise the cloud ERP community manual customer service data; (2) establish the cloud ERP community customer service data-driven service; (3) construct Transformer-based unsupervised training; (4) construct a GPT2 model based on the Transformer decoder; (5) construct a GPT2-based generative question-answering model. Finally, the performance and effectiveness of the model are demonstrated through experimental verification.
The cloud ERP community manual customer service data desensitization, cleaning and denoising module desensitizes the sensitive information in community manual customer service session data according to data security requirements, reliably protecting and de-identifying private and sensitive data. The sensitive information in cloud ERP ecological community manual customer service session data mainly includes: customer names, company codes, taxpayer identification numbers, user e-mail addresses, ID numbers, mobile phone numbers, consultation dates, order numbers, customer service job numbers and so on. Such sensitive private information must be converted or removed without affecting the semantic information. In addition, since the community data contain noise such as garbled characters, irrelevant information, repeated text and stray punctuation marks, cleaning and denoising are also required.
The main current modes of data desensitization are: masking desensitization, reversible desensitization, data-consistency desensitization, generalization desensitization, format-preserving desensitization and so on. Masking desensitization is the most common; it is widely used for the privacy protection of users and enterprises and is mainly realized by replacing all or part of the sensitive information in the data with symbols. Reversible desensitization means that desensitized data can be restored, i.e. the original data can be recovered from the desensitized data. Data-consistency desensitization means that the interrelationships among data remain consistent after desensitization; it is widely used in scenarios such as secondary system development. These desensitization modes can also be combined to meet different data requirements.
The main approaches to data cleaning and denoising are: manual inspection, regular expressions, statistical models, clustering and so on. Cleaning with regular expressions is a commonly used method; it works mainly by manually setting string-matching rules to match and remove target strings.
The overall scheme of desensitization, cleaning and denoising is shown in Fig. 1 and is mainly realized with a Python script whose main flow is: first read the raw manual customer service session data of the cloud ERP ecological community; then add the categories of sensitive information to be desensitized, read the customer information and set the sensitive-information replacement rules, and on this basis traverse and desensitize the raw data to obtain desensitized text; then add the categories of noise to be cleaned, read the required information, set the corresponding regular-expression cleaning functions, and traverse the data for cleaning. Finally, the desensitized and denoised knowledge-base triples and customer service dialogue data are obtained and stored in a database.
The implementation process of the data desensitization and cleaning/denoising Python program is shown in Table 1:
Table 1. Implementation process of the data desensitization and cleaning/denoising program
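As an illustration of the program flow summarized in Table 1, the following Python sketch shows the traverse-and-replace structure described above; the rule categories, regular expressions and mask string are illustrative assumptions, not the patent's actual rules:

```python
import re

# Illustrative sensitive-information rules (assumed, not the patent's actual
# rules): each category maps to a regex and a replacement mask.
SENSITIVE_RULES = {
    "phone_number": (re.compile(r"1\d{10}"), "+"),
    "email":        (re.compile(r"[\w.]+@[\w.]+"), "+"),
    "order_number": (re.compile(r"\bORD\d{8}\b"), "+"),
}

# Illustrative noise patterns: front-end markup, repeated punctuation,
# and runs of whitespace.
NOISE_PATTERNS = [
    re.compile(r"<[^>]+>"),      # HTML / front-end markup
    re.compile(r"[!?。，]{2,}"),  # repeated punctuation
    re.compile(r"\s{2,}"),       # collapsed whitespace
]

def desensitize(text: str) -> str:
    """Traverse the replacement rules and mask sensitive spans."""
    for pattern, mask in SENSITIVE_RULES.values():
        text = pattern.sub(mask, text)
    return text

def clean(text: str) -> str:
    """Apply the regular-expression cleaning functions to remove noise."""
    for pattern in NOISE_PATTERNS:
        text = pattern.sub(" ", text)
    return text.strip()

def process_sessions(raw_sessions):
    """Desensitize, then clean, every utterance of every session."""
    return [[clean(desensitize(u)) for u in session] for session in raw_sessions]
```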
Fig. 2 shows raw manual customer service session data, in which private information such as dates, website resources, customer service names and job numbers, and customer names and companies has been replaced with "+". Fig. 3 shows the result after desensitization, cleaning and denoising: front-end markup, irrelevant information and the like have been removed from the text by the Python desensitization and denoising script designed herein, and the processed manual customer service data are stored in units of session samples, each consisting of multiple rounds of dialogue.
(2) Establishing the cloud ERP community customer service data-driven service. The flow is shown in Fig. 4: a manual customer service data-driven service over real scenarios supports the generative question-answering model for the cloud ERP community. First, the cloud ERP manual customer service session data processed in step (1) are extracted from the database. Then the relevant information is extracted, such as session role information, session turn information and cloud ERP professional-field information. On this basis a customer service session data set is established and packaged together with the extracted information. Finally the customer service data-driven service is initialized and started.
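A minimal sketch of the session data set construction step; the record fields and class layout are assumptions used for illustration, not a specification of the service:

```python
from dataclasses import dataclass, field

@dataclass
class SessionSample:
    """One session sample: ordered (role, utterance) turns plus a
    professional-field tag. Field names are illustrative assumptions."""
    session_id: str
    turns: list = field(default_factory=list)  # [("c", question), ("s", answer), ...]
    domain: str = "cloud_erp"

def build_dataset(records):
    """Group flat database records into session samples, preserving the
    role and turn-order information described above. Each record is
    assumed to carry session_id, turn_index, role ('c'/'s') and text."""
    sessions = {}
    for rec in sorted(records, key=lambda r: (r["session_id"], r["turn_index"])):
        s = sessions.setdefault(rec["session_id"],
                                SessionSample(session_id=rec["session_id"]))
        s.turns.append((rec["role"], rec["text"]))
    return list(sessions.values())
```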
(3) Constructing the Transformer-based unsupervised training. Unsupervised pre-training is a special case of semi-supervised learning whose goal is to find a good initialization point rather than to modify the supervised learning objective. Early work explored this technique in image classification and regression tasks, and research shows that pre-training acts as a regularization scheme enabling better generalization in deep neural networks. Given an unsupervised corpus of tokens $U = \{u_1, \ldots, u_n\}$, we use a standard language modeling objective to maximize the following likelihood, as in equation (1):

$$L_1(U) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \theta) \tag{1}$$

where $k$ is the size of the context window and the conditional probability $P$ is modeled by a neural network with parameters $\theta$, trained using stochastic gradient descent. A multi-layer Transformer decoder, a variant of the Transformer, is then used as the language model. The model applies a multi-headed self-attention operation over the input context tokens, followed by position-wise feed-forward layers, to produce an output distribution over target tokens:

$$m_0 = U W_e + W_p \tag{2}$$

$$m_l = \mathrm{transformer\_block}(m_{l-1}), \quad l \in [1, n] \tag{3}$$

$$P(u) = \mathrm{softmax}(m_n W_e^T) \tag{4}$$

Here the token sequence is embedded with the word embedding matrix $W_e$ plus the position embedding $W_p$ and input to the Transformer, and each of the $n$ outputs predicts the next token at its position. The input is denoted $m_0$, with 0 denoting the initial input layer; the formula for $m_0$ shows that GPT is a unidirectional language model. After $m_0$ is obtained, it is passed sequentially through all the decoder blocks of the Transformer, finally yielding $m_n$, which is fed to the softmax of equation (4) to obtain the final unsupervised pre-training result.
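In practice the objective of equation (1) is the shifted next-token cross-entropy; a minimal PyTorch sketch, assuming a generic `model` that maps token ids to per-position vocabulary logits:

```python
import torch
import torch.nn.functional as F

def lm_loss(model, token_ids):
    """Negative of L1(U) in equation (1): each position predicts token u_i
    from the tokens before it, so logits at positions 0..n-2 are compared
    against the input sequence shifted left by one.
    `model` maps (batch, seq) ids to (batch, seq, vocab) logits."""
    logits = model(token_ids[:, :-1])   # predictions for positions 1..n-1
    targets = token_ids[:, 1:]          # the next tokens u_i
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```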
(4) Constructing the GPT2 model based on the Transformer decoder. GPT (Generative Pre-Training) consists of two training stages: in the first stage a language model is obtained through unsupervised training on massive text data; the second stage adapts to various downstream tasks, such as text classification, text relevance, text generation and text tagging, which are then fine-tuned in a supervised manner. The goal of GPT2 is to train a general-purpose natural language processing model; unlike GPT, GPT2 does not alter the downstream task but automatically identifies the task to be completed. Unlike the BERT model, which is built from Transformer encoder modules, GPT2 is built from Transformer decoder modules. As shown in Fig. 5, the overall model structure of GPT2 is the same as that of GPT: one token is output at a time, and the use of a unidirectional language model makes it more effective at generating question-and-answer text. Each newly generated token is spliced immediately after the previously generated sequence and becomes part of the new input. This token-processing mechanism of GPT2 is known as auto-regression; later models such as Transformer-XL and XLNet are essentially based on it. The autoregressive mechanism also enhances GPT2's text generation, so GPT2 is used herein as the sequence-to-sequence model for generating questions and answers.
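As an illustration of this token-by-token splicing, a minimal greedy decoding loop in PyTorch; the `model` contract (ids in, per-position logits out), the greedy choice and the end-of-sequence id are assumptions of this sketch:

```python
import torch

@torch.no_grad()
def generate(model, input_ids, max_new_tokens=64, eos_id=2):
    """Auto-regression: take the distribution over the last position,
    pick the next token, splice it onto the sequence, and feed the
    grown sequence back in until the (assumed) end-of-answer id."""
    for _ in range(max_new_tokens):
        logits = model(input_ids)                          # (1, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
        if next_id.item() == eos_id:
            break
    return input_ids
```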
GPT2 stacks 12 Transformer decoders. Unlike GPT and BERT, GPT2 places layer normalization (LayerNorm) before the multi-head attention, with the residual connection positions unchanged; parameter initialization is adjusted according to network depth, and an additional layer normalization is added after the last Transformer decoder. The masked multi-head attention layer in the Transformer decoder masks the future character positions to be predicted in the sequence; the multi-head attention rule is otherwise consistent with the Transformer encoder, and each attention layer holds the key and value vectors for the current position's token. GPT2 explores larger model scales, up to 1.5 billion parameters, and achieves the best results on various NLP tasks through pre-training and prediction without any fine-tuning. Based on these principles and characteristics, the strong generation capacity of GPT2 is used here for multi-turn question-and-answer generation, and its strong general pre-training capacity is transferred to question answering in the professional field of the cloud ERP ecological community.
The principle of layer normalization (LayerNorm) is shown in equation (5):

$$\mathrm{LayerNorm}(x_i) = \alpha \odot \frac{x_i - \mu_L}{\sqrt{\sigma_L^2 + \epsilon}} + \beta \tag{5}$$

where $\mu_L$ is the mean of each input dimension, $\sigma_L^2$ is the variance of each input dimension, $\alpha$ and $\beta$ are two learnable parameters with the same dimension as the LayerNorm output, $\epsilon$ is a constant bias term, and $x_i$ is the input vector representation at the $i$-th time step.

The attention calculation of the future-information-masking multi-head attention layer is shown in equation (6):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) V \tag{6}$$

where $Q$, $K$ and $V$ are the query matrix, key matrix and value matrix respectively. The attention matrix is obtained by matrix multiplication, and scaling by $\sqrt{d_k}$ keeps the products approximately standard-normally distributed, yielding the final attention values. Repeating this attention calculation 8 times corresponds to setting 8 attention heads and yields multiple self-attention matrices. So that training does not leak the positions of answers to be predicted later, the predicted positions are shielded with a mask, hiding future information.

The calculation principle of the feed-forward neural network (FeedForward) is shown in equations (7) and (8):

$$FF(X) = \mathrm{ReLU}(\mathrm{Linear}(\mathrm{Linear}(X))) \tag{7}$$

$$\mathrm{Linear}(X) = wX + b \tag{8}$$

where $FF(X)$ denotes inputting the vector $X$ into the feed-forward network, ReLU is the activation function, Linear denotes the linear fully-connected layer of which two are stacked, and $w$ and $b$ are randomly initialized weight and bias parameters.
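Equations (5)-(8) together describe one pre-LayerNorm decoder block. The following PyTorch sketch is one way to assemble them (a single attention head for brevity; the ReLU is placed between the two stacked linear layers, as in the standard Transformer; all dimensions and names are assumptions):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """One GPT2-style block: LayerNorm before the masked attention (eq. 5),
    scaled dot-product attention with a future mask (eq. 6), and a
    two-layer feed-forward network (eqs. 7-8)."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)   # alpha/beta learnable; eps is the constant bias term
        self.ln2 = nn.LayerNorm(d_model)
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.ff1 = nn.Linear(d_model, d_ff)   # Linear(X) = wX + b
        self.ff2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        # Masked self-attention: LayerNorm first, residual connection unchanged.
        h = self.ln1(x)
        q, k, v = self.q_proj(h), self.k_proj(h), self.v_proj(h)
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))  # QK^T / sqrt(d_k)
        mask = torch.tril(torch.ones(x.size(1), x.size(1),
                                     dtype=torch.bool, device=x.device))
        scores = scores.masked_fill(~mask, float("-inf"))  # hide future positions
        x = x + F.softmax(scores, dim=-1) @ v
        # Feed-forward sub-layer, again pre-normalized.
        x = x + self.ff2(F.relu(self.ff1(self.ln2(x))))
        return x
```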
(5) Constructing the GPT2-based generative question-answering model. Relying on the strong unidirectional sequence-generation and prediction capacity of GPT2, GPT2 is applied to the generative question-answering task in the cloud ERP field; the generative question-answering model is trained on the GPT2 architecture. Its structure is shown in Fig. 6. Considering the historical information of multi-turn question-and-answer contexts and the long-text character of multiple turns, the question-answer corpora of previous turns are modeled on the same text when designing the model: each session is spliced in forward order, and the answer of the current turn is predicted from the questions and answers of previous turns plus the current question. The conditional probability can be expressed as equation (9):

$$P(A \mid P) = \prod_{i=n}^{N} P(x_i \mid x_1, x_2, \ldots, x_{i-1}) \tag{9}$$

where $A$ is the current answer to be predicted, $P$ is all of the historical question-answer pairs plus the current question preceding that answer, and $x_1, x_2, \ldots, x_{n-1}$ are the first $n-1$ positions of the input sequence; the prediction probability of the current answer is obtained by multiplying the conditional probabilities.
The question-answer sentences are first modeled by two embedding layers, as shown in the figure. The first is the symbol embedding layer: "[CLS]" marks the beginning of a question, each question and answer text is separated by a "[SEP]" marker, and each cell character is essentially represented by a feature vector of the current character. The second is the segment embedding layer: the customer service role is denoted s and the customer role c, so that the texts entered by customer service and customers are role-identified during modeling. The modeled vectors are then input into the GPT2 question-answer text generation model, which masks the information to be predicted and outputs predictions word by word. The final current-turn answer is obtained at the prediction output layer.
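A minimal sketch of this input representation — token embeddings plus role segment embeddings over a forward-spliced dialogue — where the character-level vocabulary `stoi` and its special markers are assumptions:

```python
import torch
import torch.nn as nn

class QAInputEmbedding(nn.Module):
    """Token embedding + segment (role) embedding, summed per position."""
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.seg = nn.Embedding(2, d_model)  # 0 = customer "c", 1 = customer service "s"

    def forward(self, token_ids, segment_ids):
        return self.tok(token_ids) + self.seg(segment_ids)

def encode_dialogue(turns, stoi):
    """Flatten a multi-turn dialogue into one forward-spliced sequence:
    [CLS] q1 [SEP] a1 [SEP] q2 ... with matching role segment ids.
    `turns` is [(role, text), ...]; `stoi` maps characters to ids (assumed)."""
    tokens, segments = [stoi["[CLS]"]], [0]
    for role, text in turns:
        seg = 1 if role == "s" else 0
        for ch in text:
            tokens.append(stoi.get(ch, stoi["[UNK]"]))
            segments.append(seg)
        tokens.append(stoi["[SEP]"])
        segments.append(seg)
    return torch.tensor([tokens]), torch.tensor([segments])
```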
The historical session information in the example is: "Customer: why can't the attachment be uploaded? Customer service: what does the prompt say? Customer: I checked the certificate, found the required one, and then clicked the attachment, but it does not respond at all." The current-turn answer predicted by the GPT2 generative question-answer model herein is: "Clear the browser cache and log in again."
Based on steps (1)-(5), the overall flow of the GPT2-based cloud ERP community generative question-answering method is shown in Fig. 7. The cloud ERP community customer service data-driven service is established, and a manual customer service data-driven service over real scenarios supports the generative question-answering model for the cloud ERP ecological community. First, the desensitized cloud ERP manual customer service session data are extracted from the database; then relevant information such as session role information, session turn information and cloud ERP professional-field information is extracted; on this basis a customer service session data set is established and packaged together with the extracted information; finally the customer service data-driven service is initialized and started. The GPT2-based generative question-answering model is trained on massive cloud ERP manual customer service session data, already contains large-scale weight parameters, and fully extracts features of the cloud ERP field. The overall multi-turn question-answering process is as follows: the user's questioning action triggers the "start session" marker, and the user's first-turn question text is input into the GPT2 answer-generation model; the model then generates an answer and feeds it back to the user; after receiving the answer, the user triggers the questioning action again and inputs the current-turn question text; after n such rounds, the user ends the session. Throughout the session, the accumulated historical session records are fed back into the ongoing turns through the cloud ERP community customer service data-driven service, so the historical dialogue data are analyzed and modeled together with the current-turn question text, finally producing a current-turn reply that incorporates the historical semantic information.
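Wiring the sketches above together, one round of this multi-turn flow might look like the following; the `itos` inverse vocabulary is an assumption, and a full implementation would also thread the segment ids into the model:

```python
def answer_turn(model, history, question, stoi, itos):
    """One round: the user's question triggers the turn, the stored history
    is spliced with the current question, the model generates the reply,
    and both are written back into the history record."""
    history.append(("c", question))                  # customer turn
    input_ids, _seg_ids = encode_dialogue(history, stoi)
    output_ids = generate(model, input_ids)          # see generation sketch above
    new_ids = output_ids[0, input_ids.size(1):]      # only the generated tail
    reply = "".join(itos[int(i)] for i in new_ids)
    history.append(("s", reply))                     # customer service turn
    return reply
```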
Finally, experimental verification is performed on a data set constructed from cloud ERP manual customer service data to demonstrate the effectiveness of the generative question-answering model. The data set contains 120,000 real cloud ERP ecological community manual customer service session records provided by a cloud ERP developer. For data desensitization, a Python script was written to desensitize the company names, customer service names and job numbers in the data; the desensitized real manual customer service data were then used to establish the session data set. An example is shown in Table 2:
table 2 example of session data
As Table 2 shows, one session contains multiple rounds of dialogue between customer service and the customer, and the human agent resolves the customer's questions using the context. The text data set is therefore constructed in units of sessions, with each session providing multiple question-answer pairs for training the GPT2 generative model. Statistics of the data set are listed in Table 3:
Table 3. Cloud ERP generative question-answer data set
For the evaluation indexes and experimental parameters, the GPT2 generative question-answering module is evaluated with the BLEU (Bilingual Evaluation Understudy) value to measure the effect of multi-turn question-and-answer generation. The generated multi-turn question-answer sentences are scored against the original corpus, and the resulting BLEU value serves as the evaluation index of the module. BLEU is a precision-based measure: given a standard answer, let the answer generated by GPT2 be the candidate with sentence length n; if m words of the candidate appear in the standard answer, then m divided by n is the 1-gram BLEU. BLEU has many variants, one evaluation index per n-gram order; the common ones are BLEU-1, BLEU-2, BLEU-3 and BLEU-4, where n-gram means n consecutive words. BLEU-1 measures word-level accuracy, while higher-order BLEU measures sentence fluency. The calculation formula is:

$$p_n = \frac{\sum_{C \in \{\mathrm{candidates}\}} \; \sum_{\mathrm{ngram} \in C} \mathrm{Count}_{\mathrm{clip}}(\mathrm{ngram})}{\sum_{C' \in \{\mathrm{candidates}\}} \; \sum_{\mathrm{ngram}' \in C'} \mathrm{Count}(\mathrm{ngram}')}$$

where candidate denotes an answer generated by GPT2; the first summation in the numerator runs over all candidates and the second over all n-grams within one candidate, with $\mathrm{Count}_{\mathrm{clip}}$ clipping each n-gram count by its count in the standard answer, so the whole numerator counts how many n-grams of each candidate appear in the standard answer. The summations in the denominator have the same meaning as in the numerator, and $\mathrm{Count}(\mathrm{ngram}')$ is the count of $\mathrm{ngram}'$ in the candidate.
For the experimental parameter settings: because the GPT2 model is very large, the training environment was built on a rented GPU server. The specific environment parameters are shown in Table 4:
Table 4. Experimental environment of the generative question-answering model
The experimental parameter settings of the GPT2-based generative question-answering model are shown in Table 5:
Table 5. Partial experimental parameters of the generative question-answering model
For analysis of the experimental results: after the GPT2 generative question-answer model was trained, human-machine dialogue was simulated by inputting the questions of the data set into the model one by one, and the BLEU value was calculated between the obtained replies and the original replies of the data set, to evaluate how well the generated dialogue sentences fit the original dialogue corpus. The training results are as follows:
Table 6. Training results of the GPT2 generative question-answer experiment
As Table 6 shows, the GPT2-based generative question-answering model has nearly ninety million parameters; 60 epochs of training were completed on a 16 GB P100 GPU, with a highest training accuracy of 92.13% and a lowest loss of 29.34%, a good training result.
Table 7. Evaluation results of the GPT2 generative question-answer model
Averaging all of the final BLEU values and computing the 1- to 4-gram levels gives the BLEU values in Table 7. The questions and answers generated by GPT2 perform well on this data set, with the BLEU value reaching 20%, a large improvement over the 1- to 4-gram BLEU of the existing seq2seq-based GRU model. Considering that generated question-answer content is hard to control and that large amounts of training data are needed to guarantee model accuracy, the model trained on the relatively small cloud ERP customer service session data set was tested through simulated dialogue and achieved a good effect.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (6)
1. A GPT2-based cloud ERP community generative question-answering method, characterized by comprising the following steps:
(1) desensitizing, cleaning and denoising the cloud ERP community manual customer service data;
(2) establishing the cloud ERP community customer service data-driven service;
(3) constructing Transformer-based unsupervised training;
(4) constructing a GPT2 model based on the Transformer decoder;
(5) constructing a GPT2-based generative question-answering model.
2. The GPT2-based cloud ERP community generative question-answering method according to claim 1, wherein the algorithmic flow of desensitization, cleaning and denoising is as follows: first read the knowledge base and the raw manual customer service session data of the cloud ERP ecological community; then add the categories of sensitive information to be desensitized, read the customer information and set the sensitive-information replacement rules, and on this basis traverse and desensitize the raw data to obtain desensitized text; then add the categories of noise to be cleaned, read the required information, set the corresponding regular-expression cleaning functions, and traverse the data for cleaning; finally obtain the desensitized and denoised knowledge-base triples and customer service dialogue data and store them in a database.
3. The GPT2-based cloud ERP community generative question-answering method according to claim 1, wherein in step (2) the cloud ERP manual customer service session data processed in step (1) are extracted from the database, the relevant information in them is extracted, a customer service session data set is established on this basis and packaged together with the extracted information, and finally the customer service data-driven service is initialized and started.
4. The GPT2-based cloud ERP community generative question-answering method according to claim 1, wherein in step (3), given an unsupervised corpus of tokens $U = \{u_1, \ldots, u_n\}$, a standard language modeling objective is used to maximize the following likelihood, as in equation (1):

$$L_1(U) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \theta) \tag{1}$$

where $k$ is the size of the context window and the conditional probability $P$ is modeled by a neural network with parameters $\theta$ trained using stochastic gradient descent; a multi-layer Transformer decoder, a variant of the Transformer, is then used as the language model, applying a multi-headed self-attention operation over the input context tokens followed by position-wise feed-forward layers to produce an output distribution over target tokens:

$$m_0 = U W_e + W_p \tag{2}$$

$$m_l = \mathrm{transformer\_block}(m_{l-1}), \quad l \in [1, n] \tag{3}$$

$$P(u) = \mathrm{softmax}(m_n W_e^T) \tag{4}$$

wherein the token sequence is embedded with the word embedding matrix $W_e$ plus the position embedding $W_p$ and input to the Transformer, each of the $n$ outputs predicting the next token at its position; the input is denoted $m_0$, with 0 denoting the initial input layer, and the formula for $m_0$ shows that GPT is a unidirectional language model; after $m_0$ is obtained it is passed sequentially through all the decoder blocks of the Transformer to finally obtain $m_n$, which is fed to the softmax function of equation (4) to obtain the final unsupervised pre-training result.
5. The GPT2-based cloud ERP community generative question-answering method according to claim 1, wherein the principle of the layer normalization of GPT2 in step (4) is shown in equation (5):

$$\mathrm{LayerNorm}(x_i) = \alpha \odot \frac{x_i - \mu_L}{\sqrt{\sigma_L^2 + \epsilon}} + \beta \tag{5}$$

where $\mu_L$ is the mean of each input dimension, $\sigma_L^2$ is the variance of each input dimension, $\alpha$ and $\beta$ are two learnable parameters with the same dimension as the LayerNorm output, $\epsilon$ is a constant bias term, and $x_i$ is the input vector representation at the $i$-th time step;

wherein the attention calculation of the future-information-masking multi-head attention layer is shown in equation (6):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) V \tag{6}$$

where $Q$, $K$ and $V$ are the query matrix, key matrix and value matrix respectively; the attention matrix is obtained by matrix multiplication, and scaling by $\sqrt{d_k}$ keeps the products approximately standard-normally distributed, yielding the final attention values;

and wherein the calculation principle of the feed-forward neural network (FeedForward) is shown in equations (7) and (8):

$$FF(X) = \mathrm{ReLU}(\mathrm{Linear}(\mathrm{Linear}(X))) \tag{7}$$

$$\mathrm{Linear}(X) = wX + b \tag{8}$$

where $FF(X)$ denotes inputting the vector $X$ into the feed-forward network, ReLU is the activation function, Linear denotes the linear fully-connected layer of which two are stacked, and $w$ and $b$ are randomly initialized weight and bias parameters.
6. The GPT2-based cloud ERP community generative question-answering method according to claim 1, wherein in step (5) the question-answer corpora of previous turns are modeled on the same text in the model design, each session is spliced in forward order, and the answer of the current turn is predicted from the previous turns of question-answers plus the current question, the conditional probability being expressed as equation (9):

$$P(A \mid P) = \prod_{i=n}^{N} P(x_i \mid x_1, x_2, \ldots, x_{i-1}) \tag{9}$$

where $A$ is the current answer to be predicted, $P$ is all of the historical question-answer pairs plus the current question preceding that answer, and $x_1, x_2, \ldots, x_{n-1}$ are the first $n-1$ positions of the input sequence; the prediction probability of the current answer is obtained by multiplying the conditional probabilities.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210596783.9A CN115062003B (en) | 2022-05-26 | 2022-05-26 | Cloud ERP community generation type question-answering method based on GPT2 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210596783.9A CN115062003B (en) | 2022-05-26 | 2022-05-26 | Cloud ERP community generation type question-answering method based on GPT2 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115062003A true CN115062003A (en) | 2022-09-16 |
CN115062003B CN115062003B (en) | 2024-04-16 |
Family
ID=83199331
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210596783.9A Active CN115062003B (en) | 2022-05-26 | 2022-05-26 | Cloud ERP community generation type question-answering method based on GPT2 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115062003B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110472242A (en) * | 2019-08-05 | 2019-11-19 | 腾讯科技(深圳)有限公司 | A kind of text handling method, device and computer readable storage medium |
US20210174162A1 (en) * | 2019-12-09 | 2021-06-10 | Salesforce.Com, Inc. | Spatial-Temporal Reasoning Through Pretrained Language Models for Video-Grounded Dialogues |
US20210342380A1 (en) * | 2020-04-29 | 2021-11-04 | International Business Machines Corporation | Generative ontology learning and natural language processing with predictive language models |
US20220084510A1 (en) * | 2020-09-15 | 2022-03-17 | Microsoft Technology Licensing, Llc | Synthetic data generation for training of natural language understanding models |
CN112214591A (en) * | 2020-10-29 | 2021-01-12 | 腾讯科技(深圳)有限公司 | Conversation prediction method and device |
CN112364150A (en) * | 2021-01-12 | 2021-02-12 | 南京云创大数据科技股份有限公司 | Intelligent question and answer method and system combining retrieval and generation |
CN113536809A (en) * | 2021-05-24 | 2021-10-22 | 清华大学 | Semantic-based unsupervised common sense question-answering method and system |
CN113657124A (en) * | 2021-07-14 | 2021-11-16 | 内蒙古工业大学 | Multi-modal Mongolian Chinese translation method based on circulation common attention Transformer |
Non-Patent Citations (3)
Title |
---|
Jason Weston et al.: "NormFormer: Improved Transformer Pretraining with Extra Normalization", arXiv, 1 November 2021, pages 1-14 *
Wang Lei: "Design of a Semantics-Based Human-Computer Interaction System", China Master's Theses Full-text Database, Information Science and Technology Section, 15 December 2021, pages 136-91 *
Huang Mingtong: "Research on a Knowledge-Base Retrieval and Generative Intelligent Question-Answering Method for the Cloud ERP Ecological Community", China Master's Theses Full-text Database, Information Science and Technology Section, 15 January 2023, pages 138-3942 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116468298A (en) * | 2023-06-12 | 2023-07-21 | 江西五十铃汽车有限公司 | GPT network model-based automobile technology planning and decision-making method and system |
CN116468298B (en) * | 2023-06-12 | 2023-11-03 | 江西五十铃汽车有限公司 | GPT network model-based automobile technology planning and decision-making method and system |
CN117252251A (en) * | 2023-11-20 | 2023-12-19 | 新华三技术有限公司 | Private domain data generation method, device, equipment and storage medium |
CN117252251B (en) * | 2023-11-20 | 2024-03-12 | 新华三技术有限公司 | Private domain data generation method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115062003B (en) | 2024-04-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |