CN113849634B - Method for improving interpretability of depth model recommendation scheme - Google Patents

Method for improving interpretability of depth model recommendation scheme

Info

Publication number
CN113849634B
CN113849634B (application CN202110225889.3A)
Authority
CN
China
Prior art keywords: text, return visit, user, digest, user return
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110225889.3A
Other languages
Chinese (zh)
Other versions
CN113849634A (en)
Inventor
曹靖城
张继东
王培才
仇东平
王猛德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Shilian Technology Co., Ltd.
Original Assignee
Tianyi Shilian Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Shilian Technology Co., Ltd.
Priority to CN202110225889.3A
Publication of CN113849634A
Application granted
Publication of CN113849634B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G06F 16/33 - Querying
    • G06F 16/3331 - Query processing
    • G06F 16/3332 - Query translation
    • G06F 16/3335 - Syntactic pre-processing, e.g. stopword elimination, stemming
    • G06F 16/334 - Query execution
    • G06F 16/3346 - Query execution using probabilistic model
    • G06F 16/34 - Browsing; Visualisation therefor
    • G06F 16/345 - Summarisation for human users
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 - Classification based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/047 - Probabilistic or stochastic networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for improving the interpretability of a deep-model recommendation scheme. The method may include: preprocessing user return visit data; training a summary classifier; extracting features from the return visit data; generating a summary with a summary decoder and calculating a summary generation loss; calculating a summary classification loss based on the summary classifier; calculating a return visit text classification loss; updating the model parameters with the aim of minimizing the sum of the summary generation loss, the summary classification loss, and the return visit text classification loss; and obtaining a service recommendation model from the trained parameters. The invention further provides a method for generating a service recommendation scheme and a user return visit summary from user return visit text, in which the generated summary can be corrected using a sequence copy mechanism. The invention can significantly improve the accuracy and interpretability of user service demand prediction.

Description

Method for improving interpretability of depth model recommendation scheme
Technical Field
The present invention relates to natural language processing, and more particularly, to a method for improving the interpretability of deep-model recommendation schemes.
Background
The task of predicting user service demand can be abstracted as a text classification task in natural language processing, and related algorithms can replace manual work with automatic intention recognition. Text classification means that, for a given piece of unstructured text, a classification algorithm or model determines the category to which the text belongs. Traditional machine learning algorithms extract text features through manual feature engineering, which limits the accuracy and robustness of classification. Deep learning algorithms based on conventional recurrent and convolutional neural networks place high demands on the quality of training data, so a more accurate and effective classification algorithm needs to be developed for recognizing user intentions.
In addition, the interpretability of deep learning models is a problem under continuing discussion and study in industry, including in the field of natural language processing. Existing studies that attempt to make models interpretable typically explain the model output or the link between output and input. Conventional user service recommendation methods, however, often ignore the user's return visit summary for the recommended service. This discards a great deal of fine-grained information (such as the textual explanation of a label), and the system cannot produce a human-readable explanation after recommending a service. The return visit summary is typically text produced by customer service personnel after distilling the user's service requirements. If the user's evaluation content can be accurately predicted and generated, the effectiveness of the classifier can be improved, and using the generated content as an explanation of the classification result improves the interpretability and robustness of the classifier.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In view of the above-described drawbacks of the prior art, an object of the present invention is to improve the accuracy and interpretability of user service demand prediction and to improve the quality of recommended services.
According to a first aspect of the present invention, there is provided a method for training a service recommendation model including a summary classifier, a feature encoder, a summary decoder, and a return visit text classifier, each implemented using an artificial neural network. The method may include: step (1): obtaining user return visit text together with the user return visit summary and service recommendation scheme corresponding to that text, and preprocessing the text and summary to obtain a user return visit dataset; step (2): training the summary classifier using the user return visit summaries in the dataset, with the corresponding service recommendation schemes as labels; step (3): extracting features from the user return visit text in the dataset using the feature encoder to obtain a return visit text hidden state vector; step (4): inputting the hidden state vector into the summary decoder to obtain a generated summary and calculating a summary generation loss; step (5): calculating a summary classification loss based on the trained summary classifier; step (6): calculating a return visit text classification loss based on the return visit text classifier; step (7): training and updating the parameters of the feature encoder, summary decoder, and return visit text classifier with the aim of minimizing the sum of the summary generation loss, the summary classification loss, and the return visit text classification loss; and repeating steps (3)-(7) until the parameters converge, thereby completing the training of the service recommendation model.
Optionally, the user return visit dataset may include a training set, a validation set, and a test set, divided in the ratio 6:2:2.
Optionally, the summary classifier may be trained using a text convolutional neural network (TextCNN) model.
Optionally, step (3) may further include: performing feature extraction on the user return visit text in the user return visit dataset using a bidirectional long short-term memory (Bi-LSTM) feature encoder based on an attention mechanism; and obtaining, from the extracted features, the encoded return visit text hidden state vector of the user return visit text.
Optionally, step (4) may further include: inputting the encoded return visit text hidden state vector of the user return visit text into a summary decoder employing a long short-term memory (LSTM) network to obtain a generated summary; calculating a bilingual evaluation understudy (BLEU) score from the generated summary and a label summary, where the label summary is the user return visit summary corresponding to the user return visit text in the user return visit dataset; and determining the summary generation loss based on the BLEU score.
Optionally, step (5) may further include: inputting the label summary and the generated summary into the summary classifier trained in step (2) to obtain, for each, a probability distribution over service recommendation schemes; taking the probability assigned to the true service recommendation scheme under the label summary as a first probability; taking the probability assigned to the true service recommendation scheme under the generated summary as a second probability; and calculating the absolute value of the difference between the first probability and the second probability as the summary classification loss.
Optionally, step (6) may further include: inputting the encoded return visit text hidden state vector of the user return visit text into a return visit text classifier employing a text convolutional neural network (TextCNN) model to obtain a probability distribution over service recommendation schemes; taking the probability assigned to the true service recommendation scheme as a third probability; and calculating the absolute value of the difference between the second probability and the third probability as the return visit text classification loss.
Optionally, updating of the parameters of the feature encoder, the summary decoder, and the return visit text classifier is stopped when the return visit text classification loss falls below a threshold.
According to a second aspect of the present invention, there is provided a method for generating a user return visit summary and a service recommendation scheme based on user return visit text. The method may include: obtaining a service recommendation model trained by the method of the invention; obtaining user return visit text and preprocessing it; and inputting the preprocessed user return visit text into the service recommendation model to generate the corresponding user return visit summary and service recommendation scheme.
Optionally, the method may further include: correcting the generated user return visit summary; and outputting the corrected user return visit summary.
By adopting the technical scheme provided by the invention, the accuracy and interpretability of user service demand prediction can be significantly improved.
These and other features and advantages will become apparent upon reading the following detailed description and upon reference to the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
Drawings
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this invention and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
Fig. 1 illustrates a schematic diagram of a service recommendation model according to one embodiment of the invention.
Fig. 2 illustrates a flowchart of a method for training a service recommendation model according to one embodiment of the invention.
Fig. 3 illustrates a flowchart of a method for obtaining a user return visit summary and a service recommendation scheme from user return visit text using a trained service recommendation model, according to one embodiment of the invention.
Fig. 4 illustrates a block diagram of an apparatus for implementing a method according to the invention, according to one embodiment of the invention.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in describing the embodiments are briefly introduced below. The drawings in the following description are evidently only some examples or embodiments of the present invention, and those of ordinary skill in the art may apply the present invention to other similar situations according to these drawings without inventive effort. Unless otherwise apparent from the context or otherwise specified, like reference numerals in the figures refer to like structures or operations.
As used in the specification and in the claims, the terms "a," "an," and/or "the" do not denote the singular and may include the plural unless the context clearly dictates otherwise. Generally, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may include other steps or elements.
Flowcharts are used in the present invention to describe the operations performed by methods according to embodiments of the present invention. It should be understood that the operations are not necessarily performed precisely in the order shown. Rather, various steps may be processed in reverse order or simultaneously, and other operations may be added to or removed from these processes.
In natural language processing, text classification and text generation are traditionally two relatively independent subtasks. The invention trains the two subtask models simultaneously, unifying summary generation with service recommendation generation. Moreover, a conventional deep learning network generally predicts the text category directly in an end-to-end manner; it is a black-box model with no interpretability. The invention uses a text generation technique to produce a distilled return visit summary, so that the recommendation scheme generated by the model can be understood manually through the summary, while the recommendation accuracy of the model is also improved.
Fig. 1 illustrates a schematic diagram of a service recommendation model 100 according to one embodiment of the invention. In the present invention, the service recommendation model 100 refers to a model trained using artificial neural networks that can generate a corresponding user return visit summary (a text generation task) and a service recommendation scheme (a text classification task) from user return visit text. User return visit text is the content stated by the user during a conversation with customer service personnel, which can be obtained by speech recognition and converted into text. A user return visit summary is a condensed description of the return visit text, which may be text produced by customer service personnel after distilling the user's service requirements. A service recommendation scheme recommends a corresponding type of service to the user based on the user's requirements (e.g., based on the content of the return visit text and/or the return visit summary). The return visit text, the return visit summary, and the service recommendation scheme are therefore associated with one another. For example, the return visit text may be "the monthly data allowance in my mobile plan is insufficient, which incurs a lot of extra cost", the corresponding summary may be "insufficient data allowance in mobile plan", and the associated recommendation scheme may be "upgrade the mobile plan". In another example, the return visit text may be "the internet at home is slow, and movies and games stutter frequently", the corresponding summary may be "internet speed does not meet demand", and the associated recommendation scheme may be "upgrade the bandwidth". By adopting the service recommendation model 100, the return visit summary and the recommendation scheme can be generated automatically from the return visit text, providing a recommendation to the user while also giving staff directly readable summary text to aid service analysis.
In one embodiment of the invention, the service recommendation model 100 may include a plurality of components, including but not limited to a summary classifier, a feature encoder, a summary decoder, and a return visit text classifier, each implemented using an artificial neural network. The summary classifier generates a service recommendation scheme from a user return visit summary (i.e., a text classification task over the summary). The feature encoder performs feature extraction on the input return visit text to obtain a return visit text hidden state vector. As is known to those skilled in the art, for text generation tasks (e.g., sequence-to-sequence, Seq2Seq), the encoder-decoder architecture is commonly used: the encoder receives an input sequence and produces a hidden state vector encoding it, and the decoder generates an output sequence from that vector. The return visit text hidden state vector in the present invention is thus the hidden state vector produced by the feature encoder from the user return visit text. The summary decoder generates the user return visit summary from this hidden state vector (i.e., a text generation task over the return visit text). The return visit text classifier generates a service recommendation scheme from the return visit text (e.g., from its hidden state vector), i.e., a text classification task over the return visit text.
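Purely for illustration, the four components might be wired together as in the following PyTorch sketch. The patent publishes no code, so every class name, method signature, and the exact tensor passed to the return visit text classifier are assumptions:

```python
import torch.nn as nn

class ServiceRecommendationModel(nn.Module):
    """Sketch of model 100: one encoder feeding both a summary decoder (text
    generation) and a return visit text classifier (text classification)."""
    def __init__(self, encoder, summary_decoder, text_classifier):
        super().__init__()
        self.encoder = encoder                  # Bi-LSTM + attention (block 230 below)
        self.summary_decoder = summary_decoder  # LSTM decoder (block 240 below)
        self.text_classifier = text_classifier  # TextCNN-style module over hidden states

    def forward(self, text_ids):
        states, pooled = self.encoder(text_ids)       # return visit text hidden states
        summary_ids = self.summary_decoder(states)    # generated user return visit summary
        scheme_logits = self.text_classifier(states)  # service recommendation prediction
        return summary_ids, scheme_logits
```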
Fig. 2 illustrates a flowchart of a method 200 for training a service recommendation model according to one embodiment of the invention. In some examples, the method 200 may be performed by the apparatus 400 illustrated in Fig. 4, or by any other suitable device or means for performing the functions or algorithms described below. As described above, the service recommendation model may include a summary classifier, a feature encoder, a summary decoder, and a return visit text classifier, each implemented using an artificial neural network. Training the service recommendation model means training the parameters of these artificial neural networks, so as to obtain a model that can accurately generate a user return visit summary and a service recommendation scheme from user return visit text.
The method 200 may begin at block 210 (i.e., step (1)), in which user return visit text is obtained together with the user return visit summary and service recommendation scheme corresponding to that text, and the text and summary are preprocessed to obtain a user return visit dataset. As described above, the return visit text, return visit summary, and recommendation scheme are associated with one another, and historical or existing triples may be stored in a database. In one embodiment, such data may therefore be obtained from a database for training the service recommendation model. In another embodiment, the data may also be obtained from a computer, storage device, server, or the like.
In one embodiment, the data preprocessing may include building word lists of common function words and mood words, and using regular expressions to replace and remove the function words and mood words in the return visit text and return visit summary, so as to obtain valid user return visit data. The valid data may be divided proportionally into a training set, a validation set, and a test set and stored in a file format to obtain the user return visit dataset, where the ratio of training set to validation set to test set may be 6:2:2. The validation and test sets may be used to check the training behavior and generalization of the model.
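As a minimal sketch of this preprocessing and the 6:2:2 split, assuming a hypothetical function-word/mood-word list and simple (text, summary, scheme) records (none of these names come from the patent):

```python
import random
import re

# Hypothetical stand-in for the built word lists of function words and mood words.
FUNCTION_AND_MOOD_WORDS = ["的", "了", "吧", "呢", "啊", "嗯"]

def preprocess(text: str) -> str:
    """Remove function/mood words with a regular expression and collapse whitespace."""
    pattern = "|".join(map(re.escape, FUNCTION_AND_MOOD_WORDS))
    return re.sub(r"\s+", " ", re.sub(pattern, "", text)).strip()

def split_dataset(records, seed=42):
    """Shuffle (text, summary, scheme) records and split them 6:2:2."""
    rng = random.Random(seed)
    records = records[:]          # avoid mutating the caller's list
    rng.shuffle(records)
    n = len(records)
    return records[:int(0.6 * n)], records[int(0.6 * n):int(0.8 * n)], records[int(0.8 * n):]
```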
At block 220 (i.e., step (2)), the method 200 may include training the summary classifier using the user return visit summaries in the dataset, with the corresponding service recommendation schemes as labels. Once trained, the summary classifier can produce a probability distribution over service recommendation schemes from a summary (i.e., perform the text classification task). In one embodiment, the operations of block 220 may further include: replacing the user return visit summaries with word vectors (e.g., vectors pre-trained with the global vectors for word representation (GloVe) model or the word2vec model) to obtain a basic embedded representation of each summary, and then training a text convolutional neural network (TextCNN) classifier as the summary classifier with the corresponding service recommendation schemes as labels. Because classification is performed on short summary text, the TextCNN model performs well, reaching a classification accuracy above 95% in practice, so the summary classifier does not participate in subsequent model parameter updates.
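A minimal PyTorch sketch of such a TextCNN summary classifier; filter sizes, filter counts, and embedding dimension are illustrative assumptions rather than values given in the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """TextCNN over a token-id sequence; the embedding can be initialized
    from pre-trained GloVe or word2vec vectors as described above."""
    def __init__(self, vocab_size, num_classes, embed_dim=300,
                 kernel_sizes=(2, 3, 4), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))         # logits over recommendation schemes
```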
At block 230 (i.e., step (3)), the method 200 may include extracting features from the user return visit text in the dataset using the feature encoder to obtain the return visit text hidden state vector. In one embodiment, the operations of block 230 may further include: replacing the return visit text with word vectors (e.g., vectors pre-trained with the GloVe or word2vec model) to obtain a basic embedded representation of the text; and then encoding it with a bidirectional long short-term memory (Bi-LSTM) feature encoder based on an attention mechanism to obtain the encoded return visit text hidden state vector.
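One plausible shape for such an encoder, as a hedged PyTorch sketch (the hidden sizes and the simple additive attention form are assumptions):

```python
import torch
import torch.nn as nn

class AttentiveBiLSTMEncoder(nn.Module):
    """Bi-LSTM encoder with a simple attention pooling layer."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):                            # (batch, seq_len)
        states, _ = self.bilstm(self.embedding(token_ids))   # (batch, seq_len, 2*hidden_dim)
        weights = torch.softmax(self.attn(states).squeeze(-1), dim=1)
        pooled = torch.bmm(weights.unsqueeze(1), states).squeeze(1)
        return states, pooled   # per-token hidden states and a pooled hidden state vector
```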
At block 240 (i.e., step (4)), the method 200 may include inputting the return visit text hidden state vector into the summary decoder to obtain a generated summary and calculating the summary generation loss. In the present invention, the summary generation loss evaluates the quality of generation, i.e., the degree of difference between the summary produced by the decoder and the label summary. In one embodiment, the operations of block 240 may further include: inputting the encoded hidden state vector into a summary decoder employing a long short-term memory (LSTM) network to obtain a generated summary; calculating a bilingual evaluation understudy (BLEU) score from the generated summary and the label summary, where the label summary is the user return visit summary corresponding to the return visit text in the dataset; and determining the summary generation loss from the BLEU score. The BLEU score evaluates generated text in natural language processing tasks as an index of the difference between a candidate sentence and a reference sentence, ranging from 0.0 to 1.0: a perfect match scores 1.0, and a complete mismatch scores 0.0. In one example, the BLEU score may be calculated bigram-wise (Bi-gram) or in any other known manner. In another example, the summary generation loss may be inversely related to the BLEU score: the higher the BLEU score, the smaller the loss, and vice versa.
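A sketch of one reading of this loss, using NLTK's bigram BLEU. Note that BLEU over discrete tokens carries no gradient, so in practice this term would act as a score or reward rather than being back-propagated directly; the patent does not specify the mechanism:

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

def summary_generation_loss(generated_tokens, label_tokens):
    """Loss inversely related to the Bi-gram BLEU score: a perfect match
    (BLEU = 1.0) gives zero loss; a complete mismatch gives loss 1.0."""
    bleu = sentence_bleu([label_tokens], generated_tokens,
                         weights=(0.5, 0.5),  # bigram BLEU
                         smoothing_function=SmoothingFunction().method1)
    return 1.0 - bleu
```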
At block 250 (i.e., step (5)), the method 200 may include calculating the summary classification loss based on the trained summary classifier. In the present invention, the summary classification loss evaluates how well the generated summary serves the recommendation task when fed to the summary classifier. In one embodiment, the operations of block 250 may further include: inputting the label summary and the generated summary into the summary classifier trained at block 220 to obtain, for each, a probability distribution over service recommendation schemes; taking the probability assigned to the true recommendation scheme under the label summary as a first probability; taking the probability assigned to the true recommendation scheme under the generated summary as a second probability; and calculating the absolute value of the difference between the first and second probabilities as the summary classification loss. Here, the true recommendation scheme is the service recommendation scheme corresponding to the label summary obtained at block 210.
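A compact sketch of the loss just described, assuming the summary classifier follows the TextCNN interface above and batches are token-id tensors (the batching details are assumptions):

```python
import torch

def summary_classification_loss(summary_classifier, label_ids, generated_ids, true_scheme):
    """|first probability - second probability| for the true recommendation scheme.
    The summary classifier is frozen: its parameters are simply excluded
    from the optimizer in step (7)."""
    p_label = torch.softmax(summary_classifier(label_ids), dim=-1)[:, true_scheme]
    p_generated = torch.softmax(summary_classifier(generated_ids), dim=-1)[:, true_scheme]
    return (p_label - p_generated).abs().mean()
```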
At block 260 (i.e., step (6)), the method 200 may include calculating the return visit text classification loss based on the return visit text classifier. In the present invention, this loss evaluates the effect of obtaining a recommendation scheme directly from the return visit text. In one embodiment, the operations of block 260 may further include: inputting the encoded return visit text hidden state vector into a return visit text classifier employing a TextCNN model to obtain a probability distribution over service recommendation schemes; taking the probability assigned to the true recommendation scheme as a third probability; and calculating the absolute value of the difference between the second and third probabilities as the return visit text classification loss. Here, the true recommendation scheme is the service recommendation scheme corresponding to the return visit text obtained at block 210.
At block 270 (i.e., step (7)), the method 200 may include training and updating the parameters of the feature encoder, summary decoder, and return visit text classifier with the aim of minimizing the sum of the summary generation loss, the summary classification loss, and the return visit text classification loss. In one embodiment, the sum of the summary classification loss and the return visit text classification loss may be called the EF factor, and a model parameter optimizer may update the feature encoder, summary decoder, and return visit text classifier to minimize the sum of the EF factor and the summary generation loss. In one embodiment, the optimizer may be an adaptive moment estimation (Adam) optimizer; in another embodiment, it may be any other suitable optimizer, such as an adaptive gradient (AdaGrad) optimizer or an RMSProp optimizer.
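As a schematic sketch of this update step, reusing the module names from the earlier sketches (the learning rate and the treatment of the non-differentiable BLEU term are assumptions):

```python
import itertools
import torch

# Only the encoder, summary decoder, and return visit text classifier are optimized;
# the trained summary classifier stays frozen.
optimizer = torch.optim.Adam(
    itertools.chain(encoder.parameters(),
                    summary_decoder.parameters(),
                    text_classifier.parameters()),
    lr=1e-3)

def joint_update(loss_generation, loss_summary_cls, loss_text_cls):
    ef_factor = loss_summary_cls + loss_text_cls           # the "EF factor"
    total = torch.as_tensor(loss_generation) + ef_factor   # BLEU term enters as a constant here
    optimizer.zero_grad()
    total.backward()    # gradients flow through the two classification terms
    optimizer.step()
    return float(total)
```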
At block 280, the method 200 may include determining whether the model parameters have converged. If they have (e.g., the sum of the three losses cannot be reduced further, or only by a very small amount), the method 200 ends and training of the service recommendation model is complete; otherwise, the operations of blocks 230-270 are repeated. In one embodiment, when the return visit text classification loss falls below a threshold, updating of the parameters of the feature encoder, summary decoder, and return visit text classifier is stopped, to prevent problems such as over-shortened generated summaries or overfitting.
Fig. 3 illustrates a flowchart of a method 300 for obtaining a user return visit summary and a service recommendation scheme from user return visit text using a trained service recommendation model, according to one embodiment of the invention.
In some examples, the method 300 may be performed by the apparatus 400 illustrated in Fig. 4, or by any other suitable device or means for performing the functions or algorithms described below.
At block 310, the method 300 may include obtaining a service recommendation model trained by the method 200.
At block 320, the method 300 may include obtaining user return visit text and preprocessing it. In one embodiment, the return visit text may be obtained during the conversation with the user by recognizing the user's speech and converting it into text. In one embodiment, the data preprocessing may include building word lists of common function words and mood words, and using regular expressions to replace and remove those words in the return visit text so as to obtain valid return visit data.
At block 330, the method 300 may include inputting the preprocessed user return visit text into the service recommendation model to generate the corresponding user return visit summary and service recommendation scheme.
Optionally, the method 300 may further include correcting the generated user return visit summary and outputting the corrected summary. Conventional text generation models typically discard low-frequency, domain-specific vocabulary in order to obtain higher text generation scores: words of the input sequence that are not in the vocabulary are usually replaced with an <UNK> tag, and <UNK> tags may likewise appear in the output sequence. In one embodiment, a summary corrector based on the Soft-Attention mechanism may be applied to the generated summary; it uses a sequence copy mechanism to replace each <UNK> tag in the generated summary, solving the problem of low-frequency vocabulary being ignored by the summary generator.
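The patent does not detail the corrector, but the copy idea can be sketched as follows: at each decoding step where the model emitted <UNK>, copy the source token with the highest attention weight (all function and variable names are illustrative):

```python
def correct_summary(generated_ids, attention, source_tokens, unk_id, id2word):
    """Replace each <UNK> in the generated summary with the input token that
    received the highest attention weight at that decoding step."""
    corrected = []
    for step, token_id in enumerate(generated_ids):
        if token_id == unk_id:
            best = max(range(len(source_tokens)), key=lambda j: attention[step][j])
            corrected.append(source_tokens[best])  # copy the low-frequency word verbatim
        else:
            corrected.append(id2word[token_id])
    return corrected
```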
Comparing the method with conventional service recommendation model algorithms, introducing the user return visit summary through the EF factor and the sequence copy mechanism strengthens the feature representation capability of the model, produces distilled text that service personnel can read, and improves the interpretability of the model, which is of real significance for service recommendation by service personnel; clear gains also appear on multiple indices. Model evaluation uses accuracy, recall, AUC, and log loss: accuracy is the number of correct recommendations predicted by the model divided by the total number of recommendations; recall is the number of correctly predicted and successfully marketed recommendations divided by the number of recommendation marketing attempts; AUC measures the model's ability to rank positive samples above negative ones (the larger the AUC, the higher the success rate of samples predicted positive); and log loss evaluates the fitting ability of the model (the smaller the log loss, the better the fit). A rough computation sketch follows Table 1.
Specific indices are shown in Table 1 below:

Method              Recall   Accuracy   AUC      Log loss
Present invention   0.2364   0.3563     0.2523   0.3824
MLP                 0.1934   0.3021     0.1967   0.3921
RNNs                0.2039   0.3114     0.1992   0.3974
CNNs                0.2014   0.3301     0.2143   0.3945
FM                  0.1984   0.2945     0.1967   0.4245
GBDT+LR             0.1934   0.3209     0.1932   0.4394

Table 1: Comparison of the method of the present invention with conventional service recommendation model algorithms
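For reference, the four indices might be computed roughly as follows, assuming binary accepted/rejected outcomes per recommendation (the binary framing and the scikit-learn usage are assumptions, not part of the patent):

```python
from sklearn.metrics import log_loss, roc_auc_score

def evaluate(y_true, y_score, y_pred):
    """y_true: 1 if the recommendation was correct/accepted; y_score: predicted
    probability; y_pred: hard prediction. Mirrors the index definitions above."""
    accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
    recall = sum(p == t == 1 for p, t in zip(y_pred, y_true)) / max(1, sum(y_true))
    return {"accuracy": accuracy,
            "recall": recall,
            "auc": roc_auc_score(y_true, y_score),
            "log_loss": log_loss(y_true, y_score)}
```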
Fig. 4 illustrates a block diagram of an example hardware implementation of an apparatus 400 for carrying out methods according to one embodiment of the invention. The apparatus 400 may be implemented with a processing system 414 that includes one or more processors 404. Examples of the processor 404 include microprocessors, microcontrollers, digital signal processors (DSPs), field-programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functions described throughout this disclosure. In various examples, the apparatus 400 may be configured to perform any one or more of the functions described herein. That is, the processor 404 as utilized in the apparatus 400 may be used to implement the method 200 described above with reference to Fig. 2 and/or the method 300 described with reference to Fig. 3.
In this example, the processing system 414 may be implemented with a bus architecture, represented generally by the bus 402. The bus 402 may include any number of interconnecting buses and bridges depending on the specific application of the processing system 414 and the overall design constraints. The bus 402 communicatively couples together various circuits including one or more processors (represented generally by the processor 404), a memory 405, and computer-readable media (represented generally by the computer-readable medium 406). The bus 402 may also link various other circuits such as timing sources, peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore will not be described further. A bus interface 408 provides an interface between the bus 402 and a transceiver 410. The transceiver 410 provides a communication interface or means for communicating with various other apparatus over a transmission medium. Depending on the nature of the apparatus, a user interface 412 (e.g., keypad, display, speaker, microphone, joystick) may also be provided. Such a user interface 412 is optional and may be omitted in some examples.
In some aspects, the processor 404 may be configured to: obtain user return visit text together with the corresponding user return visit summary and service recommendation scheme, and preprocess the text and summary to obtain a user return visit dataset; train a summary classifier using the summaries in the dataset with the corresponding recommendation schemes as labels; extract features from the return visit text using a feature encoder to obtain a return visit text hidden state vector; input the hidden state vector into a summary decoder to obtain a generated summary and calculate a summary generation loss; calculate a summary classification loss based on the trained summary classifier; calculate a return visit text classification loss based on the return visit text classifier; and train and update the parameters of the feature encoder, summary decoder, and return visit text classifier with the aim of minimizing the sum of the three losses, until the parameters converge, thereby completing the training of the service recommendation model.
In other aspects, the processor 404 may be configured to: obtain a service recommendation model trained by the method of the invention; obtain user return visit text and preprocess it; and input the preprocessed text into the service recommendation model to generate the corresponding user return visit summary and service recommendation scheme.
The processor 404 is responsible for managing the bus 402 and general-purpose processing, including the execution of software stored on the computer-readable medium 406. The software, when executed by the processor 404, causes the processing system 414 to perform the various functions described for any particular apparatus. Computer-readable medium 406 and memory 405 may also be used for storing data that is manipulated by processor 404 when executing software.
One or more processors 404 in the processing system may execute software. Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. The software may reside on the computer-readable medium 406, which may be a non-transitory computer-readable medium. By way of example, non-transitory computer-readable media include magnetic storage devices (e.g., hard disk, floppy disk, magnetic strip), optical disks (e.g., compact disc (CD) or digital versatile disc (DVD)), smart cards, flash memory devices (e.g., card, stick, or key drive), random access memory (RAM), read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), registers, removable disks, and any other suitable medium for storing software and/or instructions that may be accessed and read by a computer. The computer-readable medium 406 may reside in the processing system 414, external to the processing system 414, or distributed across multiple entities including the processing system 414. The computer-readable medium 406 may be embodied in a computer program product, which may, by way of example, include a computer-readable medium in packaging material. Those skilled in the art will recognize how best to implement the described functionality depending on the particular application and the overall design constraints imposed on the overall system.
In one or more examples, the computer-readable storage medium 406 may include software configured for various functions, including, for example, functions for training a service recommendation model and/or functions for generating user return visit summaries and service recommendation schemes from user return visit text using a trained service recommendation model. The software may include instructions that configure the processing system 414 to perform one or more of the functions described with reference to Fig. 2 and/or Fig. 3.
In the description of the present invention, it should be understood that the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
It will be appreciated by one of ordinary skill in the art that various embodiments of the present invention may be provided as a method, apparatus, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the invention may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-executable program code stored therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, systems and computer program products according to embodiments of the invention. It will be understood that each flowchart and/or block of the flowchart illustrations and/or block diagrams, and combinations of flowcharts and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although aspects of the present invention have been described so far with reference to the accompanying drawings, the above-described methods, systems and apparatuses are merely examples, and the scope of the present invention is not limited to these aspects but is limited only by the appended claims and equivalents thereof. Various components may be omitted or replaced with equivalent components. In addition, the steps may also be implemented in a different order than described in the present invention. Furthermore, the various components may be combined in various ways. It is also important that as technology advances, many of the described components can be replaced by equivalent components that appear later. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for training a service recommendation model comprising a summary classifier, a feature encoder, a summary decoder, and a return visit text classifier, each implemented using an artificial neural network, the method comprising:
step (1): obtaining user return visit text together with a user return visit summary and a service recommendation scheme corresponding to the user return visit text, and preprocessing the user return visit text and the user return visit summary to obtain a user return visit dataset;
step (2): training the summary classifier using the user return visit summaries in the user return visit dataset, with the service recommendation schemes corresponding to the user return visit summaries as labels;
step (3): extracting features from the user return visit text in the user return visit dataset using the feature encoder to obtain a return visit text hidden state vector;
step (4): inputting the return visit text hidden state vector into the summary decoder to obtain a generated summary and calculating a summary generation loss;
step (5): calculating a summary classification loss based on the trained summary classifier;
step (6): calculating a return visit text classification loss based on the return visit text classifier;
step (7): training and updating parameters of the feature encoder, the summary decoder, and the return visit text classifier with the aim of minimizing the sum of the summary generation loss, the summary classification loss, and the return visit text classification loss; and
repeating steps (3)-(7) until the parameters converge, thereby completing the training of the service recommendation model.
2. The method of claim 1, wherein the user return visit dataset comprises a training set, a validation set, and a test set in the ratio 6:2:2.
3. The method of claim 1, wherein the summary classifier is trained using a text convolutional neural network model.
4. The method of claim 1, wherein step (3) further comprises:
extracting features from the user return visit text in the user return visit dataset using a bidirectional long short-term memory (Bi-LSTM) feature encoder based on an attention mechanism; and
obtaining the encoded return visit text hidden state vector of the user return visit text from the extracted features.
5. The method of claim 4, wherein step (4) further comprises:
inputting the encoded return visit text hidden state vector of the user return visit text into a summary decoder employing a long short-term memory (LSTM) network to obtain a generated summary;
calculating a bilingual evaluation understudy (BLEU) score based on the generated summary and a label summary, wherein the label summary is the user return visit summary corresponding to the user return visit text in the user return visit dataset; and
determining the summary generation loss based on the BLEU score.
6. The method of claim 5, wherein step (5) further comprises:
inputting the label summary and the generated summary into the summary classifier trained in step (2) to obtain probability distributions over service recommendation schemes for the label summary and the generated summary respectively;
taking the probability corresponding to the true service recommendation scheme from the probability distribution associated with the label summary as a first probability;
taking the probability corresponding to the true service recommendation scheme from the probability distribution associated with the generated summary as a second probability; and
calculating the absolute value of the difference between the first probability and the second probability as the summary classification loss.
7. The method of claim 6, wherein step (6) further comprises:
inputting the encoded return visit text hidden state vector of the user return visit text into a return visit text classifier employing a text convolutional neural network model to obtain a probability distribution over service recommendation schemes;
taking the probability corresponding to the true service recommendation scheme from the probability distribution as a third probability; and
calculating the absolute value of the difference between the second probability and the third probability as the return visit text classification loss.
8. The method of claim 1, wherein updating of the parameters is stopped when the return visit text classification loss falls below a threshold.
9. A method for generating a user return visit summary and a service recommendation scheme based on user return visit text, the method comprising:
obtaining a service recommendation model trained by the method of any one of claims 1-8;
obtaining user return visit text and preprocessing the user return visit text; and
inputting the preprocessed user return visit text into the service recommendation model to generate a corresponding user return visit summary and service recommendation scheme.
10. The method of claim 9, further comprising:
correcting the generated user return visit summary; and
outputting the corrected user return visit summary.
CN202110225889.3A 2021-03-01 2021-03-01 Method for improving interpretability of depth model recommendation scheme Active CN113849634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110225889.3A CN113849634B (en) 2021-03-01 2021-03-01 Method for improving interpretability of depth model recommendation scheme

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110225889.3A CN113849634B (en) 2021-03-01 2021-03-01 Method for improving interpretability of depth model recommendation scheme

Publications (2)

Publication Number Publication Date
CN113849634A CN113849634A (en) 2021-12-28
CN113849634B true CN113849634B (en) 2024-04-16

Family

ID=78972833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110225889.3A Active CN113849634B (en) 2021-03-01 2021-03-01 Method for improving interpretability of depth model recommendation scheme

Country Status (1)

Country Link
CN (1) CN113849634B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114970552B (en) * 2022-07-27 2022-10-11 成都乐超人科技有限公司 User return visit information analysis method, device, equipment and medium based on micro-service


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11748613B2 (en) * 2019-05-10 2023-09-05 Baidu Usa Llc Systems and methods for large scale semantic indexing with deep level-wise extreme multi-label learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947931A (en) * 2019-03-20 2019-06-28 华南理工大学 Text automatic abstracting method, system, equipment and medium based on unsupervised learning
CN110929030A (en) * 2019-11-07 2020-03-27 电子科技大学 Text abstract and emotion classification combined training method
CN111639176A (en) * 2020-05-29 2020-09-08 厦门大学 Real-time event summarization method based on consistency monitoring

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Automatic Classification of Customer Service Work Orders Based on a Pre-trained BERT Model; Ren Ying; Yunnan Electric Power Technology; 2020-02-15 (No. 01); full text *

Also Published As

Publication number Publication date
CN113849634A (en) 2021-12-28

Similar Documents

Publication Publication Date Title
CN108710704B (en) Method and device for determining conversation state, electronic equipment and storage medium
CN115310425B (en) Policy text analysis method based on policy text classification and key information identification
CN111783993A (en) Intelligent labeling method and device, intelligent platform and storage medium
CN115357719B (en) Power audit text classification method and device based on improved BERT model
CN115599901B (en) Machine question-answering method, device, equipment and storage medium based on semantic prompt
CN113791757A (en) Software requirement and code mapping method and system
CN111950295A (en) Method and system for training natural language processing model
CN114091466A (en) Multi-modal emotion analysis method and system based on Transformer and multi-task learning
CN112349294A (en) Voice processing method and device, computer readable medium and electronic equipment
CN115359321A (en) Model training method and device, electronic equipment and storage medium
CN113849634B (en) Method for improving interpretability of depth model recommendation scheme
CN115293794A (en) Software cost evaluation method and system based on intelligent scale recognition
CN115099310A (en) Method and device for training model and classifying enterprises
CN111368066B (en) Method, apparatus and computer readable storage medium for obtaining dialogue abstract
CN114817467A (en) Intention recognition response method, device, equipment and storage medium
CN113486174A (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN113705207A (en) Grammar error recognition method and device
CN116842263A (en) Training processing method and device for intelligent question-answering financial advisor model
CN115936003A (en) Software function point duplicate checking method, device, equipment and medium based on neural network
CN113297385B (en) Multi-label text classification system and method based on improved GraphRNN
CN115617959A (en) Question answering method and device
CN112860843A (en) News long text sentiment analysis method and device
CN113570455A (en) Stock recommendation method and device, computer equipment and storage medium
CN113837910B (en) Test question recommending method and device, electronic equipment and storage medium
CN116414965B (en) Initial dialogue content generation method, device, medium and computing equipment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
TA01: Transfer of patent application right
  Effective date of registration: 2022-02-08
  Address after: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200072
  Applicant after: Tianyi Digital Life Technology Co.,Ltd.
  Address before: 201702 3rd floor, 158 Shuanglian Road, Qingpu District, Shanghai
  Applicant before: Tianyi Smart Family Technology Co.,Ltd.
TA01: Transfer of patent application right
  Effective date of registration: 2024-03-15
  Address after: Unit 1, Building 1, China Telecom Zhejiang Innovation Park, No. 8 Xiqin Street, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 311100
  Applicant after: Tianyi Shilian Technology Co.,Ltd., China
  Address before: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200072
  Applicant before: Tianyi Digital Life Technology Co.,Ltd., China
GR01: Patent grant