CN111274791B - Modeling method of user loss early warning model in online home decoration scene - Google Patents

Modeling method of user loss early warning model in online home decoration scene

Info

Publication number
CN111274791B
Authority
CN
China
Prior art keywords
chat
word
early warning
model
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010031987.9A
Other languages
Chinese (zh)
Other versions
CN111274791A (en)
Inventor
陈旋
王冲
张平
付虹源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Aijia Household Products Co Ltd
Original Assignee
Jiangsu Aijia Household Products Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Aijia Household Products Co Ltd filed Critical Jiangsu Aijia Household Products Co Ltd
Priority to CN202010031987.9A priority Critical patent/CN111274791B/en
Publication of CN111274791A publication Critical patent/CN111274791A/en
Application granted granted Critical
Publication of CN111274791B publication Critical patent/CN111274791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a modeling method for a user loss early warning model in an online home decoration scene, belonging to the technical field of user loss early warning model modeling. It solves the problem that, because user behaviors such as login, browsing and consumption occur infrequently in the online home decoration industry, not enough data can be collected to support modeling and analysis, so a user loss early warning model cannot be established in the conventional way. A shallow neural network model is constructed by supervised learning on a training data set consisting of the word distribution of labeled user and customer-service chat content in IM software, together with whether the user corresponding to that chat content was lost. At the output layer, a sigmoid function maps the output value onto the (0, 1) interval; expressed as a percentage, this value is the user loss probability. By integrating the model into an engineering application, an accurate and efficient user loss early warning system can be developed.

Description

Modeling method of user loss early warning model in online home decoration scene
Technical Field
The invention belongs to the technical field of modeling of user loss early warning models, and particularly relates to a modeling method of a user loss early warning model in an online home decoration scene based on online chat user word distribution.
Background
The user loss early warning system is very important for Internet enterprises that turn traffic into profit: users who are about to be lost can be quickly screened out by the system and then won back through operational measures, thereby reducing the loss of traffic.
The main idea in constructing a user loss early warning system is to analyze user behavior data, build a model, and use the model to screen out the users who are about to be lost. The data dimensions used in user behavior analysis typically include interactions such as login, browsing and consumption. The online home decoration industry, however, is characterized by high order value, long service cycles, many personalized requirements, a high degree of specialization, and serious information asymmetry between the owner and the constructor. Each order therefore requires long-term communication between customer service staff and the user, mainly through IM (Instant Messaging) software developed in-house by the enterprise, which is also where most interaction between the enterprise and its users takes place. Behaviors such as login, browsing and consumption occur infrequently, so not enough data can be collected to support modeling and analysis. Building a model that analyzes the chat records exchanged between customer service staff and users thus becomes the best choice for realizing a user loss early warning system.
By carrying out hot-word analysis on the chat text generated by lost users and by users with successful transactions, it was found that the word distributions of the two groups differ greatly: for example, words such as "refund" and "angry" appear far more frequently in the chats of lost users than in those of successful-transaction users, while words such as "thanks" and "satisfied" appear far more frequently in the chats of successful-transaction users than in those of lost users. Based on this finding, several models relating the word distribution of the chat text to the user loss probability were built, including logistic regression, a maximum-entropy hidden Markov model, a deep neural network and others. When the models were tested, the accuracy of the shallow neural network on the validation set proved better than that of logistic regression, while its model complexity is far lower than that of the maximum-entropy hidden Markov model and the deep neural network model, giving it a clear speed advantage in training to convergence. The shallow neural network model is therefore best suited to engineering applications that require frequent model updates to keep the model current.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a modeling method for a user loss early warning model that addresses the following situation: in the online home decoration industry, user behaviors such as login, browsing and consumption occur infrequently, not enough data can be collected to support modeling and analysis, and a user loss early warning model therefore cannot be established in the conventional way. By integrating the model into an engineering application, an accurate and efficient user loss early warning system can be developed.
The invention adopts the following technical scheme for solving the technical problems:
A modeling method of a user loss early warning model in an online home decoration scene specifically comprises the following steps:
step 1, determining a stop word list, wherein the stop words include modal particles, conjunctions, prepositions, place names, personal pronouns, numbers and punctuation marks;
step 2, marking data: pulling chat records of all successful users and lost users, marking the chat records of all successful users with a label 0 as a forward data sample set A, and marking the chat records of all lost users with a label 1 as a reverse data sample set B;
step 3, calculating and storing idf values of words appearing in the keyword library W in all chat texts sent by all users, wherein the specific calculation formula is as follows
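The idf calculation formula itself appears only as an image in the published patent text; assuming the standard inverse document frequency definition that the surrounding description implies, the idf value of a word w can be written as

$$\mathrm{idf}_w = \log\frac{|D|}{\left|\{\, d \in D : w \in d \,\}\right|}$$

where D is the set of all users' chat documents and the denominator counts the chat documents containing w (in practice the denominator is often smoothed as 1 + |{d : w ∈ d}| to avoid division by zero).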
Step 4, processing all chat text content sent by each user in the forward data sample set A and the reverse data sample set B: first segmenting the text into words, then removing stop words to obtain the user's chat keyword set, and further calculating the tf-idf value of each word in the set;
wherein tf is the term frequency, idf is the idf value calculated in step 3, and tf-idf is the product of tf and idf; creating a text vector for the chat, wherein the length of the vector is N and the m-th element of the vector is the tf-idf value, in the chat, of the word numbered m in the chat keyword library W; for a word that does not appear in the chat, tf is 0 and the value of that element is 0; recording the label corresponding to the vector, wherein if the chat belongs to set A the label is 0, otherwise the label is 1; after all the chats are processed, a data sample set T is obtained in which each data item is a text vector and a label;
step 5, constructing a shallow neural network structure.
As a further preferable scheme of the modeling method of the user loss early warning model in the online home decoration scene, step 1 specifically comprises: selecting, from the verbs, nouns, adjectives and adverbs (excluding stop words) in all chat texts sent by all users, the n words with the highest frequency of occurrence as the chat keyword library W, and creating for each word in W a number in the range [1, n]; the number of each word is unique and is not shared with any other word.
As a further preferable scheme of the modeling method of the user loss early warning model in the online home decoration scene, in step 5 the shallow neural network structure comprises an input layer that accepts an N-dimensional vector, a hidden layer formed by a plurality of matrices, and an output layer that uses a sigmoid function to produce the output result.
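The sigmoid function used at the output layer is the standard logistic function, which maps any real-valued network output onto the (0, 1) interval:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Expressed as a percentage, this output is interpreted as the user loss probability.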
As a further preferable scheme of the modeling method of the user loss early warning model in the on-line home decoration scene, the training process of the shallow neural network specifically comprises the following steps:
step 5.1, at the start of training, establishing the hidden layer matrix with random values;
step 5.2, feeding each word vector in the training data set into the input layer and multiplying it by the hidden layer matrix, then mapping the final output value onto the (0, 1) interval with a sigmoid function and using it as the output;
step 5.3, calculating the error between the model output and the actual label by cross entropy (the standard binary form is written out after this list), and optimizing the hidden layer matrix by gradient descent until the error is minimized;
step 5.4, performing iterative training on the data sample set T using K-fold cross-validation until the accuracy and error of the model on the validation set stabilize.
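For the binary labels used here (0 for successful-transaction users, 1 for lost users), the cross entropy in step 5.3 takes its standard binary form; the patent does not write the formula out, so the following is given only as the usual definition, with M training samples, actual labels y_i and sigmoid outputs ŷ_i:

$$E = -\frac{1}{M}\sum_{i=1}^{M}\left[y_i\log\hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right)\right]$$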
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
1. it solves the problem that, in the online home decoration industry, user behaviors such as login, browsing and consumption occur infrequently, not enough data is collected to support modeling and analysis, and a user loss early warning model therefore cannot be established in the conventional way;
2. a shallow neural network structure is adopted, so training converges quickly and the time cost of completing training is low; as new chat-content training samples are continuously generated, a new model can be trained quickly, which keeps the model highly up to date;
3. the constructed model predicts accurately and performs well, can be implemented quickly using TensorFlow, and is easy to use at scale once engineered.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a schematic diagram of the structure of the shallow neural network of the present invention.
Detailed Description
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings:
the following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, based on the embodiments of the invention, which are apparent to those of ordinary skill in the art without undue burden are within the scope of the invention
As shown in Fig. 1, the modeling method of the user loss early warning model in the online home decoration scene specifically comprises the following steps:
step 1, determining a stop word list, wherein the stop words include modal particles, conjunctions, prepositions, place names, personal pronouns, numbers and punctuation marks;
The non-stop words are filled into a word vector of fixed length (the filling is similar to one-hot encoding: a group of words with strong discriminative value for classification needs to be selected; in this example the words with the highest frequency are adopted, which reduces the sparsity of the word vectors, and because frequently occurring but low-meaning words have already been filtered out by the stop word list, the remaining words are meaningful). The word vector length N is the dimension accepted by the model's input layer; the larger N is set, the more words participate in the model calculation and the more information the trained model contains, which generally gives better results, but the computational cost of modeling also increases, so the size of N must be limited. In practical use it was found that a model with satisfactory results can be built with N = 1000. The n words with the highest frequency of occurrence among the verbs, nouns, adjectives and adverbs (excluding stop words) in all chat texts sent by all users are selected as the chat keyword library W, and each word in W is given a number in the range [1, n]; the number of each word is unique and is not shared with any other word.
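As a concrete illustration of how the chat keyword library W described above could be built, the following Python sketch selects the n most frequent content words. It is only one possible implementation under stated assumptions: the jieba part-of-speech tokenizer, the POS tag prefixes and all function and variable names are choices made here, not specified in the patent.

```python
from collections import Counter

import jieba.posseg as pseg  # assumed tokenizer; the patent does not name one

# part-of-speech prefixes kept: verbs (v), nouns (n), adjectives (a), adverbs (d)
CONTENT_POS_PREFIXES = ("v", "n", "a", "d")

def build_keyword_library(chat_texts, stop_words, n=1000):
    """Select the n most frequent content words (excluding stop words) from all
    chat texts and assign each a unique number in the range [1, n]."""
    counter = Counter()
    for text in chat_texts:
        for word, flag in pseg.cut(text):
            if word in stop_words:
                continue
            if flag.startswith(CONTENT_POS_PREFIXES):
                counter[word] += 1
    top_words = [w for w, _ in counter.most_common(n)]
    # chat keyword library W: word -> unique number in [1, n]
    return {word: i + 1 for i, word in enumerate(top_words)}
```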
Step 2, marking data: pulling chat records of all successful users and lost users, marking the chat records of all successful users with a label 0 as a forward data sample set A, and marking the chat records of all lost users with a label 1 as a reverse data sample set B;
step 3, calculating and saving the idf (Inverse Document Frequency) value, over all chat texts sent by all users, of each word appearing in the keyword library W; the specific calculation formula is as follows:
step 4, processing all chat text content sent by each user in the forward data sample set A and the reverse data sample set B: first segmenting the text into words, then removing stop words to obtain the user's chat keyword set, and further calculating the tf-idf value of each word in the set;
where tf is the term frequency (Term Frequency) and idf is the idf value calculated in step 3; the tf-idf value is the product of tf and idf. A text vector is created for the chat: the vector length is N, and the m-th element of the vector is the tf-idf value, in this chat, of the word numbered m in the chat keyword library W. The tf-idf value is commonly used to evaluate how important a word is to a document in a document collection or corpus: the importance of a word increases in proportion to the number of times it appears in the document, but decreases with the frequency with which it appears across the corpus. For a word that does not appear in the chat, tf is 0 and the value of that element is 0. The label corresponding to the vector is recorded: if the chat belongs to set A the label is 0, otherwise it is 1. After all the chats are processed, a data sample set T is obtained in which each data item is a text vector and a label;
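A minimal sketch of the vectorization in step 4, under the same assumptions as the previous sketch: the keyword library `W` (word to number) and the `idf` dictionary come from the earlier steps, `jieba` is the assumed tokenizer, and the function and variable names are illustrative rather than taken from the patent.

```python
from collections import Counter

import jieba  # assumed tokenizer, as above

def chat_to_sample(chat_text, W, idf, stop_words, label):
    """Turn one user's concatenated chat text into (tf-idf vector of length N, label).
    label is 0 if the chat belongs to set A (successful transaction), 1 for set B (lost)."""
    tokens = [t for t in jieba.lcut(chat_text) if t not in stop_words]
    counts = Counter(tokens)
    total = sum(counts.values()) or 1                 # guard against an empty chat
    vector = [0.0] * len(W)                           # N = len(W); element m-1 holds the word numbered m
    for word, m in W.items():
        if word in counts:
            tf = counts[word] / total                 # term frequency of the word in this chat
            vector[m - 1] = tf * idf.get(word, 0.0)   # tf-idf; stays 0.0 for absent words
    return vector, label
```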
step 5, constructing the shallow neural network structure, as shown in Fig. 2.
As a further preferable scheme of the modeling method of the user loss early warning model in the online home decoration scene, the shallow neural network structure comprises an input layer that accepts an N-dimensional vector, a hidden layer formed by a plurality of matrices, and an output layer that uses a sigmoid function to produce the output result.
As a further preferable scheme of the modeling method of the user loss early warning model in the online home decoration scene, the training process of the shallow neural network is as follows: at the start of training, the hidden layer matrix is initialized with random values; each word vector in the training data set is fed into the input layer and multiplied by the hidden layer matrix, and the final output value is mapped onto the (0, 1) interval by a sigmoid function and used as the output; the error between the model output and the actual label is calculated with cross entropy, and the hidden layer matrix is optimized by gradient descent until the error is minimized; iterative training is then performed on the data sample set T using K-fold cross-validation until the accuracy and error of the model on the validation set stabilize.
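A minimal sketch of the shallow network and its training loop, assuming TensorFlow/Keras as the framework (the Disclosure mentions a quick TensorFlow implementation) and scikit-learn's KFold for the K-fold cross-validation; the hidden layer width, optimizer settings and epoch count are illustrative assumptions, not values given in the patent.

```python
import tensorflow as tf
from sklearn.model_selection import KFold

def build_shallow_net(n_input, n_hidden=64):
    """Input layer for an N-dimensional tf-idf vector, one hidden layer, sigmoid output in (0, 1)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_input,)),
        # hidden layer: a weight matrix with random initial values; kept linear here because the
        # text only describes multiplying the input vector by the hidden layer matrix
        tf.keras.layers.Dense(n_hidden),
        # output mapped onto (0, 1) by the sigmoid: the user loss probability
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

def train_with_kfold(X, y, k=5, epochs=20):
    """Iterative training on sample set T (X: tf-idf vectors, y: 0/1 labels, both numpy arrays)
    with K-fold cross-validation; per-fold metrics are printed so stability can be checked."""
    model = None
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True).split(X):
        model = build_shallow_net(X.shape[1])
        # cross-entropy error, optimized by gradient descent
        model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
        model.fit(X[train_idx], y[train_idx],
                  validation_data=(X[val_idx], y[val_idx]),
                  epochs=epochs, verbose=0)
        loss, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
        print(f"fold validation accuracy: {acc:.3f}, error: {loss:.3f}")
    return model  # model from the last fold; in practice retrain on all data once metrics are stable
```

A non-linear activation could be added to the hidden layer without changing the rest of the pipeline; as sketched, the structure stays as close as possible to the matrix-multiplication-plus-sigmoid description in the text.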
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention. The embodiments of the present invention have been described in detail, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims (2)

1. A modeling method of a user loss early warning model in an online home decoration scene, characterized in that the method specifically comprises the following steps:
step 1, determining a stop word list, wherein the stop words include modal particles, conjunctions, prepositions, place names, personal pronouns, numbers and punctuation marks;
step 2, marking data: pulling chat records of all successful users and lost users, marking the chat records of all successful users with a label 0 as a forward data sample set A, and marking the chat records of all lost users with a label 1 as a reverse data sample set B;
step 3, calculating and storing idf values of words appearing in the keyword library W in all chat texts sent by all users, wherein the specific calculation formula is as follows
Step 4, processing all chat text content sent by each user in the forward data sample set A and the reverse data sample set B: first segmenting the text into words, then removing stop words to obtain the user's chat keyword set, and further calculating the tf-idf value of each word in the set;
wherein tf is the term frequency, idf is the idf value calculated in step 3, and tf-idf is the product of tf and idf; creating a text vector for the chat, wherein the length of the vector is N and the m-th element of the vector is the tf-idf value, in the chat, of the word numbered m in the chat keyword library W; for a word that does not appear in the chat, tf is 0 and the value of that element is 0; recording the label corresponding to the vector, wherein if the chat belongs to set A the label is 0, otherwise the label is 1; after all the chats are processed, a data sample set T is obtained in which each data item is a text vector and a label;
step 5, constructing a shallow neural network structure;
in step 5, the shallow neural network structure comprises an input layer that accepts an N-dimensional vector, a hidden layer formed by a plurality of matrices, and an output layer that uses a sigmoid function to produce the output result;
the training process of the shallow neural network specifically comprises the following steps:
step 5.1, at the start of training, establishing the hidden layer matrix with random values;
step 5.2, feeding each word vector in the training data set into the input layer and multiplying it by the hidden layer matrix, then mapping the final output value onto the (0, 1) interval with a sigmoid function and using it as the output;
step 5.3, calculating the error between the model output and the actual label by cross entropy, and optimizing the hidden layer matrix by gradient descent until the error is minimized;
step 5.4, performing iterative training on the data sample set T using K-fold cross-validation until the accuracy and error of the model on the validation set stabilize.
2. The modeling method of the user loss early warning model in the online home decoration scene according to claim 1, characterized in that step 1 specifically comprises: selecting, from the verbs, nouns, adjectives and adverbs (excluding stop words) in all chat texts sent by all users, the n words with the highest frequency of occurrence as the chat keyword library W, and creating for each word in W a number in the range [1, n]; the number of each word is unique and is not shared with any other word.
CN202010031987.9A 2020-01-13 2020-01-13 Modeling method of user loss early warning model in online home decoration scene Active CN111274791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010031987.9A CN111274791B (en) 2020-01-13 2020-01-13 Modeling method of user loss early warning model in online home decoration scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010031987.9A CN111274791B (en) 2020-01-13 2020-01-13 Modeling method of user loss early warning model in online home decoration scene

Publications (2)

Publication Number Publication Date
CN111274791A CN111274791A (en) 2020-06-12
CN111274791B true CN111274791B (en) 2023-08-18

Family

ID=71003046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010031987.9A Active CN111274791B (en) 2020-01-13 2020-01-13 Modeling method of user loss early warning model in online home decoration scene

Country Status (1)

Country Link
CN (1) CN111274791B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633715A (en) * 2020-12-28 2021-04-09 四川新网银行股份有限公司 Method for analyzing loss of online service user
CN113449103B (en) * 2021-01-28 2024-05-10 民生科技有限责任公司 Bank transaction running water classification method and system integrating label and text interaction mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108369665A (en) * 2015-12-10 2018-08-03 爱维士软件有限责任公司 Prediction of churn in (mobile) application usage
CN107491493A (en) * 2017-07-22 2017-12-19 长沙兔子代跑网络科技有限公司 Method and device for intelligently obtaining chat records in an errand-running service
CN107766709A (en) * 2017-11-08 2018-03-06 广东小天才科技有限公司 Method for controlling operation of mobile terminal based on input method and mobile terminal
CN109962795A (en) * 2017-12-22 2019-07-02 中国移动通信集团广东有限公司 A 4G customer churn early warning method and system based on multi-dimensional joint variables
WO2019165944A1 (en) * 2018-02-28 2019-09-06 中国银联股份有限公司 Transition probability network based merchant recommendation method and system thereof
CN110569842A (en) * 2019-09-05 2019-12-13 江苏艾佳家居用品有限公司 Semi-supervised learning method for GAN model training

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tian Ling, "Topic modeling and implementation of telecom customer churn prediction based on neural networks," Journal of Computer Applications (《计算机应用》), 2007-09-01, pp. 2294-2297 *

Also Published As

Publication number Publication date
CN111274791A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN109145112B (en) Commodity comment classification method based on global information attention mechanism
CN107608956B (en) Reader emotion distribution prediction algorithm based on CNN-GRNN
CN108763362B (en) Local model weighted fusion Top-N movie recommendation method based on random anchor point pair selection
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN108363790A (en) For the method, apparatus, equipment and storage medium to being assessed
CN108073568A (en) keyword extracting method and device
CN105183717B (en) A kind of OSN user feeling analysis methods based on random forest and customer relationship
CN112199608A (en) Social media rumor detection method based on network information propagation graph modeling
Lavanya et al. Twitter sentiment analysis using multi-class SVM
CN111160000B (en) Composition automatic scoring method, device terminal equipment and storage medium
CN111274791B (en) Modeling method of user loss early warning model in online home decoration scene
CN113408706B (en) Method and device for training user interest mining model and user interest mining
CN113392209A (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN110874392B (en) Text network information fusion embedding method based on depth bidirectional attention mechanism
CN111581379B (en) Automatic composition scoring calculation method based on composition question-deducting degree
CN111930931A (en) Abstract evaluation method and device
CN113204624B (en) Multi-feature fusion text emotion analysis model and device
CN111460146A (en) Short text classification method and system based on multi-feature fusion
CN114942974A (en) E-commerce platform commodity user evaluation emotional tendency classification method
CN116756347B (en) Semantic information retrieval method based on big data
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
Kumar et al. Self-attention enhanced recurrent neural networks for sentence classification
CN113157993A (en) Network water army behavior early warning model based on time sequence graph polarization analysis
Yan et al. Microblog emotion analysis method using deep learning in spark big data environment
CN112101033B (en) Emotion analysis method and device for automobile public praise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant