CN111274791B - Modeling method of user loss early warning model in online home decoration scene - Google Patents

Modeling method of user loss early warning model in online home decoration scene

Info

Publication number
CN111274791B
Authority
CN
China
Prior art keywords
chat
word
early warning
model
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010031987.9A
Other languages
Chinese (zh)
Other versions
CN111274791A (en)
Inventor
陈旋
王冲
张平
付虹源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Aijia Household Products Co Ltd
Original Assignee
Jiangsu Aijia Household Products Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Aijia Household Products Co Ltd filed Critical Jiangsu Aijia Household Products Co Ltd
Priority to CN202010031987.9A priority Critical patent/CN111274791B/en
Publication of CN111274791A publication Critical patent/CN111274791A/en
Application granted granted Critical
Publication of CN111274791B publication Critical patent/CN111274791B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a modeling method for a user loss early warning model in an online home decoration scene, belonging to the technical field of user loss early warning model modeling. It solves the problem that, because user behaviors such as login, browsing and consumption occur infrequently in the online home decoration industry, not enough data can be collected to support modeling and analysis, so a user loss early warning model cannot be established in the conventional way. A shallow neural network model is constructed by supervised learning on a training data set consisting of the word distribution of labeled user and customer-service chat content in IM software, together with whether the user corresponding to that chat content was lost. At the output layer, a sigmoid function maps the output value onto the (0, 1) interval; expressed as a percentage, this value is the user loss probability. By integrating the model into an engineering application, an accurate and efficient user loss early warning system can be developed.

Description

Modeling method of user loss early warning model in online home decoration scene
Technical Field
The invention belongs to the technical field of modeling of user loss early warning models, and particularly relates to a modeling method of a user loss early warning model in an online home decoration scene based on online chat user word distribution.
Background
The user loss early warning system is very important for Internet enterprises that turn traffic into profit: users who are about to be lost can be quickly screened out by the system and then won back through operational measures, thereby reducing the loss of traffic.
The main idea in constructing a user loss early warning system is to analyze user behavior data, build a model, and use the model to screen out the users who are about to be lost. The data dimensions used in user behavior analysis typically include interactions such as login, browsing and consumption. The online home decoration industry, however, is characterized by high order value, long service cycles, many personalized requirements, a high degree of specialization, and serious information asymmetry between the owner and the constructor. Each order therefore requires long-term communication between customer service staff and the user, mainly through IM (Instant Messaging) software developed in-house by the enterprise, which is also where most interaction between the enterprise and its users takes place. Behaviors such as login, browsing and consumption occur infrequently, so not enough data can be collected to support modeling and analysis. Building a model that analyzes the chat records exchanged between customer service staff and users thus becomes the best choice for realizing a user loss early warning system.
By carrying out hot-word analysis on the chat text generated by lost users and by users with successful transactions, it was found that the word distributions of the two groups differ greatly: for example, words such as "refund" and "angry" appear far more frequently in the chats of lost users than in those of successful-transaction users, while words such as "thanks" and "satisfied" appear far more frequently in the chats of successful-transaction users than in those of lost users. Based on this finding, several models relating the word distribution of the chat text to the user loss probability were built, including logistic regression, a maximum-entropy hidden Markov model, a deep neural network and others. When the models were tested, the accuracy of the shallow neural network on the validation set proved better than that of logistic regression, while its model complexity is far lower than that of the maximum-entropy hidden Markov model and the deep neural network model, giving it a clear speed advantage in training to convergence. The shallow neural network model is therefore best suited to engineering applications that require frequent model updates to keep the model current.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a modeling method for a user loss early warning model that addresses the following situation: in the online home decoration industry, user behaviors such as login, browsing and consumption occur infrequently, not enough data can be collected to support modeling and analysis, and a user loss early warning model therefore cannot be established in the conventional way. By integrating the model into an engineering application, an accurate and efficient user loss early warning system can be developed.
The invention adopts the following technical scheme for solving the technical problems:
A modeling method of a user loss early warning model in an online home decoration scene specifically comprises the following steps:
step 1, determining a stop word list, wherein the stop words include modal particles, conjunctions, prepositions, place names, personal pronouns, numbers and punctuation marks;
step 2, marking data: pulling chat records of all successful users and lost users, marking the chat records of all successful users with a label 0 as a forward data sample set A, and marking the chat records of all lost users with a label 1 as a reverse data sample set B;
step 3, calculating and storing idf values of words appearing in the keyword library W in all chat texts sent by all users, wherein the specific calculation formula is as follows
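The idf calculation formula itself appears only as an image in the published patent text; assuming the standard inverse document frequency definition that the surrounding description implies, the idf value of a word w can be written as

$$\mathrm{idf}_w = \log\frac{|D|}{\left|\{\, d \in D : w \in d \,\}\right|}$$

where D is the set of all users' chat documents and the denominator counts the chat documents containing w (in practice the denominator is often smoothed as 1 + |{d : w ∈ d}| to avoid division by zero).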
Step 4, processing all chat text content sent by each user in the forward data sample set A and the reverse data sample set B: first segmenting the text into words, then removing stop words to obtain the user's chat keyword set, and further calculating the tf-idf value of each word in the set;
wherein tf is the term frequency, idf is the idf value calculated in step 3, and tf-idf is the product of tf and idf; creating a text vector for the chat, wherein the length of the vector is N and the m-th element of the vector is the tf-idf value, in the chat, of the word numbered m in the chat keyword library W; for a word that does not appear in the chat, tf is 0 and the value of that element is 0; recording the label corresponding to the vector, wherein if the chat belongs to set A the label is 0, otherwise the label is 1; after all the chats are processed, a data sample set T is obtained in which each data item is a text vector and a label;
step 5, constructing a shallow neural network structure.
As a further preferable scheme of the modeling method of the user loss early warning model in the online home decoration scene, step 1 specifically comprises: selecting, from the verbs, nouns, adjectives and adverbs (excluding stop words) in all chat texts sent by all users, the n words with the highest frequency of occurrence as the chat keyword library W, and creating for each word in W a number in the range [1, n]; the number of each word is unique and is not shared with any other word.
As a further preferable scheme of the modeling method of the user loss early warning model in the online home decoration scene, in step 5 the shallow neural network structure comprises an input layer that accepts an N-dimensional vector, a hidden layer formed by a plurality of matrices, and an output layer that uses a sigmoid function to produce the output result.
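The sigmoid function used at the output layer is the standard logistic function, which maps any real-valued network output onto the (0, 1) interval:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Expressed as a percentage, this output is interpreted as the user loss probability.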
As a further preferable scheme of the modeling method of the user loss early warning model in the on-line home decoration scene, the training process of the shallow neural network specifically comprises the following steps:
step 5.1, at the start of training, establishing the hidden layer matrix with random values;
step 5.2, feeding each word vector in the training data set into the input layer and multiplying it by the hidden layer matrix, then mapping the final output value onto the (0, 1) interval with a sigmoid function and using it as the output;
step 5.3, calculating the error between the model output and the actual label by cross entropy (the standard binary form is written out after this list), and optimizing the hidden layer matrix by gradient descent until the error is minimized;
step 5.4, performing iterative training on the data sample set T using K-fold cross-validation until the accuracy and error of the model on the validation set stabilize.
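For the binary labels used here (0 for successful-transaction users, 1 for lost users), the cross entropy in step 5.3 takes its standard binary form; the patent does not write the formula out, so the following is given only as the usual definition, with M training samples, actual labels y_i and sigmoid outputs ŷ_i:

$$E = -\frac{1}{M}\sum_{i=1}^{M}\left[y_i\log\hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right)\right]$$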
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
1. it solves the problem that, in the online home decoration industry, user behaviors such as login, browsing and consumption occur infrequently, not enough data is collected to support modeling and analysis, and a user loss early warning model therefore cannot be established in the conventional way;
2. a shallow neural network structure is adopted, so training converges quickly and the time cost of completing training is low; as new chat-content training samples are continuously generated, a new model can be trained quickly, which keeps the model highly up to date;
3. the constructed model predicts accurately and performs well, can be implemented quickly using TensorFlow, and is easy to use at scale once engineered.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a schematic diagram of the structure of the shallow neural network of the present invention.
Detailed Description
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings:
the following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, based on the embodiments of the invention, which are apparent to those of ordinary skill in the art without undue burden are within the scope of the invention
As shown in Fig. 1, the modeling method of the user loss early warning model in the online home decoration scene specifically comprises the following steps:
step 1, determining a stop word list, wherein the stop words include modal particles, conjunctions, prepositions, place names, personal pronouns, numbers and punctuation marks;
The non-stop words are filled into a word vector of fixed length (the filling is similar to one-hot encoding: a group of words with strong discriminative value for classification needs to be selected; in this example the words with the highest frequency are adopted, which reduces the sparsity of the word vectors, and because frequently occurring but low-meaning words have already been filtered out by the stop word list, the remaining words are meaningful). The word vector length N is the dimension accepted by the model's input layer; the larger N is set, the more words participate in the model calculation and the more information the trained model contains, which generally gives better results, but the computational cost of modeling also increases, so the size of N must be limited. In practical use it was found that a model with satisfactory results can be built with N = 1000. The n words with the highest frequency of occurrence among the verbs, nouns, adjectives and adverbs (excluding stop words) in all chat texts sent by all users are selected as the chat keyword library W, and each word in W is given a number in the range [1, n]; the number of each word is unique and is not shared with any other word.
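As a concrete illustration of how the chat keyword library W described above could be built, the following Python sketch selects the n most frequent content words. It is only one possible implementation under stated assumptions: the jieba part-of-speech tokenizer, the POS tag prefixes and all function and variable names are choices made here, not specified in the patent.

```python
from collections import Counter

import jieba.posseg as pseg  # assumed tokenizer; the patent does not name one

# part-of-speech prefixes kept: verbs (v), nouns (n), adjectives (a), adverbs (d)
CONTENT_POS_PREFIXES = ("v", "n", "a", "d")

def build_keyword_library(chat_texts, stop_words, n=1000):
    """Select the n most frequent content words (excluding stop words) from all
    chat texts and assign each a unique number in the range [1, n]."""
    counter = Counter()
    for text in chat_texts:
        for word, flag in pseg.cut(text):
            if word in stop_words:
                continue
            if flag.startswith(CONTENT_POS_PREFIXES):
                counter[word] += 1
    top_words = [w for w, _ in counter.most_common(n)]
    # chat keyword library W: word -> unique number in [1, n]
    return {word: i + 1 for i, word in enumerate(top_words)}
```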
Step 2, marking data: pulling chat records of all successful users and lost users, marking the chat records of all successful users with a label 0 as a forward data sample set A, and marking the chat records of all lost users with a label 1 as a reverse data sample set B;
step 3, calculating and saving the idf (Inverse Document Frequency) value, over all chat texts sent by all users, of each word appearing in the keyword library W; the specific calculation formula is as follows:
step 4, processing all chat text content sent by each user in the forward data sample set A and the reverse data sample set B: first segmenting the text into words, then removing stop words to obtain the user's chat keyword set, and further calculating the tf-idf value of each word in the set;
where tf is the term frequency (Term Frequency) and idf is the idf value calculated in step 3; the tf-idf value is the product of tf and idf. A text vector is created for the chat: the vector length is N, and the m-th element of the vector is the tf-idf value, in this chat, of the word numbered m in the chat keyword library W. The tf-idf value is commonly used to evaluate how important a word is to a document in a document collection or corpus: the importance of a word increases in proportion to the number of times it appears in the document, but decreases with the frequency with which it appears across the corpus. For a word that does not appear in the chat, tf is 0 and the value of that element is 0. The label corresponding to the vector is recorded: if the chat belongs to set A the label is 0, otherwise it is 1. After all the chats are processed, a data sample set T is obtained in which each data item is a text vector and a label;
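A minimal sketch of the vectorization in step 4, under the same assumptions as the previous sketch: the keyword library `W` (word to number) and the `idf` dictionary come from the earlier steps, `jieba` is the assumed tokenizer, and the function and variable names are illustrative rather than taken from the patent.

```python
from collections import Counter

import jieba  # assumed tokenizer, as above

def chat_to_sample(chat_text, W, idf, stop_words, label):
    """Turn one user's concatenated chat text into (tf-idf vector of length N, label).
    label is 0 if the chat belongs to set A (successful transaction), 1 for set B (lost)."""
    tokens = [t for t in jieba.lcut(chat_text) if t not in stop_words]
    counts = Counter(tokens)
    total = sum(counts.values()) or 1                 # guard against an empty chat
    vector = [0.0] * len(W)                           # N = len(W); element m-1 holds the word numbered m
    for word, m in W.items():
        if word in counts:
            tf = counts[word] / total                 # term frequency of the word in this chat
            vector[m - 1] = tf * idf.get(word, 0.0)   # tf-idf; stays 0.0 for absent words
    return vector, label
```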
step 5, constructing the shallow neural network structure, as shown in Fig. 2.
As a further preferable scheme of the modeling method of the user loss early warning model in the online home decoration scene, the shallow neural network structure comprises an input layer that accepts an N-dimensional vector, a hidden layer formed by a plurality of matrices, and an output layer that uses a sigmoid function to produce the output result.
As a further preferable scheme of the modeling method of the user loss early warning model in the online home decoration scene, the training process of the shallow neural network is as follows: at the start of training, the hidden layer matrix is initialized with random values; each word vector in the training data set is fed into the input layer and multiplied by the hidden layer matrix, and the final output value is mapped onto the (0, 1) interval by a sigmoid function and used as the output; the error between the model output and the actual label is calculated with cross entropy, and the hidden layer matrix is optimized by gradient descent until the error is minimized; iterative training is then performed on the data sample set T using K-fold cross-validation until the accuracy and error of the model on the validation set stabilize.
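A minimal sketch of the shallow network and its training loop, assuming TensorFlow/Keras as the framework (the Disclosure mentions a quick TensorFlow implementation) and scikit-learn's KFold for the K-fold cross-validation; the hidden layer width, optimizer settings and epoch count are illustrative assumptions, not values given in the patent.

```python
import tensorflow as tf
from sklearn.model_selection import KFold

def build_shallow_net(n_input, n_hidden=64):
    """Input layer for an N-dimensional tf-idf vector, one hidden layer, sigmoid output in (0, 1)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_input,)),
        # hidden layer: a weight matrix with random initial values; kept linear here because the
        # text only describes multiplying the input vector by the hidden layer matrix
        tf.keras.layers.Dense(n_hidden),
        # output mapped onto (0, 1) by the sigmoid: the user loss probability
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

def train_with_kfold(X, y, k=5, epochs=20):
    """Iterative training on sample set T (X: tf-idf vectors, y: 0/1 labels, both numpy arrays)
    with K-fold cross-validation; per-fold metrics are printed so stability can be checked."""
    model = None
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True).split(X):
        model = build_shallow_net(X.shape[1])
        # cross-entropy error, optimized by gradient descent
        model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
        model.fit(X[train_idx], y[train_idx],
                  validation_data=(X[val_idx], y[val_idx]),
                  epochs=epochs, verbose=0)
        loss, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
        print(f"fold validation accuracy: {acc:.3f}, error: {loss:.3f}")
    return model  # model from the last fold; in practice retrain on all data once metrics are stable
```

A non-linear activation could be added to the hidden layer without changing the rest of the pipeline; as sketched, the structure stays as close as possible to the matrix-multiplication-plus-sigmoid description in the text.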
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention. The embodiments of the present invention have been described in detail, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims (2)

1. A modeling method of a user loss early warning model in an online home decoration scene, characterized in that the method specifically comprises the following steps:
step 1, determining a stop word list, wherein the stop words include modal particles, conjunctions, prepositions, place names, personal pronouns, numbers and punctuation marks;
step 2, marking data: pulling chat records of all successful users and lost users, marking the chat records of all successful users with a label 0 as a forward data sample set A, and marking the chat records of all lost users with a label 1 as a reverse data sample set B;
step 3, calculating and storing idf values of words appearing in the keyword library W in all chat texts sent by all users, wherein the specific calculation formula is as follows
Step 4, processing all chat text content sent by each user in the forward data sample set A and the reverse data sample set B: first segmenting the text into words, then removing stop words to obtain the user's chat keyword set, and further calculating the tf-idf value of each word in the set;
wherein tf is the term frequency, idf is the idf value calculated in step 3, and tf-idf is the product of tf and idf; creating a text vector for the chat, wherein the length of the vector is N and the m-th element of the vector is the tf-idf value, in the chat, of the word numbered m in the chat keyword library W; for a word that does not appear in the chat, tf is 0 and the value of that element is 0; recording the label corresponding to the vector, wherein if the chat belongs to set A the label is 0, otherwise the label is 1; after all the chats are processed, a data sample set T is obtained in which each data item is a text vector and a label;
step 5, constructing a shallow neural network structure;
in step 5, the shallow neural network structure comprises an input layer that accepts an N-dimensional vector, a hidden layer formed by a plurality of matrices, and an output layer that uses a sigmoid function to produce the output result;
the training process of the shallow neural network specifically comprises the following steps:
step 5.1, at the start of training, establishing the hidden layer matrix with random values;
step 5.2, feeding each word vector in the training data set into the input layer and multiplying it by the hidden layer matrix, then mapping the final output value onto the (0, 1) interval with a sigmoid function and using it as the output;
step 5.3, calculating the error between the model output and the actual label by cross entropy, and optimizing the hidden layer matrix by gradient descent until the error is minimized;
step 5.4, performing iterative training on the data sample set T using K-fold cross-validation until the accuracy and error of the model on the validation set stabilize.
2. The modeling method of the user loss early warning model in the online home decoration scene according to claim 1, characterized in that step 1 specifically comprises: selecting, from the verbs, nouns, adjectives and adverbs (excluding stop words) in all chat texts sent by all users, the n words with the highest frequency of occurrence as the chat keyword library W, and creating for each word in W a number in the range [1, n]; the number of each word is unique and is not shared with any other word.
CN202010031987.9A 2020-01-13 2020-01-13 Modeling method of user loss early warning model in online home decoration scene Active CN111274791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010031987.9A CN111274791B (en) 2020-01-13 2020-01-13 Modeling method of user loss early warning model in online home decoration scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010031987.9A CN111274791B (en) 2020-01-13 2020-01-13 Modeling method of user loss early warning model in online home decoration scene

Publications (2)

Publication Number Publication Date
CN111274791A CN111274791A (en) 2020-06-12
CN111274791B true CN111274791B (en) 2023-08-18

Family

ID=71003046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010031987.9A Active CN111274791B (en) 2020-01-13 2020-01-13 Modeling method of user loss early warning model in online home decoration scene

Country Status (1)

Country Link
CN (1) CN111274791B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633715A (en) * 2020-12-28 2021-04-09 四川新网银行股份有限公司 Method for analyzing loss of online service user
CN113449103B (en) * 2021-01-28 2024-05-10 民生科技有限责任公司 Bank transaction running water classification method and system integrating label and text interaction mechanism

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108369665A (en) * 2015-12-10 2018-08-03 爱维士软件有限责任公司 Prediction of churn in (mobile) application usage
CN107491493A (en) * 2017-07-22 2017-12-19 长沙兔子代跑网络科技有限公司 Method and device for intelligently obtaining chat records in an errand-running service
CN107766709A (en) * 2017-11-08 2018-03-06 广东小天才科技有限公司 Method for controlling operation of mobile terminal based on input method and mobile terminal
CN109962795A (en) * 2017-12-22 2019-07-02 中国移动通信集团广东有限公司 A 4G customer churn early warning method and system based on multi-dimensional joint variables
WO2019165944A1 (en) * 2018-02-28 2019-09-06 中国银联股份有限公司 Transition probability network based merchant recommendation method and system thereof
CN110569842A (en) * 2019-09-05 2019-12-13 江苏艾佳家居用品有限公司 Semi-supervised learning method for GAN model training

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tian Ling, "Topic modeling and implementation of telecom customer churn prediction based on neural networks," Journal of Computer Applications (《计算机应用》), 2007-09-01, pp. 2294-2297 *

Also Published As

Publication number Publication date
CN111274791A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN109145112B (en) Commodity comment classification method based on global information attention mechanism
CN107608956B (en) Reader emotion distribution prediction algorithm based on CNN-GRNN
CN108763362B (en) Local model weighted fusion Top-N movie recommendation method based on random anchor point pair selection
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN108363790A (en) For the method, apparatus, equipment and storage medium to being assessed
CN108073568A (en) keyword extracting method and device
CN105183717B (en) A kind of OSN user feeling analysis methods based on random forest and customer relationship
CN112199608A (en) Social media rumor detection method based on network information propagation graph modeling
Lavanya et al. Twitter sentiment analysis using multi-class SVM
CN111160000B (en) Composition automatic scoring method, device terminal equipment and storage medium
CN111274791B (en) Modeling method of user loss early warning model in online home decoration scene
CN113408706B (en) Method and device for training user interest mining model and user interest mining
CN113392209A (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN110874392B (en) Text network information fusion embedding method based on depth bidirectional attention mechanism
CN111581379B (en) Automatic composition scoring calculation method based on composition question-deducting degree
CN111930931A (en) Abstract evaluation method and device
CN113204624B (en) Multi-feature fusion text emotion analysis model and device
CN111460146A (en) Short text classification method and system based on multi-feature fusion
CN114942974A (en) E-commerce platform commodity user evaluation emotional tendency classification method
CN116756347B (en) Semantic information retrieval method based on big data
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
Kumar et al. Self-attention enhanced recurrent neural networks for sentence classification
CN113157993A (en) Network water army behavior early warning model based on time sequence graph polarization analysis
Yan et al. Microblog emotion analysis method using deep learning in spark big data environment
CN112101033B (en) Emotion analysis method and device for automobile public praise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant