CN110991711A - Multi-factor perception terminal switching prediction method based on deep neural network - Google Patents

Multi-factor perception terminal switching prediction method based on deep neural network

Info

Publication number
CN110991711A
CN110991711A (application CN201911135219.1A)
Authority
CN
China
Prior art keywords
user
terminal
code
behavior
attributes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911135219.1A
Other languages
Chinese (zh)
Inventor
王敬昌
吴勇
陈岭
陈纬奇
郑羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Hongcheng Computer Systems Co Ltd
Original Assignee
Zhejiang Hongcheng Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Hongcheng Computer Systems Co Ltd filed Critical Zhejiang Hongcheng Computer Systems Co Ltd
Priority to CN201911135219.1A
Publication of CN110991711A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40: Business processes related to the transportation industry


Abstract

The invention relates to a multi-factor-aware terminal switching (handset replacement) prediction method based on a deep neural network. The method first collects terminal attributes, the related users' natural attributes, call behavior data, traffic usage behavior data and historical replacement records, and preprocesses them. Next, the user natural attributes and terminal attributes are encoded to obtain a user natural-attribute code and a terminal attribute code; a user historical-replacement information code is built from each user's replacement records and the corresponding terminal attribute codes; call and traffic usage behavior time series are constructed and subjected to coarsening and normalization. Temporal features of the call and traffic usage behaviors are then extracted. Finally, the user natural-attribute code, the historical-replacement information code, the call behavior temporal feature and the traffic usage temporal feature are concatenated and fed into a fully connected network to predict whether the user will replace his or her terminal. The prediction method helps operators accurately market mobile terminals and their companion products to users, thereby increasing mobile terminal sales and expanding market scale.

Description

Multi-factor perception terminal switching prediction method based on deep neural network
Technical Field
The invention relates to the technical field of deep learning, and in particular to a multi-factor-aware terminal switching (handset replacement) prediction method based on a deep neural network.
Background
Sales of mobile terminals and their companion products have become a strategic core business of telecom operators, and accurate prediction of user handset replacement helps operators increase terminal sales and expand market scale. However, user replacement behavior is affected by many factors, such as the user's income, consumption habits, degree of dependence on the mobile phone, call behavior and traffic usage behavior, which makes terminal replacement prediction highly challenging. To achieve accurate marketing of operator terminals and their companion products, the influence of users' natural attributes and behavior data on replacement behavior must be mined in depth so that replacement behavior can be predicted accurately.
Existing terminal replacement prediction methods fall into two categories. The first category predicts only from static user attributes (e.g., natural attributes, terminal usage preference, package usage data and frequently visited locations). Such models perform poorly because they ignore user behavior information and cannot model a dynamic user profile. The second category introduces dynamic user behavior data but relies on domain expertise to hand-craft statistical features of recent behavior (e.g., average call duration and traffic usage over the last week). Features hand-crafted from domain knowledge cannot explicitly model the trend of a user's behavior sequence and cause a certain degree of information loss.
Disclosure of Invention
The object of the invention is to overcome the above defects by providing a multi-factor-aware terminal replacement prediction method based on a deep neural network. The technical problem to be solved is how to fully mine the factors that influence user replacement from users' natural attributes, call and traffic usage behavior data, and historical replacement records, and thereby predict user replacement. The method fully exploits these data sources to build a terminal replacement prediction model with strong generalization ability. It first collects terminal attributes and the related users' natural attributes, call behavior data, traffic usage behavior data and historical replacement records, and preprocesses them. It then encodes the preprocessed user natural attributes and terminal attributes to obtain a user natural-attribute code and a terminal attribute code, builds a user historical-replacement information code from the replacement records and the corresponding terminal attribute codes, constructs call and traffic usage behavior time series, and applies coarsening and normalization. Next, an LSTM network extracts temporal features of the call and traffic usage behaviors. Finally, the user natural-attribute code, the historical-replacement information code, the call behavior temporal feature and the traffic usage temporal feature are concatenated and fed into a fully connected network to predict whether the user will replace his or her terminal.
The method realizes multi-factor-aware terminal replacement prediction based on a deep neural network, helping operators accurately market mobile terminals and their companion products to users, thereby increasing mobile terminal sales and expanding market scale.
The invention achieves this object through the following technical scheme. A multi-factor-aware terminal replacement prediction method based on a deep neural network comprises three stages, namely data acquisition and preprocessing, feature construction and extraction, and feature fusion and prediction, and specifically comprises the following steps:
(1) Data acquisition and preprocessing stage: acquire terminal attributes and the related users' natural attributes, call behavior data, traffic usage behavior data and historical replacement records, and preprocess them;
(2) Feature construction and extraction stage: encode the preprocessed user natural attributes and terminal attributes to obtain a user natural-attribute code and a terminal attribute code; build a user historical-replacement information code from the user's replacement records and the corresponding terminal attribute codes; construct call and traffic usage behavior time series and apply coarsening and normalization; then use an LSTM network to extract temporal features of the call and traffic usage behaviors;
(3) Feature fusion and prediction stage: concatenate the user natural-attribute code, the historical-replacement information code, the call behavior temporal feature and the traffic usage temporal feature, and feed the result into a fully connected network to predict whether the user will replace his or her terminal.
Preferably, the data acquisition and preprocessing stage is specifically as follows:
(1.1) Collect the related users' natural attributes, including: gender, age, network access duration, customer star rating, whether the user is an in-network user, the number of CDMA (code division multiple access) numbers under the same customer, and App usage preference;
(1.2) Collect the related terminal attributes, including: brand, operating system, price, screen size, whether 4G is supported, and rear camera pixels;
(1.3) Collect the related users' call behavior data and traffic usage behavior data, and perform missing-value completion and outlier elimination; the call behavior data comprise daily outgoing call count, daily outgoing call duration, daily incoming call count and daily incoming call duration, and the traffic usage behavior data comprise daily traffic usage count, daily traffic usage duration, daily uplink traffic and daily downlink traffic;
(1.4) Collect the users' historical replacement records.
Preferably, in step (1.3), missing values and outliers are detected in the call and traffic usage behavior data, and both are filled in by linear interpolation.
Preferably, each historical replacement record is represented as a triple <u, s, t>, where u is a user id, s is a terminal id, and t is the earliest date on which user u used terminal s.
Preferably, the feature construction and extraction stage is specifically as follows:
(2.1) Encode the user natural attributes to obtain the user natural-attribute code v;
(2.2) Map the user natural-attribute code v to a feature space to obtain the user natural-attribute feature representation f_attr, using a linear transformation:
f_attr = W_a^T v
where W_a is a mapping matrix;
(2.3) Encode the terminal attributes to obtain the terminal attribute code l;
(2.4) Construct the user historical-replacement information code r from the user's historical replacement records and the corresponding terminal attribute codes;
(2.5) Map the user historical-replacement information code r to a feature space to obtain the historical-replacement information feature representation f_term, using a linear transformation:
f_term = W_t^T r
where W_t is a mapping matrix;
(2.6) From the call and traffic usage behavior data after missing-value completion and outlier elimination, construct a call behavior time series ξ_call and a traffic usage behavior time series ξ_data with a time interval of 1 day and a span of T days, and apply coarsening with granularity g and normalization;
(2.7) Construct two multi-layer LSTM networks with the same structure, each layer containing T/g LSTM units; feed the coarsened and normalized call behavior series ξ_call and traffic usage series ξ_data into the two LSTM networks respectively, and extract the call behavior temporal feature f_call and the traffic usage temporal feature f_data.
Preferably, the user natural-attribute code v is obtained as follows:
(2.1.1) One-hot encode the discrete attributes directly;
(2.1.2) Divide each continuous attribute other than age into 5 intervals by equal-frequency binning, then one-hot encode;
(2.1.3) For the age attribute, considering that users of different age groups have different replacement preferences, divide age into 8 intervals and then one-hot encode; the 8 intervals are under 16, 16 to 21, 22 to 27, 28 to 33, 34 to 39, 40 to 45, 46 to 51, and over 51;
(2.1.4) Concatenate the one-hot codes of all user natural attributes to obtain the user natural-attribute code v.
Preferably, the terminal attribute code l is obtained as follows:
(2.3.1) One-hot encode the discrete attributes other than brand directly;
(2.3.2) Compute the usage distribution of terminal brands and divide them into 10 classes for encoding: the 9 brands with the highest usage share form 9 classes, and all remaining brands are grouped into an "other" class;
(2.3.3) Divide each continuous attribute into 5 intervals by equal-frequency binning, then one-hot encode;
(2.3.4) Concatenate the one-hot codes of all terminal attributes to obtain the terminal attribute code l.
Preferably, the user historical-replacement information code r is constructed as follows:
(2.4.1) Assume user u's historical replacement records are <u, s_1, t_1>, <u, s_2, t_2>, …, <u, s_k, t_k>, and the current date is t_now. Divide the recorded terminals into the historically used terminals {s_1, …, s_(k-1)} and the currently used terminal s_k;
(2.4.2) Apply max-pooling to the attribute codes {l_(s_1), …, l_(s_(k-1))} of the historically used terminals to obtain the historical terminal code l_his, which represents the user's historical terminal preference;
(2.4.3) Take the attribute code l_(s_k) of the currently used terminal as the current terminal code l_now, which represents the user's current terminal preference;
(2.4.4) Compute the user's historical average replacement interval p_his and current terminal usage duration p_now by:
p_his = (1 / (k-1)) * Σ_(i=1)^(k-1) interval(t_i, t_(i+1))
p_now = interval(t_k, t_now)
where interval(·, ·) is the number of days between two dates;
(2.4.5) Concatenate the historical terminal code l_his, the current terminal code l_now, the normalized historical average replacement interval p_his and the normalized current terminal usage duration p_now to obtain the user historical-replacement information code r.
Preferably, the coarsening and normalization steps are specifically as follows:
(2.6.1) Apply max-min normalization to each sequence so that the processed data lie in [0, 1]; with x the original value, x_max the maximum of the sequence containing x, and x_min its minimum, the formula is:
x' = (x - x_min) / (x_max - x_min)
(2.6.2) For each behavior time series of span T days, compute the average over every g days, coarsening the sequence from its original length T to length T/g.
Preferably, step (2.7) uses a three-layer LSTM network to extract the temporal features of the call and traffic usage behaviors. The LSTM network is a recurrent neural network; each LSTM unit contains a memory cell c_t and three gates, the input gate i_t, the output gate o_t and the forget gate f_t, which control the input, output and update of data respectively. With x_t the input at time t, and h_(t-1) and c_(t-1) the hidden state and memory cell state at the previous time step, the computation is:
i_t = sigm(W_xi x_t + W_hi h_(t-1) + b_i)
f_t = sigm(W_xf x_t + W_hf h_(t-1) + b_f)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ tanh(W_xc x_t + W_hc h_(t-1) + b_c)
o_t = sigm(W_xo x_t + W_ho h_(t-1) + W_co c_(t-1) + b_o)
h_t = o_t ⊙ tanh(c_t)
where the operator ⊙ denotes element-wise multiplication, W and b denote weight matrices and bias vectors, and sigm and tanh denote the sigmoid and hyperbolic tangent functions respectively.
Preferably, the feature fusion and prediction stage is specifically as follows:
(3.1) Concatenate the user natural-attribute feature representation f_attr, the historical-replacement information feature representation f_term, the call behavior temporal feature f_call and the traffic usage temporal feature f_data to obtain the deep feature f;
(3.2) Feed the deep feature f into the fully connected network and predict whether the user will replace the terminal within the next T' days.
Preferably, in step (3.2), the fully connected network processes the deep feature f and outputs the probability p that the user will replace the terminal within the next T' days. The activation function of the hidden layers is ReLU; the output layer has a single neuron with a Sigmoid activation, so the output lies in [0, 1] and represents the replacement probability within the next T' days. Stacking multiple fully connected layers strengthens the nonlinear capacity of the model; to balance fitting capacity against complexity, a three-layer fully connected network is adopted. Each fully connected layer maps its input h using the nonlinear ReLU activation according to:
z = ReLU(W^T h)
where W is the layer's weight matrix and z is the layer's output.
The beneficial effects of the invention are: (1) the invention effectively exploits the rich information contained in users' natural attributes, call behavior data, traffic usage behavior data and historical replacement records, and fully mines the factors that influence user replacement, giving the model stronger performance and higher generalization ability; (2) the invention uses an LSTM network to extract temporal features of the call and traffic usage behaviors; the deep features are obtained automatically in a data-driven manner, the trend of the user behavior sequence can be modeled explicitly, and the information loss caused by hand-crafted features is avoided.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of user behavior sequence feature extraction according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of feature fusion for an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto:
Embodiment: as shown in FIG. 1, a multi-factor-aware terminal replacement prediction method based on a deep neural network is divided into three stages, namely data acquisition and preprocessing, feature construction and extraction, and feature fusion and prediction, and specifically comprises the following steps:
(1) Data acquisition and preprocessing stage. The specific steps are as follows:
Step 1: collect the related users' natural attributes, including: gender, age, network access duration (months), customer star rating, whether the user is an in-network user, the number of CDMA numbers under the same customer, the number of fixed-line phones under the same customer, and App usage preference (divided into 4 types: social, video, game and reading).
Step 2: collect the related terminal attributes, including: brand, operating system, price, screen size (inches), whether 4G is supported, and rear camera pixels.
Step 3: collect the related users' call behavior data (including daily outgoing call count, daily outgoing call duration, daily incoming call count and daily incoming call duration) and traffic usage behavior data (including daily traffic usage count, daily traffic usage duration, daily uplink traffic and daily downlink traffic), and perform missing-value completion and outlier elimination.
Missing values and outliers are detected in the call and traffic usage behavior data, and both are filled in by linear interpolation.
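The linear-interpolation filling described above can be sketched as follows; the function name and the use of NumPy are our own assumptions, since the patent only states that linear interpolation is used without naming an API:

```python
import numpy as np

def fill_missing_linear(series):
    """Fill NaN gaps in a 1-D daily behavior series by linear interpolation.

    Hypothetical helper: outliers are assumed to have already been
    replaced by NaN so they are filled the same way as missing values.
    """
    s = np.asarray(series, dtype=float)
    idx = np.arange(len(s))
    known = ~np.isnan(s)
    # np.interp linearly interpolates the missing positions from the known
    # ones; positions outside the known range are clamped to the edge values.
    s[~known] = np.interp(idx[~known], idx[known], s[known])
    return s
```

For example, a lost counter reading between a day with 2 calls and a day with 4 calls is filled with 3.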
Step 4: collect the users' historical replacement records.
Each replacement record is represented as a triple <u, s, t>, where u is the user id, s is the terminal id, and t is the earliest date on which user u used terminal s.
(2) Feature construction and extraction stage. The specific steps are as follows:
Step 1: encode the user natural attributes to obtain the user natural-attribute code v.
The specific steps for constructing the user natural-attribute code are:
a) One-hot encode the discrete attributes (e.g., gender and customer star rating) directly.
b) Divide each continuous attribute other than age (e.g., network access duration and the number of CDMA numbers under the same customer) into 5 intervals by equal-frequency binning, then one-hot encode.
c) For the age attribute, considering that users of different age groups have different replacement preferences, divide age into 8 intervals (under 16, 16 to 21, 22 to 27, 28 to 33, 34 to 39, 40 to 45, 46 to 51, and over 51), then one-hot encode.
d) Concatenate the one-hot codes of all user natural attributes to obtain the user natural-attribute code v.
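Steps a) to c) can be sketched as follows; the helper names and the use of percentile cut points to realize equal-frequency binning are our own assumptions, not fixed by the patent:

```python
import numpy as np

AGE_EDGES = [16, 22, 28, 34, 40, 46, 52]  # boundaries of the 8 age intervals in step c)

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def encode_age(age):
    # searchsorted(..., side="right") maps e.g. 15 to bucket 0 (under 16),
    # 25 to bucket 2 (22 to 27), and 60 to bucket 7 (over 51)
    return one_hot(int(np.searchsorted(AGE_EDGES, age, side="right")), 8)

def equal_freq_edges(train_values, n_bins=5):
    # interior percentile cut points give roughly equal-frequency bins
    return np.quantile(train_values, np.linspace(0, 1, n_bins + 1)[1:-1])

def encode_continuous(x, edges):
    return one_hot(int(np.searchsorted(edges, x, side="right")), len(edges) + 1)
```

Concatenating the resulting one-hot vectors for all attributes yields the code v of step d).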
Step 2: map the user natural-attribute code v to a feature space to obtain the user natural-attribute feature representation f_attr.
The code v is mapped to the feature space by a linear transformation:
f_attr = W_a^T v    (1)
where W_a is a mapping matrix.
Step 3: encode the terminal attributes to obtain the terminal attribute code l.
The specific steps for constructing the terminal attribute code are:
a) One-hot encode the discrete attributes other than brand (e.g., operating system and whether 4G is supported) directly.
b) Given the large number of terminal brands, one-hot encoding them directly would cause a dimensionality explosion. Instead, compute the usage distribution of terminal brands and divide them into 10 classes for encoding: the 9 brands with the highest usage share form 9 classes, and all remaining brands are grouped into an "other" class.
c) Divide each continuous attribute (e.g., price and screen size) into 5 intervals by equal-frequency binning, then one-hot encode.
d) Concatenate the one-hot codes of all terminal attributes to obtain the terminal attribute code l.
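A minimal sketch of the brand grouping in step b), assuming usage share is estimated simply by counting brand occurrences in the collected records (the patent does not fix a counting method):

```python
from collections import Counter

import numpy as np

def build_brand_index(brand_usage):
    """Map the 9 most-used brands to classes 0-8; all others share class 9."""
    top9 = [brand for brand, _ in Counter(brand_usage).most_common(9)]
    return {brand: i for i, brand in enumerate(top9)}

def encode_brand(brand, brand_index):
    v = np.zeros(10)
    v[brand_index.get(brand, 9)] = 1.0  # unseen / long-tail brands fall into "other"
    return v
```

This keeps the brand code at a fixed 10 dimensions no matter how many brands appear in the data.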
Step 4: construct the user historical-replacement information code r from the user's historical replacement records and the corresponding terminal attribute codes.
Assume the historical replacement records of user u are <u, s_1, t_1>, <u, s_2, t_2>, …, <u, s_k, t_k>, and the current date is t_now. The specific steps for constructing the user historical-replacement information code are:
a) Divide the recorded terminals into the historically used terminals {s_1, …, s_(k-1)} and the currently used terminal s_k.
b) Apply max-pooling to the attribute codes {l_(s_1), …, l_(s_(k-1))} of the historically used terminals to obtain the historical terminal code l_his, which represents the user's historical terminal preference.
c) Take the attribute code l_(s_k) of the currently used terminal as the current terminal code l_now, which represents the user's current terminal preference.
d) Compute the user's historical average replacement interval p_his and current terminal usage duration p_now by:
p_his = (1 / (k-1)) * Σ_(i=1)^(k-1) interval(t_i, t_(i+1))    (2)
p_now = interval(t_k, t_now)    (3)
where interval(·, ·) is the number of days between two dates.
e) Concatenate the historical terminal code l_his, the current terminal code l_now, the normalized historical average replacement interval p_his and the normalized current terminal usage duration p_now to obtain the user historical-replacement information code r.
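Steps a) to e) can be sketched for a single user as follows; the `norm_days` normalization constant is our own assumption, since the patent only states that the two durations are normalized before concatenation:

```python
import numpy as np

def history_code(terminal_codes, dates, now, norm_days=365.0):
    """Build the historical-replacement information code r for one user.

    terminal_codes: attribute codes l_(s_1)..l_(s_k) of the k recorded
    terminals (k >= 2); dates: first-use dates t_1..t_k as day offsets;
    now: the current date as a day offset. norm_days is a hypothetical
    normalization constant.
    """
    codes = np.asarray(terminal_codes, dtype=float)
    l_his = codes[:-1].max(axis=0)   # max-pooling over historical terminals (step b)
    l_now = codes[-1]                # currently used terminal (step c)
    k = len(dates)
    p_his = sum(dates[i + 1] - dates[i] for i in range(k - 1)) / (k - 1)  # eq. (2)
    p_now = now - dates[-1]                                               # eq. (3)
    return np.concatenate([l_his, l_now, [p_his / norm_days, p_now / norm_days]])
```

The resulting vector r concatenates the pooled historical preference, the current preference, and the two normalized durations, matching step e).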
Step 5: map the user historical-replacement information code r to a feature space to obtain the historical-replacement information feature representation f_term.
The code r is mapped to the feature space by a linear transformation:
f_term = W_t^T r    (4)
where W_t is a mapping matrix.
Step 6: from the call and traffic usage behavior data after missing-value completion and outlier elimination, construct a call behavior time series ξ_call and a traffic usage behavior time series ξ_data with a time interval of 1 day and a span of T days, and apply coarsening with granularity g and normalization.
The specific steps of coarsening and normalization are:
a) Apply max-min normalization to each sequence so that the processed data lie in [0, 1]; with x the original value, x_max the maximum of the sequence containing x, and x_min its minimum:
x' = (x - x_min) / (x_max - x_min)    (5)
b) A sequence with too many steps would significantly increase the computational complexity of model training, so the average of each behavior time series of span T days is computed over every g days, coarsening the sequence from its original length T to length T/g.
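Steps a) and b) can be sketched as follows; this minimal version assumes T is divisible by g, as the T/g unit count in step 7 implies:

```python
import numpy as np

def normalize_and_coarsen(series, g):
    """Min-max normalize a T-day behavior series to [0, 1] (eq. 5), then
    average every g consecutive days so the length shrinks from T to T/g."""
    s = np.asarray(series, dtype=float)
    span = s.max() - s.min()
    # a constant series has span 0; map it to all zeros (our own convention)
    s = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return s.reshape(-1, g).mean(axis=1)
```

For T = 4 and g = 2, a daily series of length 4 is reduced to 2 coarse steps.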
Step 7: construct two multi-layer LSTM networks with the same structure, each layer containing s = T/g LSTM units; feed the coarsened and normalized call behavior series ξ_call and traffic usage series ξ_data into the two LSTM networks respectively, and extract the call behavior temporal feature f_call and the traffic usage temporal feature f_data.
To extract the features of time series data effectively, a multi-layer LSTM network is generally adopted to strengthen the nonlinear capacity of the model. To balance fitting capacity against model complexity, a three-layer LSTM network is used here to extract the temporal features of the call and traffic usage behaviors.
The LSTM network is a recurrent neural network; each LSTM unit contains a memory cell c_t and three gates, the input gate i_t, the output gate o_t and the forget gate f_t, which control the input, output and update of data respectively. With x_t the input at time t, and h_(t-1) and c_(t-1) the hidden state and memory cell state at the previous time step, the computation is:
i_t = sigm(W_xi x_t + W_hi h_(t-1) + b_i)    (6)
f_t = sigm(W_xf x_t + W_hf h_(t-1) + b_f)    (7)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ tanh(W_xc x_t + W_hc h_(t-1) + b_c)    (8)
o_t = sigm(W_xo x_t + W_ho h_(t-1) + W_co c_(t-1) + b_o)    (9)
h_t = o_t ⊙ tanh(c_t)    (10)
where the operator ⊙ denotes element-wise multiplication, W and b denote weight matrices and bias vectors, and sigm and tanh denote the sigmoid and hyperbolic tangent functions respectively.
As shown in FIG. 2, the coarsened and normalized call behavior series ξ_call and traffic usage series ξ_data each contain s = T/g values, which are fed into the corresponding LSTM units. Within the LSTM network, the state at each time step is passed into the next LSTM unit, preserving the temporal information of the data; the vector output by the last LSTM layer is activated with a Sigmoid function to yield the call behavior temporal feature f_call and the traffic usage temporal feature f_data.
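A single LSTM step following equations (6) to (10) can be sketched in plain NumPy; the parameter dictionary, random initialization and single-layer unrolling are our own scaffolding (real parameters are learned during training, and the patent uses three stacked layers):

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(d_in, d_hid, seed=0):
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("i", "f", "c", "o"):
        p["Wx" + gate] = rng.normal(0, 0.1, (d_hid, d_in))
        p["Wh" + gate] = rng.normal(0, 0.1, (d_hid, d_hid))
        p["b" + gate] = np.zeros(d_hid)
    p["Wco"] = rng.normal(0, 0.1, (d_hid, d_hid))  # peephole term in eq. (9)
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    i = sigm(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])                      # eq. (6)
    f = sigm(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])                      # eq. (7)
    c = f * c_prev + i * np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # eq. (8)
    o = sigm(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["Wco"] @ c_prev + p["bo"])  # eq. (9)
    h = o * np.tanh(c)                                                          # eq. (10)
    return h, c

def run_lstm(sequence, d_hid=8):
    """Unroll the cell over a coarsened 1-D behavior series (one scalar per step)."""
    h = np.zeros(d_hid)
    c = np.zeros(d_hid)
    p = init_params(1, d_hid)
    for x in sequence:
        h, c = lstm_step(np.array([x]), h, c, p)
    return h
```

The final hidden vector h plays the role of the temporal feature (f_call or f_data) before the Sigmoid activation mentioned above.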
(3) And (3) feature fusion and prediction stage:
step 1, representing the natural attribute characteristics of a user by fattrAnd f, representing the information characteristics of the user history change machinetermTime sequence characteristic f of conversation behaviorcallAnd flow usage behavior timing characteristics fdataAnd splicing to obtain a depth characteristic f.
Step 2: feed the deep feature f into a fully connected network to predict whether the user will replace their handset within the next T' days.
As shown in fig. 3, the deep feature f is processed by a fully connected network, which outputs the probability p that the user will replace their handset within the next T' days. The hidden layers use the ReLU activation function; the output layer has a single neuron with a sigmoid activation, so the output lies in [0, 1] and represents the replacement probability. Stacking multiple fully connected layers enhances the nonlinear capability of the model; to balance fitting capability against complexity, a three-layer fully connected network is adopted. Each fully connected layer maps its input h with the nonlinear ReLU activation according to:
z = ReLU(W^T h) (11)
where W is the weight matrix of the fully connected layer and z is its output.
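A minimal sketch of this fusion-and-prediction head: the four features are concatenated into f and passed through a three-layer fully connected network with ReLU hidden layers (Eq. (11)) and a single sigmoid output neuron. The layer sizes and random weights here are illustrative assumptions, not the trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_replacement(f_attr, f_term, f_call, f_data, weights):
    """Concatenate the four features into f, then apply the FC head:
    ReLU hidden layers per Eq. (11), sigmoid single-neuron output."""
    f = np.concatenate([f_attr, f_term, f_call, f_data])
    h = f
    for W in weights[:-1]:
        h = np.maximum(0.0, W.T @ h)          # z = ReLU(W^T h)
    return float(sigmoid(weights[-1].T @ h)[0])  # probability p in (0, 1)

rng = np.random.default_rng(1)
dims = [10, 6, 5, 5]                          # assumed sizes of f_attr, f_term, f_call, f_data
weights = [rng.standard_normal((26, 32)) * 0.1,
           rng.standard_normal((32, 16)) * 0.1,
           rng.standard_normal((16, 1)) * 0.1]
p = predict_replacement(*(rng.standard_normal(d) for d in dims), weights)
print(0.0 < p < 1.0)  # True
```

Biases are omitted for brevity, matching the bias-free form of Eq. (11) in the text.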
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (12)

1. A multi-factor perception terminal switching prediction method based on a deep neural network is characterized by comprising three stages of data acquisition and preprocessing, feature construction and extraction, and feature fusion and prediction, and specifically comprises the following steps:
(1) data acquisition and preprocessing stage: acquiring user natural attributes, terminal attributes, call behavior data, traffic usage behavior data, and historical handset replacement records, and preprocessing them;
(2) feature construction and extraction stage: encoding the preprocessed user natural attributes and terminal attributes to obtain a user natural attribute code and a terminal attribute code; constructing a user historical handset replacement information code from the user's historical replacement records and the corresponding terminal attribute codes; constructing the user call and traffic usage behavior time series and performing coarsening and normalization; and extracting the temporal features of the user's call and traffic usage behaviors with an LSTM network;
(3) feature fusion and prediction stage: concatenating the user natural attribute feature, the historical replacement information feature, the call behavior temporal feature, and the traffic usage behavior temporal feature, and feeding the result into a fully connected network to predict whether the user will replace their handset.
2. The method according to claim 1, wherein the method comprises the following steps: the data acquisition and preprocessing stage is specifically as follows:
(1.1) collecting relevant user natural attributes, including: gender, age, network tenure, customer star level, whether the user joined the network in person, the number of CDMA (code division multiple access) users under the same customer, and App usage preference;
(1.2) collecting relevant terminal attributes, including: brand, operating system, price, screen size, whether 4G is supported, and rear camera pixel count;
(1.3) collecting the user's call behavior data and traffic usage behavior data, and performing missing-value completion and outlier removal; the call behavior data comprise the daily number of outgoing calls, daily outgoing call duration, daily number of incoming calls, and daily incoming call duration, and the traffic usage behavior data comprise the daily number of traffic sessions, daily traffic usage duration, daily uplink traffic, and daily downlink traffic;
and (1.4) collecting the user's historical handset replacement records.
3. The method according to claim 2, wherein in step (1.3), missing-value and outlier detection is performed on the user call and traffic usage behavior data, and linear interpolation is used to fill in the missing and abnormal values.
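The missing-value filling of claim 3 can be sketched in plain Python. The edge-handling rule (copying the nearest valid value at the sequence boundaries) is an assumption, since the claim only specifies linear interpolation; outliers would first be set to None by a separate detection rule.

```python
def fill_missing(series):
    """Fill None entries of a daily behavior series by linear interpolation
    between the nearest valid neighbours; boundary gaps copy the nearest
    valid value."""
    vals = list(series)
    n = len(vals)
    valid = [i for i, v in enumerate(vals) if v is not None]
    if not valid:
        return [0.0] * n
    for i in range(n):
        if vals[i] is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            vals[i] = vals[right]
        elif right is None:
            vals[i] = vals[left]
        else:  # linear interpolation between the two valid neighbours
            w = (i - left) / (right - left)
            vals[i] = vals[left] + w * (vals[right] - vals[left])
    return vals

print(fill_missing([1.0, None, None, 4.0]))  # [1.0, 2.0, 3.0, 4.0]
```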
4. The method according to claim 2, wherein in step (1.4), each historical replacement record is represented as a triple &lt;u, s, t&gt;, where u is a user id, s is a terminal id, and t is the earliest date on which user u used terminal s.
5. The method according to claim 1, wherein the method comprises the following steps: the characteristic construction and extraction stage is as follows:
(2.1) coding the natural attribute of the user to obtain a natural attribute code v of the user;
(2.2) mapping the user natural attribute code v to a feature space to obtain the user natural attribute feature representation f_attr; the mapping uses a linear transformation:

f_attr = W_a^T v

where W_a is a mapping matrix;
(2.3) coding the terminal attribute to obtain a terminal attribute code l;
(2.4) constructing a user historical handset replacement information code r from the user's historical replacement records and the corresponding terminal attribute codes;
(2.5) mapping the user historical replacement information code r to a feature space to obtain the historical replacement information feature representation f_term; the mapping uses a linear transformation:

f_term = W_t^T r

where W_t is a mapping matrix;
(2.6) from the user call and traffic usage behavior data after missing-value completion and outlier removal, constructing a call behavior time series ξ_call and a traffic usage behavior time series ξ_data with a time interval of 1 day and a span of T days, and performing coarsening with granularity g and normalization;
(2.7) constructing two multi-layer LSTM networks with the same structure, each layer containing T/g LSTM units; the coarsened and normalized call behavior time series ξ_call and traffic usage behavior time series ξ_data are fed into the two LSTM networks respectively to extract the call behavior temporal feature f_call and the traffic usage behavior temporal feature f_data.
6. The method according to claim 5, wherein the method comprises the following steps: in the step (2.1), the method for obtaining the user natural attribute code v includes:
(2.1.1) directly performing one-hot encoding on the discrete attributes;
(2.1.2) dividing each continuous attribute except age into 5 intervals by equal-frequency binning and then applying one-hot encoding;

(2.1.3) for the age attribute, considering that users in different age groups have different handset replacement preferences, dividing age into 8 intervals and then applying one-hot encoding; the 8 intervals are under 16, 16 to 21, 22 to 27, 28 to 33, 34 to 39, 40 to 45, 46 to 51, and over 51;

and (2.1.4) concatenating the one-hot codes of all user natural attributes to obtain the user natural attribute code v.
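The encoding steps (2.1.1)-(2.1.4) can be sketched as below: one-hot encoding, equal-frequency binning into 5 intervals, and the fixed 8 age buckets. The toy tenure population and the particular attributes encoded are assumptions for illustration only.

```python
def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v

AGE_EDGES = [16, 22, 28, 34, 40, 46, 52]   # boundaries of the 8 age intervals

def age_bucket(age):
    """Map an age to one of the 8 intervals: <16, 16-21, ..., over 51."""
    return sum(age >= e for e in AGE_EDGES)

def equal_freq_edges(values, n_bins=5):
    """Cut points splitting the observed values into n_bins equal-frequency bins."""
    s = sorted(values)
    return [s[len(s) * k // n_bins] for k in range(1, n_bins)]

def bin_index(x, edges):
    return sum(x >= e for e in edges)

# Encode one toy user: gender (discrete), age (8 buckets), tenure (5 bins).
tenure_pop = [3, 7, 12, 18, 25, 31, 40, 55, 60, 72]   # months, made-up population
edges = equal_freq_edges(tenure_pop)
v = (one_hot(1, 2)                          # gender, one-hot over 2 values
     + one_hot(age_bucket(25), 8)           # age 25 falls in the 22-27 interval
     + one_hot(bin_index(18, edges), 5))    # network tenure, equal-frequency bin
print(len(v), sum(v))  # 15 3.0
```

The full code v would concatenate such segments for every natural attribute listed in claim 2.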
7. The method according to claim 5, wherein the method comprises the following steps: in the step (2.3), the method for obtaining the terminal attribute code l includes:
(2.3.1) directly one-hot coding discrete attributes other than brand;
(2.3.2) counting the usage distribution of terminal brands and dividing them into 10 categories for encoding: the top 9 brands by usage proportion form 9 categories, and the remaining brands are grouped into an "other" category;

(2.3.3) dividing the continuous attributes into 5 intervals by equal-frequency binning and then applying one-hot encoding;

and (2.3.4) concatenating the one-hot codes of all terminal attributes to obtain the terminal attribute code l.
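Step (2.3.2) can be sketched as follows: the nine most-used brands each get their own category and everything else falls into a tenth "other" slot. The usage counts here are made up for illustration.

```python
from collections import Counter

def brand_categories(brand_counts, top_k=9):
    """Top-k brands by usage each get a category index; all others will
    share one 'other' category, giving top_k + 1 categories in total."""
    top = [b for b, _ in Counter(brand_counts).most_common(top_k)]
    return {b: i for i, b in enumerate(top)}

def encode_brand(brand, cats):
    size = len(cats) + 1                  # + 1 for the 'other' category
    v = [0.0] * size
    v[cats.get(brand, size - 1)] = 1.0
    return v

usage = {"A": 500, "B": 400, "C": 300, "D": 200, "E": 150, "F": 120,
         "G": 100, "H": 80, "I": 60, "J": 10, "K": 5}
cats = brand_categories(usage)
print(encode_brand("J", cats))  # brand outside the top 9 lands in the 'other' slot
```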
8. The method according to claim 5, wherein in step (2.4), the user historical handset replacement information code r is constructed as follows:
(2.4.1) assume user u's historical replacement records are &lt;u, s_1, t_1&gt;, &lt;u, s_2, t_2&gt;, …, &lt;u, s_k, t_k&gt;, and the current date is t_now; the terminals in these records are divided into the historically used terminals s_1, …, s_{k-1} and the currently used terminal s_k;
(2.4.2) applying max pooling to the terminal attribute codes l_{s_1}, …, l_{s_{k-1}} of the historically used terminals to obtain the historical terminal code l_his, indicating the user's historical terminal preference;
(2.4.3) taking the terminal attribute code l_{s_k} of the currently used terminal as the current terminal code l_now, indicating the user's current terminal preference;
(2.4.4) calculating the user's historical average replacement interval p_his and the current terminal usage duration p_now as follows:

p_his = (1 / (k - 1)) Σ_{i=1}^{k-1} interval(t_i, t_{i+1})

p_now = interval(t_k, t_now)

where interval(·,·) denotes the number of days between two dates;
(2.4.5) concatenating the historical terminal code l_his, the current terminal code l_now, the normalized historical average replacement interval p_his, and the normalized current terminal usage duration p_now to obtain the user historical replacement information code r.
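Steps (2.4.1)-(2.4.5) can be sketched with toy terminal attribute codes. The max pooling is element-wise over the past terminals' codes; normalization of p_his and p_now is omitted here for brevity, so the last two entries of r are raw day counts.

```python
from datetime import date

def interval(d1, d2):
    """Number of days between two dates, interval(·,·) of step (2.4.4)."""
    return (d2 - d1).days

def history_code(records, terminal_codes, t_now):
    """Build r from records <u, s_1, t_1>, ..., <u, s_k, t_k>: l_his is the
    element-wise max over past terminal codes, l_now the current terminal's
    code, p_his the mean gap between consecutive replacement dates, and
    p_now the current terminal's usage duration."""
    terms = [s for _, s, _ in records]
    dates = [t for _, _, t in records]
    past = [terminal_codes[s] for s in terms[:-1]]
    l_his = [max(col) for col in zip(*past)]              # max pooling
    l_now = terminal_codes[terms[-1]]
    p_his = sum(interval(a, b) for a, b in zip(dates, dates[1:])) / (len(dates) - 1)
    p_now = interval(dates[-1], t_now)
    return l_his + l_now + [p_his, p_now]

codes = {"s1": [1, 0, 0], "s2": [0, 1, 0], "s3": [0, 0, 1]}  # toy attribute codes
recs = [("u", "s1", date(2018, 1, 1)),
        ("u", "s2", date(2018, 12, 27)),
        ("u", "s3", date(2019, 6, 25))]
r = history_code(recs, codes, date(2019, 11, 19))
print(r)  # [1, 1, 0, 0, 0, 1, 270.0, 147]
```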
9. The method according to claim 5, wherein the method comprises the following steps: in the step (2.6), the specific steps of the user behavior sequence coarsening and normalization processing include:
(2.6.1) applying max-min normalization to each series so that the processed data fall in [0, 1]; with x the original value, x_max the maximum of the series containing x, and x_min its minimum, the formula is:

x' = (x - x_min) / (x_max - x_min)
(2.6.2) computing the average of the behavior time series over every g days, coarsening the series from its original length T to length T/g.
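The two cleaning steps of claim 9 can be sketched directly. The order here (normalize per (2.6.1), then average every g days per (2.6.2)) follows the claim numbering; the toy series is an assumption, and T is assumed divisible by g.

```python
def normalize(seq):
    """Max-min normalization of (2.6.1); maps the series into [0, 1]."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0] * len(seq)
    return [(x - lo) / (hi - lo) for x in seq]

def coarsen(seq, g):
    """Average every g consecutive days, shrinking length T to T/g (2.6.2).
    Assumes len(seq) is divisible by g."""
    return [sum(seq[i:i + g]) / g for i in range(0, len(seq), g)]

daily_calls = [2, 4, 6, 8, 0, 10]           # T = 6 days of call counts
out = [round(x, 6) for x in coarsen(normalize(daily_calls), 2)]
print(out)  # [0.3, 0.7, 0.5]
```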
10. The method according to claim 5, wherein in step (2.7), a three-layer LSTM network is used to extract the temporal features of the user's call and traffic usage behaviors; the LSTM network is a recurrent neural network in which each LSTM unit contains a memory cell c_t and three gates: an input gate i_t, an output gate o_t, and a forget gate f_t, which control the input, output, and update of data, respectively; with x_t the input at time t, and h_{t-1} and c_{t-1} the hidden state and memory cell state at the previous moment, the calculation formulas are as follows:
i_t = sigm(W_xi x_t + W_hi h_{t-1} + b_i)

f_t = sigm(W_xf x_t + W_hf h_{t-1} + b_f)

c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_xc x_t + W_hc h_{t-1} + b_c)

o_t = sigm(W_xo x_t + W_ho h_{t-1} + W_co c_{t-1} + b_o)

h_t = o_t ⊙ tanh(c_t)
where the operator ⊙ denotes element-wise (pointwise) multiplication, W and b denote weight matrices and bias vectors, and sigm and tanh denote the sigmoid and hyperbolic tangent functions, respectively.
11. The method according to claim 1, wherein the method comprises the following steps: the characteristic fusion and prediction stage is specifically as follows:
(3.1) concatenating the user natural attribute feature f_attr, the user historical handset replacement information feature f_term, the call behavior temporal feature f_call, and the traffic usage behavior temporal feature f_data to obtain the deep feature f;
and (3.2) feeding the deep feature f into the fully connected network to predict whether the user will replace their handset within the next T' days.
12. The method according to claim 11, wherein in step (3.2), the deep feature f is processed by the fully connected network, which outputs the probability p that the user will replace their handset within the next T' days; the hidden layers use ReLU activation, the output layer has a single neuron with sigmoid activation, and the output lies in [0, 1], representing the replacement probability; stacking multiple fully connected layers enhances the nonlinear capability of the model, and to balance fitting capability against complexity a three-layer fully connected network is adopted; each fully connected layer maps its input h with the nonlinear ReLU activation according to:
z = ReLU(W^T h)
where W is the weight matrix of the fully connected layer and z is its output.
CN201911135219.1A 2019-11-19 2019-11-19 Multi-factor perception terminal switching prediction method based on deep neural network Pending CN110991711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911135219.1A CN110991711A (en) 2019-11-19 2019-11-19 Multi-factor perception terminal switching prediction method based on deep neural network

Publications (1)

Publication Number Publication Date
CN110991711A true CN110991711A (en) 2020-04-10


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112785069A (en) * 2021-01-29 2021-05-11 北京明略昭辉科技有限公司 Prediction method and device for terminal equipment changing machine, storage medium and electronic equipment
CN116611860A (en) * 2022-11-30 2023-08-18 天翼数字生活科技有限公司 Method and system for predicting terminal machine changing based on width learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11290532A (en) * 1998-04-06 1999-10-26 Tatsunoko:Kk Sales prediction control device and machine change and business change control device for game machine
CN106845731A (en) * 2017-02-20 2017-06-13 重庆邮电大学 A kind of potential renewal user based on multi-model fusion has found method
CN109583949A (en) * 2018-11-22 2019-04-05 中国联合网络通信集团有限公司 A kind of user changes planes prediction technique and system
CN109583659A (en) * 2018-12-07 2019-04-05 南京富士通南大软件技术有限公司 User's operation behavior prediction method and system based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Gang, People's Communications Press *



Legal Events

PB01 Publication (application publication date: 2020-04-10)
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication