CN106529711B

CN106529711B - User behavior prediction method and device

Info

Publication number: CN106529711B
Application number: CN201610951917.9A
Authority: CN
Inventors: 赵影 (Zhao Ying)
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2016-11-02
Filing date: 2016-11-02
Publication date: 2020-06-19
Anticipated expiration: 2036-11-02
Also published as: CN106529711A

Abstract

The disclosure relates to a user behavior prediction method and device. The method comprises: collecting behavior record data of at least two users; clustering the behavior record data of each user separately to form a plurality of clusters; filtering the plurality of clusters of each user to obtain a long-term behavior feature cluster of each user; and determining the similarity between users according to the long-term behavior feature cluster of each user so as to predict user behavior. Because similar users are identified from the long-term behavior feature clusters of individual users, the behavior of a single user can be predicted more accurately and at a finer granularity; and because short-term behaviors are filtered out of the user behaviors, prediction accuracy is improved.

Description

User behavior prediction method and device

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a user behavior prediction method and apparatus.

Background

With accelerating urbanization, urban transportation systems are also developing rapidly. Improving the efficiency of traffic operation plays an important role in keeping traffic flowing smoothly, reducing the energy consumption of urban transportation, and facilitating users' travel.

In the related art, big data analysis is used to collect the travel habits of a large number of users, and traffic congestion is predicted in combination with information such as festivals and holidays, so as to guide users to travel off-peak or to plan their routes in advance to avoid congestion.

However, such predictions are macroscopic: they are based on the travel habits of a large number of users and cannot provide accurate, fine-grained travel behavior prediction for a single user.

In addition, the related art does not consider the influence of "sporadic travel data" within the large volume of data, so its prediction results cannot accurately reflect users' travel behavior.

Disclosure of Invention

The purpose of the present disclosure is to provide a user behavior prediction method and device, so as to realize fine and accurate prediction of user behavior.

In order to achieve the above object, in a first aspect, the present disclosure provides a user behavior prediction method, including:

collecting behavior record data of at least two users;

clustering the behavior record data of each user respectively to form a plurality of clusters;

respectively filtering the plurality of clusters corresponding to each user to obtain a long-term behavior characteristic cluster of each user;

and determining the similarity between the users according to the long-term behavior feature cluster of each user so as to predict the user behavior.

Optionally, the step of clustering the behavior records of each user respectively to form a plurality of clusters includes:

setting a sliding window with a preset length;

and clustering the behavior record data positioned in the sliding window to form a plurality of clusters.

Optionally, the clustering the behavior record data located in the sliding window to form a plurality of clusters includes:

respectively taking one or more pieces of behavior record data in the sliding window as single clusters to form a cluster set;

respectively obtaining the similarity between the other behavior record data in the sliding window and each cluster in the cluster set;

for each cluster in the cluster set, behavior record data with the maximum similarity to each cluster is respectively acquired;

for the maximum similarity corresponding to each cluster, if the maximum similarity is greater than a preset threshold, attributing behavior record data corresponding to the maximum similarity to the cluster, and recalculating the centroid of the cluster; and if the maximum similarity is smaller than a preset threshold value, adding the behavior record data corresponding to the maximum similarity into the cluster set as a new cluster.

Optionally, the step of filtering the multiple clusters corresponding to each user respectively to obtain the long-term behavior feature cluster of each user includes:

counting the quantity of behavior record data in each cluster;

deleting the clusters with the quantity of the behavior record data in the clusters smaller than a preset threshold value to obtain the long-term behavior characteristic cluster of each user; or

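filtering, according to a preset behavior dispersion, the clusters whose dispersion is smaller than the preset behavior dispersion to obtain the long-term behavior feature cluster of each user.

Alternatively, the step of filtering the plurality of clusters corresponding to each user to obtain the long-term behavior feature cluster of each user includes:

counting the quantity of behavior record data in each cluster;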

deleting the clusters with the quantity of the behavior recording data in the clusters smaller than a preset threshold value to obtain clusters to be processed;

and filtering clusters with the dispersion smaller than the preset behavior dispersion in the clusters to be processed according to the preset behavior dispersion so as to obtain the long-term behavior feature cluster of each user.

Optionally, the step of determining similarity between users according to the long-term behavior feature cluster of each user to predict user behavior includes:

obtaining cluster similarity of long-term behavior feature clusters of a user to be predicted and one or more users;

according to the cluster similarity, obtaining the similarity between the user to be predicted and the one or more users;

according to the similarity among the users, determining a target user similar to the user to be predicted in the one or more users;

and predicting the behavior of the user to be predicted according to the determined target user.

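Alternatively, the step of determining the similarity between users according to the long-term behavior feature cluster of each user to predict user behavior includes:

acquiring cluster similarity of the long-term behavior feature clusters of the multiple users according to the long-term behavior feature clusters of the multiple users;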

acquiring the similarity of a plurality of users according to the cluster similarity;

classifying users with the user similarity exceeding a preset similarity threshold into a similar user set;

and predicting the user behavior according to the similar user set.

Optionally, the method further comprises:

and recommending information to the user according to the result of the user behavior prediction.

In a second aspect, the present disclosure provides a user behavior prediction apparatus, including:

the acquisition module is used for acquiring behavior record data of at least two users;

the clustering module is used for clustering the behavior record data of each user respectively to form a plurality of clusters;

the filtering module is used for respectively filtering the plurality of clusters corresponding to each user to obtain the long-term behavior characteristic cluster of each user;

and the prediction module is used for determining the similarity between the users according to the long-term behavior feature cluster of each user so as to predict the user behavior.

Optionally, the clustering module comprises:

a cluster set forming submodule, configured to take one or more pieces of behavior record data in the sliding window each as a single cluster to form a cluster set;

a similarity obtaining submodule, configured to obtain similarity between the remaining behavior record data in the sliding window and each cluster in the cluster set;

the maximum similarity obtaining sub-module is used for respectively obtaining behavior record data with the maximum similarity with each cluster in the cluster set;

the cluster updating submodule is used for attributing the behavior record data corresponding to the maximum similarity to each cluster and recalculating the centroid of each cluster if the maximum similarity is greater than a preset threshold; and if the maximum similarity is smaller than a preset threshold value, adding the behavior record data corresponding to the maximum similarity into the cluster set as a new cluster.

Optionally, the filtering module comprises:

the statistic submodule is used for counting the quantity of the behavior record data in each cluster;

the deleting submodule is used for deleting the clusters of which the quantity of the behavior record data is less than a preset threshold value in the clusters to obtain the clusters to be processed;

and the dispersion filtering submodule is used for filtering the clusters with the dispersion smaller than the preset behavior dispersion in the cluster to be processed according to the preset behavior dispersion so as to obtain the long-term behavior characteristic cluster of each user.

Optionally, the prediction module comprises:

the first cluster similarity obtaining submodule is used for obtaining cluster similarity of long-term behavior feature clusters of a user to be predicted and one or more users;

the first user similarity obtaining submodule is used for obtaining the similarity between the user to be predicted and the one or more users according to the cluster similarity;

the target user obtaining sub-module is used for determining a target user similar to the user to be predicted in the one or more users according to the similarity among the users;

and the first behavior prediction sub-module is used for predicting the behavior of the user to be predicted according to the determined target user.

Optionally, the prediction module comprises:

the second cluster similarity obtaining sub-module is used for obtaining cluster similarity of the long-term behavior feature clusters of the multiple users according to the long-term behavior feature clusters of the multiple users;

the second user similarity obtaining submodule is used for obtaining the similarity of a plurality of users according to the cluster similarity;

the similar user set acquisition submodule is used for attributing the users with the user similarity exceeding a preset similarity threshold to the similar user set;

and the second behavior prediction submodule is used for predicting the user behavior according to the similar user set.

Optionally, the apparatus further comprises:

and the information recommendation module is used for recommending information to the user according to the result of the user behavior prediction.

In a third aspect, the present disclosure provides a user behavior prediction apparatus, including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to: collecting behavior record data of at least two users; clustering the behavior record data of each user respectively to form a plurality of clusters; respectively filtering the plurality of clusters corresponding to each user to obtain a long-term behavior characteristic cluster of each user; and determining the similarity between the users according to the long-term behavior feature cluster of each user so as to predict the user behavior.

With the above technical solution, similar users are identified from the long-term behavior feature clusters of individual users in order to predict user behavior, for example whether a user will travel at a specific time and where the user will travel to, so that behavior prediction for a single user is more accurate and fine-grained. Filtering short-term behaviors out of the user behaviors improves prediction accuracy. Predicting users' travel behavior also provides guidance for rail transit operation. In addition, recommending information according to the predicted user behavior enables targeted recommendations, improving user experience and commercial value.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:

FIG. 1 is a schematic flow chart diagram of a user behavior prediction method according to an exemplary embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a data collection platform according to an embodiment of the present disclosure;

FIG. 3 is a schematic flow chart illustrating clustering of behavior record data located in a sliding window to form a plurality of clusters according to an embodiment of the present disclosure;

FIG. 4 is a schematic flow chart illustrating filtering clusters in an embodiment of the present disclosure;

FIG. 5 is a schematic flow chart illustrating user behavior prediction according to an embodiment of the present disclosure;

FIG. 6 is a schematic flow chart illustrating user behavior prediction according to another embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a user behavior prediction apparatus according to an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of a user behavior prediction apparatus according to another embodiment of the present disclosure;

FIG. 9 is a block diagram illustrating an apparatus for a user behavior prediction method according to an example embodiment of the present disclosure.

Detailed Description

Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the present disclosure, are given by way of illustration and explanation only, not limitation.

Fig. 1 is a flowchart illustrating a user behavior prediction method according to an exemplary embodiment of the present disclosure. The user behavior prediction method comprises the following steps:

Step 101, collecting behavior record data of at least two users.

The embodiments of the present disclosure are described using travel as an example, i.e., the behavior record data of a user includes the user's travel behavior data. The travel behavior record data may include data acquired from a public transportation system, a subway system, a railway system, an aviation system, a road transportation system, third-party platforms (e.g., various travel software systems, a weather forecast system, a news system, etc.), and the like.

For the subway system, if a user swipes a bus card when entering and exiting stations, travel data such as the departure place, departure time, destination and arrival time can be obtained from the card-swiping transaction data. For example, in one embodiment, each bus card has its own identification number. Each time a cardholder swipes the card to enter or exit a station, transaction data associated with the card's identification number is generated, so the travel data of the cardholder corresponding to each identification number can be acquired.

For a bus system in which the card is swiped only when boarding, the departure time can be obtained from the transaction data generated at boarding. In one embodiment, the departure stop (origin) may be determined from the departure time combined with the bus's GPS data. The destination and arrival time may be determined in combination with other information, such as a transfer swipe of the same bus card. If no transfer swipe information is available, the destination and arrival time may be set to defaults, or the terminal stop of the bus may be taken as the destination and the time the bus arrives there as the arrival time.

In an embodiment of the disclosure, to reflect the user's travel behavior more comprehensively, the travel information of the same user from the subway system, the public transportation system, the railway system, the aviation system, the road transportation system and third-party platforms is integrated to obtain the user's travel behavior record data.

For a railway system and an aviation system, travel data such as departure place, departure time, destination, arrival time and the like can be obtained according to ticket information of purchased tickets. For a travel software system in a third-party platform (for example, a network car booking platform, an internet bus travel platform, and the like), travel data such as a departure place, departure time, a destination, arrival time, and the like can be obtained according to travel selection (selection of the departure place and the destination) of a user.

It should be understood that, for the railway system, aviation system, road transportation system and third-party platforms that use real-name registration, the travel behavior record data of the same user can be acquired from the different systems and platforms according to the user's identity information (for example, ID card information). For the subway system and the public transportation system, the user's identity information can be bound to the identification number of the bus card when the card is purchased, so that the travel data of the same user across all systems and platforms can be integrated according to the user's identity information.

In an embodiment of the present disclosure, the travel behavior record data of a user includes at least one of the following: travel date, means of transportation used, departure place, departure time, destination, arrival time, weather information (weather at the departure place and at the destination), event information (e.g., major holidays, major meetings), and the like.

As described above, the travel date, the transportation means used for travel, the departure place, the departure time, the destination, and the arrival time can be obtained according to card-swiping transaction data, ticket purchasing data, or travel selection of the user; weather information may be obtained from a weather forecasting system; the event information may be obtained from a news system, calendar, etc.

In an embodiment of the present disclosure, a data collection platform may be established to collect behavior data of a user. Referring to fig. 2, the data collection platform 200 is communicatively connected to a subway system 201, a public transportation system 202, a railway system 203, an aviation system 204, a road transportation system 205, and a third party platform 206, respectively. The data collection platform 200 may acquire the trip data of the user from each system, and may perform operations such as format conversion and information extraction on the trip data from different systems to obtain the trip behavior record data of each user.

Step 102, clustering the behavior record data of each user respectively to form a plurality of clusters.

Clustering divides a data set into different clusters according to a specific criterion (such as a distance criterion), so that data objects in the same cluster are as similar as possible while data objects in different clusters are as different as possible.

In the embodiment of the disclosure, the user's behavior record data is obtained by tracking the user's travel records, and a large amount of data needs to be collected to reflect the user's travel behavior accurately. To make this large amount of data easier to analyze, a sliding window with a preset length is set, and the user's behavior records are sorted by time. As new behavior records are added, the sliding window slides (e.g., to the right) so that it includes the newest behavior records while the oldest behavior records are removed from it. The behavior records in the sliding window are then clustered, forming a plurality of clusters in which the behaviors within each cluster are similar.

In an embodiment of the present disclosure, the sliding window with the preset length is defined according to a time length, for example, the length of the sliding window may be set to be half a year, and the like.

Referring to fig. 3, a schematic flow chart of clustering behavior record data located in a sliding window to form a plurality of clusters according to an embodiment of the present disclosure is shown.

In an embodiment of the present disclosure, the behavior record data in the sliding window is first converted into a behavior vector matrix. If the sliding window contains m behavior records r₁, r₂, ..., r_m of user A, and each behavior record has dimension n (for example, if a behavior record includes travel date, means of transportation, departure place, departure time, destination, arrival time and weather information, then n = 7), the behavior vector matrix corresponding to user A is m × n.
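The sliding-window bookkeeping and the m × n behavior vector matrix can be sketched as follows. This is a minimal sketch under stated assumptions: each record is assumed to be already encoded as a numeric feature vector of length n (how fields such as departure place or weather are encoded is not specified here), records are assumed to arrive in time order, the 182-day length approximates "half a year", and the names `BehaviorRecord` and `SlidingWindow` are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BehaviorRecord:
    """Hypothetical record type: a timestamp plus an n-dimensional numeric feature vector."""
    timestamp: datetime
    features: list[float]


class SlidingWindow:
    """Keeps only records no older than `length` relative to the newest record."""

    def __init__(self, length: timedelta) -> None:
        self.length = length
        self.records: deque[BehaviorRecord] = deque()

    def add(self, record: BehaviorRecord) -> list[BehaviorRecord]:
        """Append a record (assumed to be the newest) and return the records that slid out."""
        self.records.append(record)
        cutoff = record.timestamp - self.length
        evicted = []
        while self.records and self.records[0].timestamp < cutoff:
            evicted.append(self.records.popleft())
        return evicted

    def matrix(self) -> list[list[float]]:
        """The m x n behavior vector matrix: one row per record currently in the window."""
        return [list(r.features) for r in self.records]


# Usage: a window of roughly half a year (182 days, an assumed value).
window = SlidingWindow(timedelta(days=182))
window.add(BehaviorRecord(datetime(2016, 1, 1, 8, 0), [1.0, 480.0]))
window.add(BehaviorRecord(datetime(2016, 3, 1, 8, 5), [1.0, 485.0]))
evicted = window.add(BehaviorRecord(datetime(2016, 8, 1, 8, 0), [1.0, 480.0]))
```

The records returned by `add` are the ones whose clusters need their centroids recomputed when the window slides.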

Step 301, taking one or more behavior records in the sliding window each as a single cluster to form a cluster set C.

In the embodiment of the present disclosure, any behavior record in the sliding window may initially be taken as the first single cluster C₁ in the cluster set C, i.e., C = {C₁}. The cluster set C is updated step by step in the subsequent steps.

Step 302, respectively obtaining the similarity between the remaining behavior recording data in the sliding window and each cluster in the cluster set.

In one embodiment, a vector space model is used to calculate the similarity, as shown in formula (1).

（1）

where n is the dimension of the behavior record data, and rc_i is the centroid of cluster C_i. The centroid rc_i of cluster C_i can be obtained by formula (2).

$$rc_i = \frac{1}{|C_i|} \sum_{p_j \in C_i} p_j \qquad (2)$$

where |C_i| is the number of data objects in cluster C_i, and p_j is a data object in cluster C_i.
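Formula (2), the centroid, is the component-wise mean of a cluster's data objects. Formula (1) itself is not reproduced in this text; the sketch below assumes it is the cosine similarity commonly used with the vector space model, which is consistent with formula (4) later being stated in terms of vector lengths. The function names are hypothetical, and the features are assumed to be numerically encoded and comparably scaled (for example, normalized), since cosine similarity on raw values such as minutes would be dominated by large-magnitude fields.

```python
from __future__ import annotations

import math


def centroid(points: list[list[float]]) -> list[float]:
    """Formula (2): component-wise mean of the data objects p_j in cluster C_i."""
    size = len(points)  # |C_i|
    return [sum(column) / size for column in zip(*points)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Assumed form of formula (1): cosine of the angle between two n-dimensional vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Assumption: a zero vector is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)
```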

In one embodiment of the present disclosure, the first behavior record r₁ is taken as the first cluster C₁ in the cluster set C, and the similarities between the remaining behavior records r₂, ..., r_m in the sliding window and cluster C₁ are then calculated in chronological order, from oldest to newest. That is, in formula (1), i is 1 and j takes the values 2 to m, giving the similarity between each of the behavior records r₂, ..., r_m and cluster C₁.

Step 303, for each cluster in the cluster set, acquiring behavior record data having the maximum similarity with each cluster respectively.

Step 304, for the maximum similarity corresponding to each cluster, if the maximum similarity is greater than a preset threshold, attributing the behavior record data corresponding to the maximum similarity to the cluster, and recalculating the centroid of the cluster; and if the maximum similarity is smaller than the preset threshold, adding the behavior record data corresponding to the maximum similarity into the cluster set as a new cluster.

Steps 302, 303 and 304 are repeated until all behavior record data in the sliding window have been clustered into their corresponding clusters.
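Steps 301 to 304 can be sketched as the loop below. This is one reading of the text, not a definitive implementation: in each pass the remaining record most similar to each cluster is found (step 303), and the largest of these per-cluster maxima decides the next assignment (step 304). Cosine similarity stands in for formula (1) (an assumption), the helpers are repeated so the block runs on its own, and a similarity exactly equal to the threshold, which the text leaves open, is treated as starting a new cluster. The name `cluster_window` is hypothetical.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Assumed form of formula (1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def centroid(points: list[list[float]]) -> list[float]:
    """Formula (2): component-wise mean of the cluster's records."""
    return [sum(col) / len(points) for col in zip(*points)]


def cluster_window(records: list[list[float]], threshold: float):
    """Cluster the time-ordered records of one sliding window.

    Returns (clusters, centroids); each cluster is a list of indices into `records`.
    """
    if not records:
        return [], []
    clusters = [[0]]                            # Step 301: the first record forms C_1.
    centroids = [list(records[0])]
    remaining = list(range(1, len(records)))    # oldest to newest
    while remaining:
        best_sim, best_rec, best_ci = -math.inf, -1, -1
        for ci, c in enumerate(centroids):
            # Steps 302-303: the remaining record most similar to cluster ci.
            rec = max(remaining, key=lambda j: cosine_similarity(records[j], c))
            sim = cosine_similarity(records[rec], c)
            if sim > best_sim:
                best_sim, best_rec, best_ci = sim, rec, ci
        remaining.remove(best_rec)
        if best_sim > threshold:
            # Step 304: join that cluster and recompute its centroid.
            clusters[best_ci].append(best_rec)
            centroids[best_ci] = centroid([records[j] for j in clusters[best_ci]])
        else:
            # Step 304: below the threshold, so the record starts a new cluster.
            clusters.append([best_rec])
            centroids.append(list(records[best_rec]))
    return clusters, centroids


clusters, centroids = cluster_window(
    [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]], threshold=0.9
)
```

In this usage, records 0 and 1 end up together, record 3 starts a second cluster, and record 2 then joins it.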

For example, for cluster C₁, which contains the first behavior record, the similarities between the behavior records r₂, ..., r_m and cluster C₁ are obtained in turn, and the record with the maximum similarity S_max to cluster C₁ is identified; suppose it is r₂. If S_max is greater than the preset threshold, r₂ is assigned to cluster C₁, and the centroid rc₁ of cluster C₁ is then updated according to formula (2). When the loop returns to step 302, the similarities between the behavior records r₃, ..., r_m in the sliding window and cluster C₁ with its updated centroid are obtained, and the record corresponding to the maximum similarity is clustered according to the preset threshold.

If S_max is smaller than the preset threshold, r₂ is added to the cluster set as a new cluster C₂, so that the cluster set becomes C = {C₁, C₂}. When the loop returns to step 302, the similarities between each of the behavior records r₃, ..., r_m in the sliding window and the clusters C₁ and C₂ in the cluster set are calculated, the maximum similarities with respect to C₁ and C₂ are obtained, and the record corresponding to the maximum similarity is clustered according to the preset threshold.

In the embodiment of the present disclosure, as time goes on and one or more new behavior records are added, the newly added records may be clustered according to steps 302 to 304, so that each is either assigned to an existing cluster in the cluster set or forms a new cluster. Since the length of the sliding window is fixed, whenever new behavior records are added, the oldest behavior records are removed from the sliding window, and the centroid of each cluster to which a removed record belonged is recalculated.
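The incremental maintenance described above might look like the following sketch. Assumptions: each cluster keeps a running sum of its member vectors so that its centroid (formula (2)) can be updated without rescanning the cluster; a new record joins the most similar cluster if the similarity exceeds the threshold and otherwise starts a new cluster; cosine similarity stands in for formula (1); and a cluster left empty after an eviction is dropped, which the text does not specify. The class and method names are hypothetical.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Assumed form of formula (1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class IncrementalClusters:
    """Hypothetical helper holding the clusters of one user's sliding window."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.vectors: dict[str, list[float]] = {}   # record id -> feature vector
        self.owner: dict[str, int] = {}             # record id -> cluster id
        self.members: dict[int, set[str]] = {}      # cluster id -> record ids
        self.sums: dict[int, list[float]] = {}      # cluster id -> running vector sum
        self._next_id = 0

    def centroid(self, cid: int) -> list[float]:
        """Formula (2) via the running sum: sum of member vectors / member count."""
        size = len(self.members[cid])
        return [s / size for s in self.sums[cid]]

    def add(self, rid: str, vec: list[float]) -> int:
        """Cluster one new record: join the most similar cluster or start a new one."""
        best_cid, best_sim = None, -math.inf
        for cid in self.members:
            sim = cosine_similarity(vec, self.centroid(cid))
            if sim > best_sim:
                best_cid, best_sim = cid, sim
        if best_cid is None or best_sim <= self.threshold:
            best_cid = self._next_id
            self._next_id += 1
            self.members[best_cid] = set()
            self.sums[best_cid] = [0.0] * len(vec)
        self.members[best_cid].add(rid)
        self.sums[best_cid] = [s + x for s, x in zip(self.sums[best_cid], vec)]
        self.vectors[rid] = list(vec)
        self.owner[rid] = best_cid
        return best_cid

    def remove(self, rid: str) -> None:
        """Evict a record that slid out of the window and update its cluster's centroid."""
        cid = self.owner.pop(rid)
        vec = self.vectors.pop(rid)
        self.members[cid].discard(rid)
        if self.members[cid]:
            self.sums[cid] = [s - x for s, x in zip(self.sums[cid], vec)]
        else:
            # Assumption: an emptied cluster is dropped (the text does not say).
            del self.members[cid], self.sums[cid]
```

Running sums make each centroid update O(n) per record, at the cost of slow floating-point drift over very long runs; recomputing a centroid from its members occasionally would bound that drift.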

Step 103, respectively filtering the plurality of clusters corresponding to each user to obtain the long-term behavior feature cluster of each user.

The records within a cluster may be either concentrated or dispersed in time. Clusters reflecting short-term behaviors are more concentrated, because short-term behaviors tend to occur frequently within a specific period; clusters reflecting the user's long-term behaviors are more dispersed, because such behaviors are habitual and keep appearing stably throughout the behavior records. In the embodiment of the disclosure, identifying the short-term behaviors in the user's behavior records reduces noise in user behavior prediction and improves prediction accuracy.

In one embodiment of the present disclosure, the formed clusters are filtered in either or both of the following ways to filter out short-term behaviors and improve prediction accuracy.

The first method: filtering by a defined filtering factor

Some user behaviors are occasional: they are random and do not reflect the user's behavior characteristics, and their records usually form small clusters among the clusters obtained from the behavior records in the sliding window. Therefore, by defining a filtering factor f, the clusters that reflect the user's long-term behavior characteristics and recent short-term behavior characteristics can be identified among the resulting clusters.

Specifically, the number of behavior records in each cluster is counted, and clusters whose number of behavior records is smaller than a preset threshold f × m (where m is the total number of behavior records in the sliding window) are deleted as noise clusters. Clusters whose number of behavior records is larger than f × m are regarded as valid clusters reflecting the user's behavior characteristics. In other words, a cluster is considered to reflect the user's behavior characteristics only when the proportion of its behavior records to the total number of behavior records in the sliding window reaches a certain value; the clusters retained in this way are the user's long-term behavior feature clusters.
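The filtering-factor rule reduces to a size test against f × m, as in the sketch below. The function name and the sample value of f are hypothetical, and a cluster holding exactly f × m records is kept here, since the text only specifies the strictly smaller and strictly larger cases.

```python
from __future__ import annotations


def filter_by_factor(clusters: list[list[int]], m: int, f: float) -> list[list[int]]:
    """First method: delete clusters holding fewer than f * m records (m = records in the window)."""
    min_size = f * m
    return [c for c in clusters if len(c) >= min_size]


# Usage: 7 records in the window and f = 0.2, so clusters need at least 1.4 records.
kept = filter_by_factor([[0, 1, 2, 3], [4], [5, 6]], m=7, f=0.2)
```

Here the single-record cluster `[4]` is deleted as a noise cluster, and the other two clusters survive.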

The second method: filtering, according to a preset behavior dispersion, the clusters whose dispersion is smaller than the preset behavior dispersion to obtain the long-term behavior feature cluster of each user.

For clusters reflecting the user's long-term behavior characteristics, the more widely the records in a cluster are distributed over time, the longer the user has exhibited that similar behavior. Therefore, the dispersion w of each cluster is obtained according to formula (3), and the clusters are filtered accordingly to obtain the user's long-term behavior feature clusters.

（3）

where n is the dimension of the behavior record data in the cluster; t is the behavior occurrence time (e.g., the departure time) in the cluster converted into minutes, i.e., the number of minutes elapsed since 0:00 (for example, if the behavior occurrence time is 2016-09-26 15:05:26, then t = 15 × 60 + 5 = 905); and d is the span of days covered by the behavior records in the cluster. The dispersion w is inversely proportional to the time fluctuation and directly proportional to the number of days.

When the dispersion of a cluster is smaller than the preset behavior dispersion, the behavior it represents is either an occasional behavior of short duration or an irregular behavior with large time fluctuation, and cannot represent the general pattern of the user's behavior, so the cluster is filtered out. In this way, the formed clusters can be filtered according to their dispersion to obtain the user's long-term behavior feature clusters.
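Formula (3) is not reproduced in this text, so the sketch below uses a hypothetical stand-in that only respects the two stated properties: w grows with the day span d and shrinks as the time-of-day fluctuation of t grows. Measuring fluctuation as the population standard deviation of t, counting d inclusively (a single day gives d = 1), and the names `dispersion_proxy` and `filter_by_dispersion` are all assumptions; the minute conversion of t follows the text.

```python
from __future__ import annotations

import statistics
from datetime import datetime


def minutes_since_midnight(ts: datetime) -> int:
    """t as defined in the text: minutes elapsed since 0:00 (seconds are dropped)."""
    return ts.hour * 60 + ts.minute


def dispersion_proxy(times: list[datetime]) -> float:
    """HYPOTHETICAL stand-in for formula (3): proportional to the day span d,
    inversely related to the time-of-day fluctuation (population std. dev. of t)."""
    t = [minutes_since_midnight(ts) for ts in times]
    fluctuation = statistics.pstdev(t)
    d = (max(times).date() - min(times).date()).days + 1  # inclusive day span (assumed)
    return d / (1.0 + fluctuation)


def filter_by_dispersion(clusters: list[list[datetime]], w_min: float) -> list[list[datetime]]:
    """Second method: drop clusters whose dispersion is below the preset dispersion w_min."""
    return [c for c in clusters if dispersion_proxy(c) >= w_min]


# A weekly 08:00 habit over four weeks versus three trips scattered across one day.
habit = [datetime(2016, 9, day, 8, 0) for day in (1, 8, 15, 22, 29)]
burst = [datetime(2016, 9, 26, hour, 0) for hour in (8, 12, 18)]
kept = filter_by_dispersion([habit, burst], w_min=1.0)
```

The regular weekly habit has zero time fluctuation over a 29-day span and is kept; the single-day burst has a span of one day and large fluctuation, so it is filtered out.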

In an embodiment of the present disclosure, the clusters formed in step 102 may be filtered in either or both of the ways described above to obtain the user's long-term behavior feature clusters.

Referring to fig. 4, in another embodiment of the present disclosure, the first and second ways may be combined to perform filtering of clusters, further improving the accuracy of prediction.

Step 401, counting the number of behavior records in each cluster;

Step 402, deleting the clusters in which the number of behavior records is smaller than the preset threshold to obtain the clusters to be processed;

Step 403, filtering, according to the preset behavior dispersion, the clusters to be processed whose dispersion is smaller than the preset behavior dispersion, so as to obtain the long-term behavior feature cluster of each user.
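Steps 401 to 403 chain the two filters: the size test runs first, and only its survivors are checked for dispersion. In the sketch below, the dispersion measure is passed in as a function because formula (3) is not reproduced here; the toy measure in the usage (number of distinct values) is purely illustrative, and the name `long_term_clusters` is hypothetical.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def long_term_clusters(
    clusters: Sequence[Sequence[T]],
    m: int,
    f: float,
    dispersion: Callable[[Sequence[T]], float],
    w_min: float,
) -> list[Sequence[T]]:
    """Steps 401-403: size filter first, then dispersion filter on the survivors."""
    # Steps 401-402: delete clusters with fewer than f * m records.
    to_process = [c for c in clusters if len(c) >= f * m]
    # Step 403: delete clusters whose dispersion is below the preset dispersion.
    return [c for c in to_process if dispersion(c) >= w_min]


# Usage with a toy dispersion (number of distinct values), for illustration only.
kept = long_term_clusters(
    [[1, 2, 3, 4], [5], [6, 6, 6]],
    m=8,
    f=0.25,
    dispersion=lambda c: float(len(set(c))),
    w_min=2.0,
)
```

Running the cheap size test first means the dispersion, which needs every record's time, is computed only for clusters that could survive.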

Step 104, determining the similarity between the users according to the long-term behavior feature cluster of each user so as to predict the user behavior.

In an embodiment of the present disclosure, users with similar behaviors can be found using the idea of collaborative filtering, so that a target user's behavior can be predicted from the known behaviors of similar users.

Referring to fig. 5, a schematic flow chart of the user behavior prediction according to an embodiment of the present disclosure includes the following steps:

step 501, obtaining cluster similarity of long-term behavior feature clusters of a user to be predicted and one or more users.

In one embodiment, the similarity between each cluster among the long-term behavior feature clusters of user U_x and each cluster among the long-term behavior feature clusters of the user U_y to be predicted is calculated. The similarity between a target cluster of U_x and a target cluster of U_y can be obtained by formula (4).

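$$S(C_{xi}, C_{yi}) = \frac{C_{xi} \cdot C_{yi}}{\lVert C_{xi} \rVert \, \lVert C_{yi} \rVert} \tag{4}$$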

In the formula, C_xi and C_yi are the centroids of the target cluster in the long-term behavior feature clusters of user U_x and of the target cluster in the long-term behavior feature clusters of the user to be predicted U_y, respectively. As equation (2) shows, the centroid is a vector, and || || in equation (4) denotes the length of the vector.

In one embodiment, the maximum of these similarity values is taken as the cluster similarity between the user to be predicted U_y and user U_x.
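Equation (4) and the maximum-taking rule of step 501 can be sketched as follows. This sketch assumes that equation (4) is the cosine similarity of the two centroid vectors, which is inferred from the description of the centroids as vectors and of || || as vector length; the function names `cosine` and `cluster_similarity` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Assumed form of equation (4): dot(a, b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero-length centroids (an assumption)
    return dot / (norm_a * norm_b)


def cluster_similarity(centroids_x: List[Vector], centroids_y: List[Vector]) -> float:
    """Compare every long-term cluster centroid of U_x with every long-term
    cluster centroid of U_y and keep the maximum value (step 501)."""
    return max(cosine(cx, cy) for cx in centroids_x for cy in centroids_y)


# Usage: U_x has two long-term clusters, U_y has one.
best = cluster_similarity([(1.0, 0.0), (0.0, 1.0)], [(0.0, 2.0)])
```

Cosine similarity depends only on the direction of the centroids, not on their magnitude, so (0.0, 1.0) and (0.0, 2.0) are treated as identical behavior patterns here.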

Step 502, obtaining the similarity between the user to be predicted and the one or more users according to the cluster similarity.

In one embodiment, the similarity between user U_x and the user to be predicted U_y is obtained by equation (5).

（5）

where n is the dimension of the behavior record data in the cluster, and S(C_xi, C_yi) is the cluster similarity.

In the embodiment of the present disclosure, the similarity between users is derived from the cluster similarity, so the similarity between two users is measured at a finer granularity, which improves prediction accuracy.

Step 503, determining, among the one or more users, a target user similar to the user to be predicted according to the similarity between users.

In one embodiment, the user with the highest similarity may be taken as the target user U_i. In some embodiments, all users whose similarity reaches a set threshold may be taken as target users.
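The two ways of choosing target users in step 503 can be sketched as below. The user similarities are taken as given inputs, since equation (5) is not reproduced here; the function name `select_target_users` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional


def select_target_users(similarities: Dict[str, float],
                        threshold: Optional[float] = None) -> List[str]:
    """similarities maps each candidate user id to its similarity with the
    user to be predicted, as produced by step 502."""
    if not similarities:
        return []
    if threshold is None:
        # First embodiment: the single most similar user is the target user.
        return [max(similarities, key=similarities.get)]
    # Second embodiment: every user whose similarity reaches the threshold.
    return [user for user, s in similarities.items() if s >= threshold]


# Usage
sims = {"u1": 0.4, "u2": 0.9, "u3": 0.75}
top_one = select_target_users(sims)
above_threshold = select_target_users(sims, threshold=0.7)
```

The threshold variant returns more than one target user whenever several candidates are comparably similar, which gives step 504 more observed behavior to draw on.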

Step 504, predicting the behavior of the user to be predicted according to the determined target user.

Once the target user U_i of the user to be predicted U_y has been obtained, if the behavior of the target user is known, the behavior of the user to be predicted can be predicted.

In one embodiment, if target user U_i performs behavior L at a certain moment, the probability that the user to be predicted U_y performs behavior L can be obtained by equation (6), thereby predicting the behavior of U_y.

（6）

where p(U_y L) denotes the probability that the user to be predicted U_y performs behavior L; S(U_x, U_y) is the user similarity given by equation (5); and N is the number of target users performing behavior L.

In one embodiment, behavior L may represent traveling through a certain station at a certain moment. U_i in equation (6) is then a user (one or more of the target users) who travels through that station at that moment, and equation (6) gives the probability that the user to be predicted U_y travels through that station at that moment. When there are multiple stations (i.e., multiple behaviors L), the target users can be divided according to the station they travel through, the travel probability for each station can be predicted, and the station with the highest probability is predicted as the station that U_y will travel through.
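The multi-station case can be sketched as follows: group the target users by station, compute one probability per station, and pick the station with the highest value. Equation (6) is not reproduced in this text, so the per-station probability is computed by a pluggable function; the default `mean_similarity` (average similarity of the target users who traveled through the station) is a HYPOTHETICAL placeholder, not equation (6). All names here are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def mean_similarity(sims: List[float]) -> float:
    """HYPOTHETICAL placeholder for equation (6): average similarity between
    the user to be predicted and the N target users who performed behavior L."""
    return sum(sims) / len(sims)


def predict_station(target_trips: List[Tuple[str, str]],
                    similarity: Dict[str, float],
                    prob_fn: Callable[[List[float]], float] = mean_similarity,
                    ) -> Tuple[str, Dict[str, float]]:
    """target_trips: (target user id, station) pairs observed at the moment.
    similarity: similarity of each target user to the user to be predicted."""
    by_station: Dict[str, List[float]] = defaultdict(list)
    for user, station in target_trips:
        by_station[station].append(similarity[user])  # divide target users by station
    probs = {station: prob_fn(sims) for station, sims in by_station.items()}
    best = max(probs, key=probs.get)  # station with the highest probability
    return best, probs


# Usage
trips = [("u1", "A"), ("u2", "A"), ("u3", "B")]
sim = {"u1": 0.8, "u2": 0.6, "u3": 0.9}
station, probs = predict_station(trips, sim)
```

Passing the probability function as a parameter keeps the grouping and arg-max logic separate from whatever exact form equation (6) takes.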

Fig. 6 is a flowchart of user behavior prediction according to another embodiment of the present disclosure. This embodiment differs from the embodiment shown in fig. 5 in that the similarity between users is calculated from their long-term behavior feature clusters, users whose similarity exceeds a preset similarity threshold are grouped into a similar user set, and user behavior is predicted according to that set.

This embodiment comprises the steps of:

Step 601, obtaining the cluster similarity between the long-term behavior feature clusters of the multiple users.

Step 602, obtaining the similarity between the multiple users according to the cluster similarity.

It should be understood that steps 601 and 602 are the same as steps 501 and 502 in the above embodiment, respectively, and are not described again here.

Step 603, assigning the users whose user similarity exceeds the preset similarity threshold to a similar user set.

In an embodiment of the present disclosure, the preset similarity threshold may be set between 60% and 100%.

Step 604, predicting user behavior according to the similar user set.

In an embodiment of the present disclosure, when one user, or more than a certain proportion of the users, in the similar user set is detected performing a certain behavior, it can be predicted that the other users in the set may perform the same behavior, thereby achieving user behavior prediction. For example, for users' travel behavior, if the similar user set contains 1000 users and 1 or 10 of them are detected departing from station A, it can be predicted that the remaining users will travel in a similar way; the relevant stations can then prepare operation scheduling in advance, which provides guidance for rail transit operation. In some embodiments, prompt information may also be sent to users in the similar user set, for example, to warn of possible travel congestion or to prompt users to plan ahead.
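Steps 603 and 604 can be sketched as below, assuming one reading of step 603: the set is built around a seed user and contains every user whose similarity to that seed exceeds the threshold (for example 0.6, the lower end of the range given for the preset similarity threshold). The user similarities are taken as given inputs, and the names `similar_user_set`, `predict_followers` and `min_ratio` are hypothetical.

```python
from typing import Dict, List, Set


def similar_user_set(seed: str,
                     similarity: Dict[str, Dict[str, float]],
                     threshold: float) -> Set[str]:
    """Step 603 (one possible reading): the seed user plus every user whose
    similarity to the seed exceeds the preset similarity threshold."""
    neighbours = similarity.get(seed, {})
    return {seed} | {u for u, s in neighbours.items() if s > threshold}


def predict_followers(user_set: Set[str], observed: Set[str],
                      min_ratio: float) -> List[str]:
    """Step 604: if at least min_ratio of the set has been observed performing
    the behavior, predict the same behavior for the remaining members."""
    hits = user_set & observed
    if hits and len(hits) / len(user_set) >= min_ratio:
        return sorted(user_set - observed)
    return []


# Usage: user "b" is seen departing from station A.
sim = {"a": {"b": 0.8, "c": 0.65, "d": 0.3}}
group = similar_user_set("a", sim, threshold=0.6)
followers = predict_followers(group, observed={"b"}, min_ratio=0.3)
```

Setting `min_ratio` close to zero corresponds to the "any user" trigger in the text, while a larger value corresponds to the "more than a certain proportion" trigger.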

In an embodiment of the present disclosure, information may be recommended to a user according to the behavior prediction result. The recommended information may be merchandise information, reminder information (e.g., weather reminders, event reminders), and the like. For example, according to the behavior prediction result, corresponding products (e.g., movies, advertisements) may be recommended at the location the user is predicted to travel to.

With the user behavior prediction method, similar users are identified from the long-term behavior feature clusters of individual users, so that user behavior can be predicted, for example whether a user will travel at a specific time and where the user will travel, enabling more accurate and fine-grained behavior prediction for a single user. Filtering out short-term behaviors improves prediction accuracy; predicting users' travel behavior provides guidance for rail transit operation; and recommending information according to the predicted user behavior enables targeted recommendation, improving user experience and commercial value.

Fig. 7 is a schematic structural diagram of a user behavior prediction apparatus according to an embodiment of the present disclosure. The user behavior prediction apparatus 700 includes:

an acquisition module 701, configured to acquire behavior record data of at least two users;

a clustering module 702, configured to cluster the behavior record data of each user to form multiple clusters;

a filtering module 703, configured to filter the multiple clusters corresponding to each user, respectively, to obtain a long-term behavior feature cluster of each user;

and a prediction module 704, configured to determine the similarity between users according to the long-term behavior feature cluster of each user, so as to predict user behavior.

In one embodiment, clustering module 702 includes:

a cluster set forming sub-module 7021, configured to use one or more behavior recording data in the sliding window as a single cluster, respectively, to form a cluster set;

a similarity obtaining sub-module 7022, configured to obtain similarities between the remaining behavior record data in the sliding window and each cluster in the cluster set, respectively;

a maximum similarity obtaining sub-module 7023, configured to obtain, for each cluster in the cluster set, the behavior record data having the maximum similarity to that cluster;

a cluster updating sub-module 7024, configured to, for the maximum similarity corresponding to each cluster: if the maximum similarity is greater than a preset threshold, assign the corresponding behavior record data to the cluster and recalculate the centroid of the cluster; and if the maximum similarity is smaller than the preset threshold, add the corresponding behavior record data to the cluster set as a new cluster.
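The procedure carried out by sub-modules 7021 to 7024 on the records inside one sliding window can be sketched as follows. Assumptions, none of which are confirmed by this text: the similarity between a record and a cluster is the cosine similarity between the record and the cluster centroid, the first record in the window seeds the initial cluster set, and a similarity exactly equal to the threshold starts a new cluster. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumed record-to-centroid similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_window(records: List[Vector], threshold: float) -> List[List[Vector]]:
    if not records:
        return []
    # 7021: seed the cluster set with the first record in the window.
    members: List[List[Vector]] = [[records[0]]]
    centroids: List[List[float]] = [list(records[0])]
    remaining = list(records[1:])
    while remaining:
        # Visit the clusters that exist at the start of this pass.
        for k in range(len(members)):
            if not remaining:
                break
            # 7022/7023: the remaining record most similar to cluster k.
            best = max(range(len(remaining)),
                       key=lambda i: cosine(remaining[i], centroids[k]))
            rec = remaining.pop(best)
            if cosine(rec, centroids[k]) > threshold:
                # 7024: assign the record and recompute the centroid.
                members[k].append(rec)
                centroids[k] = [sum(col) / len(members[k]) for col in zip(*members[k])]
            else:
                # 7024: the record becomes a new cluster.
                members.append([rec])
                centroids.append(list(rec))
    return members


# Usage
clusters = cluster_window([(1.0, 0.0), (0.0, 1.0), (0.9, 0.1)], threshold=0.9)
```

Every visit to a cluster consumes one remaining record, either by assigning it or by turning it into a new cluster, so the loop always terminates. Because each cluster claims its best remaining record in turn, a record may be split off as a new cluster before a later cluster gets the chance to match it; a record-centric variant (assign each record to its most similar cluster) is one possible alternative.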

In one embodiment, the filtering module 703 includes:

a statistics submodule 7031, configured to count the number of behavior record data in each cluster;

a deletion submodule 7032, configured to delete a cluster in which the amount of behavior recording data in the cluster is smaller than a preset threshold, to obtain a cluster to be processed;

and a dispersion filtering sub-module 7033, configured to filter out, from the clusters to be processed, the clusters whose dispersion is smaller than the preset behavior dispersion, so as to obtain the long-term behavior feature cluster of each user.

It should be understood that, in embodiments of the present disclosure, the filtering module 703 may include only the statistics sub-module 7031 and the deletion sub-module 7032 to obtain the user's long-term behavior feature clusters by filtering. In some embodiments, the filtering module 703 may include only the dispersion filtering sub-module 7033 for this purpose.

In one embodiment, the prediction module 704 includes:

a first cluster similarity obtaining sub-module 7041, configured to obtain cluster similarities of long-term behavior feature clusters of the user to be predicted and one or more users;

a first user similarity obtaining sub-module 7042, configured to obtain, according to the cluster similarity, similarities between a user to be predicted and one or more users;

the target user obtaining sub-module 7043 is configured to determine, among the one or more users, a target user similar to the user to be predicted according to the similarity between the users;

and the first behavior prediction sub-module 7044 is configured to predict the behavior of the user to be predicted according to the determined target user.

Referring to fig. 8, in one embodiment, prediction module 704 includes:

a second cluster similarity obtaining sub-module 7045, configured to obtain cluster similarities of long-term behavior feature clusters of the multiple users according to the long-term behavior feature clusters of the multiple users;

a second user similarity obtaining sub-module 7046, configured to obtain similarities of multiple users according to the cluster similarity;

the similar user set obtaining sub-module 7047 is configured to assign users whose user similarity exceeds a preset similarity threshold to a similar user set;

and the second behavior prediction sub-module 7048 is configured to perform user behavior prediction according to the similar user set.

In one embodiment, the apparatus 700 further comprises:

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 9 is a block diagram illustrating an apparatus 900 for a user behavior prediction method according to an example embodiment. For example, the apparatus 900 may be provided as a server. Referring to fig. 9, the apparatus 900 includes a processing component 901 that further includes one or more processors and memory resources, represented by memory 902, for storing instructions, e.g., applications, that are executable by the processing component 901. The application programs stored in memory 902 may include one or more modules that each correspond to a set of instructions. Further, the processing component 901 is configured to execute instructions to perform the user behavior prediction method described above.

The apparatus 900 may also include a power component 903 configured to perform power management of the apparatus 900, a wired or wireless network interface 904 configured to connect the apparatus 900 to a network, and an input/output (I/O) interface 905. The apparatus 900 may operate based on an operating system stored in the memory 902, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In one embodiment, the data collection platform shown in fig. 2 can be disposed in the apparatus 900, and the user behavior record data can be obtained from various systems through the network interface 904 and/or the input/output interface 905, and processed through the processing component 901. These data may be stored in memory 902.

It should be understood that the above embodiments of the present disclosure use user travel behavior data as an example; user behavior prediction can also be performed on consumption behavior data, web browsing behavior data, and usage data of various applications in a manner similar to that for travel behavior data.

With the user behavior prediction method and device, similar users are identified from the long-term behavior feature clusters of individual users, so that user behavior can be predicted, for example whether a user will travel at a specific time and where the user will travel, enabling more accurate and fine-grained behavior prediction for a single user. Filtering out short-term behaviors improves prediction accuracy; predicting users' travel behavior provides guidance for rail transit operation; and recommending information according to the predicted user behavior enables targeted recommendation, improving user experience and commercial value.

The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.

It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.

In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims

1. A method for predicting user behavior, comprising:

collecting behavior record data of at least two users;

determining the similarity between users according to the long-term behavior feature cluster of each user so as to predict the user behavior; wherein

the determining the similarity between the users according to the long-term behavior feature cluster of each user to predict the user behavior includes:

obtaining, according to a preset similarity calculation formula and the cluster similarity between the long-term behavior feature clusters of a user to be predicted and those of one or more users, the similarity between the user to be predicted and the one or more users, so as to predict the user behavior of the user to be predicted; wherein

the similarity calculation formula is as follows:

，

wherein S(U_x, U_y) is the similarity between any user U_x of the one or more users and the user to be predicted U_y; n is the dimension of the behavior record data in the long-term behavior feature clusters; C_xi is the centroid of a target cluster among the long-term behavior feature clusters of user U_x; C_yi is the centroid of a target cluster among the long-term behavior feature clusters of the user to be predicted U_y; S(C_xi, C_yi) is the cluster similarity between C_xi and C_yi; and E(S(C_xi, C_yi)) is the mathematical expectation of the cluster similarity between C_xi and C_yi; wherein

the calculation formula of the cluster similarity is as follows:

$$S(C_{xi}, C_{yi}) = \frac{C_{xi} \cdot C_{yi}}{\lVert C_{xi} \rVert \, \lVert C_{yi} \rVert},$$

wherein C_xi and C_yi are computed in vector form in the cluster similarity formula, ||C_xi|| is the length of the vector corresponding to C_xi, and ||C_yi|| is the length of the vector corresponding to C_yi.

2. The method of claim 1, wherein the step of clustering the behavior records of each user separately to form a plurality of clusters comprises:

setting a sliding window with a preset length;

3. The method of claim 2, wherein clustering the behavior record data located within the sliding window to form a plurality of clusters comprises:

4. The method according to claim 1, wherein the step of filtering the plurality of clusters corresponding to each user respectively to obtain the long-term behavior feature cluster of each user comprises:

counting the quantity of behavior record data in each cluster;

5. The method according to claim 1, wherein the step of filtering the plurality of clusters corresponding to each user respectively to obtain the long-term behavior feature cluster of each user comprises:

counting the quantity of behavior record data in each cluster;

6. The method of claim 1, wherein the step of determining similarity between users according to the long-term behavior feature cluster of each user to predict user behavior comprises:

7. The method of claim 1, wherein the step of determining similarity between users according to the long-term behavior feature cluster of each user to predict user behavior comprises:

and predicting the user behavior according to the similar user set.

8. The method of claim 1, further comprising:

9. A user behavior prediction apparatus, comprising:

the prediction module is configured to determine the similarity between users according to the long-term behavior feature cluster of each user so as to predict the user behavior; wherein

the prediction module is configured to:

the similarity calculation formula is as follows:

，

wherein S(U_x, U_y) is the similarity between any user U_x of the one or more users and the user to be predicted U_y; n is the dimension of the behavior record data in the long-term behavior feature clusters; C_xi is the centroid of a target cluster among the long-term behavior feature clusters of user U_x; C_yi is the centroid of a target cluster among the long-term behavior feature clusters of the user to be predicted U_y; S(C_xi, C_yi) is the cluster similarity between C_xi and C_yi; and E(S(C_xi, C_yi)) is the mathematical expectation of the cluster similarity between C_xi and C_yi; wherein

the calculation formula of the cluster similarity is as follows:

$$S(C_{xi}, C_{yi}) = \frac{C_{xi} \cdot C_{yi}}{\lVert C_{xi} \rVert \, \lVert C_{yi} \rVert},$$

wherein C_xi and C_yi are computed in vector form in the cluster similarity formula, ||C_xi|| is the length of the vector corresponding to C_xi, and ||C_yi|| is the length of the vector corresponding to C_yi.

10. The apparatus of claim 9, wherein the clustering module comprises:

11. The apparatus of claim 9, wherein the filtering module comprises:

12. The apparatus of claim 9, wherein the prediction module comprises:

13. The apparatus of claim 9, wherein the prediction module comprises:

14. The apparatus of claim 9, further comprising:

15. A user behavior prediction apparatus, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to: collect behavior record data of at least two users; cluster the behavior record data of each user respectively to form a plurality of clusters; filter the plurality of clusters corresponding to each user respectively to obtain a long-term behavior feature cluster of each user; and determine the similarity between users according to the long-term behavior feature cluster of each user so as to predict the user behavior; wherein

the similarity calculation formula is as follows:

，

the calculation formula of the cluster similarity is as follows:

$$S(C_{xi}, C_{yi}) = \frac{C_{xi} \cdot C_{yi}}{\lVert C_{xi} \rVert \, \lVert C_{yi} \rVert},$$