CN105320702B

CN105320702B - User behavior data analysis method and device and smart television

Info

Publication number: CN105320702B
Application number: CN201410380588.8A
Authority: CN
Inventors: 李明烈
Original assignee: TCL Corp
Current assignee: TCL Corp
Priority date: 2014-08-04
Filing date: 2014-08-04
Publication date: 2019-02-01
Anticipated expiration: 2034-08-04
Also published as: CN105320702A

Abstract

The present invention belongs to the technical field of data processing and provides a method and a device for analyzing user behavior data, and a smart television. The method includes: first establishing a user behavior data sample, and then clustering the established user behavior data sample so that users with similar behavior data are classified into one cluster, forming a similar user group. Because the users in a similar user group generally have the same preferences, the videos watched, the websites browsed or the articles purchased by users similar to the current user can be recommended to the current user, so that personalized services are better provided for the user and the user experience is improved.

Description

User behavior data analysis method and device and smart television

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a user behavior data analysis method and device and a smart television.

Background

At present, the market share of the smart television is increasing year by year, the ways in which users watch and use the smart television are becoming individualized and diversified, and applications and tools based on the smart television keep emerging.

However, the existing applications and tools of the smart television cannot accurately, timely and efficiently analyze the behavior data of users to learn how the users behave, and therefore cannot obtain the similarity between users in a user group.

Disclosure of Invention

The embodiment of the invention provides a method and a device for analyzing user behavior data and a smart television, and aims to solve the problem in the prior art that the similarity among users in a user group cannot be obtained from the user behavior data of the smart television.

In one aspect, a method for analyzing user behavior data is provided, where the method includes:

step A, establishing a user behavior data sample;

step B, selecting behavior data of k users from the user behavior data sample, and taking the behavior data of the k users as respective centers of k clusters;

step C, respectively calculating the dissimilarity degree of the behavior data of other users in the user behavior data sample and the centers of the k clusters, and respectively classifying the behavior data of the other users to the cluster with the lowest dissimilarity degree to obtain a clustering result;

step D, according to the clustering result, recalculating respective centers of the k clusters to obtain respective new centers of the k clusters;

and step E, respectively calculating the dissimilarity degree of the behavior data of all the users in the user behavior data sample and the new centers of the k clusters, classifying the behavior data of all the users to the cluster with the lowest dissimilarity degree respectively to obtain a clustering result, and returning to the step D until the clustering result no longer changes or the number of executions of the step D reaches a preset number.

Further, the step B includes:

calculating distances between behavior data of users in the user behavior data samples;

calculating the average value of the distances to obtain the average value of the distance vectors of the distances among the behavior data of the user, wherein the average value of the distance vectors is the average value of the distance vectors of the kth point;

calculating the average value of the distance vector average value to obtain a distance average value;

calculating a deviation value between the distance vector average value and the distance average value according to the distance vector average value and the distance average value;

and if the deviation value meets a preset condition, calculating the behavior data of the user corresponding to the distance vector average value of the kth point, and taking the behavior data of the user corresponding to the distance vector average value of the kth point as the behavior data of the selected kth user.

Further, calculating the dissimilarity degree of the behavior data of the user with the respective centers of the k clusters, and classifying the behavior data of the user into the cluster with the lowest dissimilarity degree includes:

calculating Euclidean distances between behavior data of the user and respective centers of the k clusters;

and classifying the behavior data of the user into a cluster with the minimum Euclidean distance with the behavior data of the user.

Further, after the step E, the method further includes:

scanning the behavior data of all users in a specified cluster in the clustering result;

generating a frequent 1 item set to a frequent N item set according to the behavior data, and calculating the support degree of each item set in the frequent item sets, wherein only one item set exists in the frequent N item sets;

and calculating to obtain an association rule between the behavior data of the user according to the support degree of each item set in the frequent N item sets and the support degree of each item set from the frequent N-1 item set to the frequent 1 item set.

Further, if the deviation value satisfies a preset condition, calculating the behavior data of the user corresponding to the average value of the distance vectors of the kth point, and taking the behavior data of the user corresponding to the average value of the distance vectors of the kth point as the behavior data of the selected kth user specifically includes:

if the value of δ calculated by the formula $\delta = \lambda \cdot |\bar{d}_k - \bar{d}| / \bar{d}$ meets the preset condition, taking the behavior data of the user corresponding to the distance vector average value $\bar{d}_k$ of the kth point as the behavior data of the selected kth user;

wherein $\bar{d}_k$ is the distance vector average value of the kth point, $\bar{d}$ is the distance average value, λ is the correction factor, and δ is the deviation value between the distance vector average value and the distance average value.

In another aspect, an apparatus for analyzing user behavior data is provided, the apparatus including:

the behavior data sample establishing unit is used for establishing a user behavior data sample;

a first cluster center determining unit, configured to select behavior data of k users from the user behavior data samples, and use the behavior data of the k users as respective centers of the k clusters;

the first clustering result generating unit is used for respectively calculating the dissimilarity degree of the behavior data of other users in the user behavior data sample and the centers of the k clusters, and classifying the behavior data of the other users to the cluster with the lowest dissimilarity degree to obtain a clustering result;

a second cluster center determining unit, configured to recalculate respective centers of the k clusters according to the clustering result to obtain respective new centers of the k clusters;

and a second clustering result generating unit, configured to calculate the dissimilarity degree between the behavior data of all users in the user behavior data sample and respective new centers of the k clusters, and classify the behavior data of all users to the cluster with the lowest dissimilarity degree, to obtain a clustering result, and return to call the second cluster center determining unit until the clustering result no longer changes or the number of times of executing step D reaches a preset number of times.

Further, the first cluster center determining unit includes:

a distance calculation module for calculating distances between the behavior data of the users in the user behavior data samples;

the distance vector average value calculation module is used for calculating the average value of the distances to obtain the distance vector average value of the distances among the behavior data of the user, and the distance vector average value is the distance vector average value of the kth point;

the distance average value calculating module is used for calculating the average value of the distance vector average value to obtain a distance average value;

the deviation value calculating module is used for calculating a deviation value between the distance vector average value and the distance average value according to the distance vector average value and the distance average value;

and the cluster center determining module is used for calculating the behavior data of the user corresponding to the average value of the distance vectors of the kth point if the deviation value meets a preset condition, and taking the behavior data of the user corresponding to the average value of the distance vectors of the kth point as the behavior data of the selected kth user.

Further, the first clustering result generating unit and the second clustering result generating unit each include:

a Euclidean distance calculation module for calculating Euclidean distances of the behavior data of the user from respective centers of the k clusters;

and the user classification module is used for classifying the behavior data of the user into a cluster with the minimum Euclidean distance from the behavior data of the user.

Further, the apparatus further comprises:

the behavior data scanning unit is used for scanning the behavior data of all users in a specified cluster in the clustering result;

a frequent item set and support degree generating unit, configured to generate frequent 1 item sets to frequent N item sets according to the behavior data, and calculate support degree of each item set in the frequent item sets, where there is only one item set in the frequent N item sets;

and the association rule generating unit is used for calculating and obtaining the association rule between the behavior data of the user according to the support degree of each item set in the frequent N item sets and the support degree of each item set from the frequent N-1 item set to the frequent 1 item set.

In still another aspect, a smart television is provided, where the smart television includes the apparatus for analyzing user behavior data as described above.

In the embodiment of the invention, users with similar behavior data are classified into a cluster by clustering user behavior data samples to form a similar user group. Because the users in the similar user group generally have the same preference, the videos watched by the users similar to the current user, websites browsed by the users or articles purchased by the users can be recommended to the current user, personalized services can be better provided for the users, and the use experience of the users is improved.

Drawings

Fig. 1 is a flowchart illustrating an implementation of a method for analyzing user behavior data according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a big data storage platform according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a clustering process of user behavior data according to an embodiment of the present invention;

Fig. 4 is a flowchart of an implementation of a method for analyzing user behavior data according to a second embodiment of the present invention;

Fig. 5 is a block diagram of a specific structure of an apparatus for analyzing user behavior data according to a third embodiment of the present invention;

Fig. 6 is a block diagram of a user behavior data analysis device according to a fourth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In the embodiment of the invention, the user behavior data samples are established firstly, then the established user behavior data samples are clustered, and users with similar behavior data are classified into a cluster to form a similar user group.

The following detailed description of the implementation of the present invention is made with reference to specific embodiments:

Example one

Fig. 1 shows an implementation flow of a method for analyzing user behavior data according to an embodiment of the present invention. In the whole process, the smart television firstly establishes a user behavior data sample, then performs clustering processing on the established user behavior data sample, classifies users with similar behavior data into a cluster to form a plurality of similar user groups, and is detailed as follows:

In step S101, a user behavior data sample is created.

In the embodiment of the invention, the smart television first acquires the original behavior data of the user, then cleans, formats and organizes the original behavior data according to a pre-established data specification to form a new user behavior data sample that complies with the specification, and finally establishes a data storage label and a classification catalogue for the complete, compliant user behavior data sample and imports it into a big data storage platform.

The original behavior data are messy, varied and unstructured, and some dirty data appear while the original behavior data are being collected, so a data specification needs to be established in advance and the original behavior data structured according to that specification.
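The cleaning and structuring described above might be sketched as follows. This is only an illustration: the field names `user_id`, `video_id` and `timestamp` and the normalization rules are hypothetical, since the text does not state what the data specification contains.

```python
from typing import Optional

# Hypothetical data specification: the fields every record must carry.
REQUIRED_FIELDS = ("user_id", "video_id", "timestamp")


def clean_record(raw: dict) -> Optional[dict]:
    """Return a normalized record, or None if the raw record is dirty."""
    record = {}
    for field in REQUIRED_FIELDS:
        value = raw.get(field)
        if value is None or str(value).strip() == "":
            return None  # missing field: treat as dirty data and drop it
        record[field] = str(value).strip()
    try:
        record["timestamp"] = int(record["timestamp"])
    except ValueError:
        return None  # malformed timestamp
    record["video_id"] = record["video_id"].upper()
    return record


def build_sample(raw_records: list) -> list:
    """Clean all raw records and keep only those that meet the specification."""
    cleaned = (clean_record(r) for r in raw_records)
    return [r for r in cleaned if r is not None]
```

Records that fail the specification are simply dropped here; a real pipeline could equally repair or log them.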

The big data storage platform is shown in fig. 2 and includes a data storage service cluster, a metadata storage service cluster and an application server cluster.

The data storage service cluster is a loosely coupled set of nodes that cooperate to provide services to the outside. The data storage service cluster not only offers high performance, high availability and load balancing, but also eliminates single points of failure and performance bottlenecks, has scale-out (horizontal) expansion capability, and can expand capacity and performance linearly. The high availability of the data storage service cluster improves the availability of systems and applications.

The data storage service cluster provides transparent redundant processing capability through the data storage servers D_1_1, D_1_2, …, D_2_n shown in Fig. 2, thereby keeping applications running without interruption. These servers collectively provide a unified service to clients, where each server providing a service is referred to as a node. When one node is unavailable or cannot process a client's request, the request is promptly transferred to another available node for processing, and this process is invisible and completely transparent to the client. The data storage service cluster thus improves the availability of the system and continues to serve clients when a single node fails.

Each data storage server stores a certain number of copies (replicas) of a data file. Each copy is a complete copy of the original data. Through rack awareness, the copies in the big data storage platform are stored in different racks, which effectively improves the availability of files and avoids data loss or unavailability caused by unpredictable events, such as network disconnection or machine failure, at the nodes distributed across the racks.

Storing copies with rack awareness enabled also improves system performance. By choosing suitable storage nodes for the copies and combining this with a routing protocol, data can be accessed from a nearby node, which reduces access latency and improves system performance. In addition, the copy mechanism allows data requests to be distributed across different nodes and network paths so that other nodes share the load, which effectively relieves data hotspots and peaks in data access. For a larger file, reading multiple copies in parallel further spreads and balances the node load, improves file reading efficiency, and improves the I/O performance of the system.

In step S102, behavior data of k users are selected from the user behavior data samples, and the behavior data of the k users are respectively used as respective centers of k clusters.

In the embodiment of the invention, the smart television firstly obtains the user behavior data samples from the big data storage platform, then selects the behavior data of k users from the obtained user behavior data samples, and respectively uses the behavior data of the k users as respective centers of the k clusters.

Specifically, in the embodiment of the present invention, for the selection of the behavior data of k users, an algorithm of an electronic program coordinate system based on a time axis is adopted, and k time points and program lists corresponding to the k time points are selected as the behavior data of the k users.

Selecting behavior data of k users by the following steps:

Step 1, calculating the distances between the behavior data of the users in the user behavior data sample.

Specifically, the distance d_k between the behavior data of user i and user j is calculated, where d_k satisfies the following formula:

$d_k = d(\chi_i, \chi_j)$

where $\chi_i$ and $\chi_j$ respectively represent the behavior data of user i and user j, k indexes the pairs of users (i, j), and n is the number of users in the user behavior data sample.

Step 2, calculating the average of the distances to obtain the distance vector average of the distances between the behavior data of the users.

Specifically, averaging the distances d_k between the behavior data of one user and the behavior data of the other users gives the distance vector average $\bar{d}_k$ of the kth point.

Step 3, calculating the average of the distance vector averages to obtain the distance average.

Specifically, the distance average $\bar{d}$ is calculated by the following formula:

$\bar{d} = \frac{1}{n} \sum_{k=1}^{n} \bar{d}_k$

where $\bar{d}_k$ is the distance vector average of the kth point, and $\bar{d}$ is the average of the distance vector averages of the n points.

Step 4, calculating a deviation value between the distance vector average and the distance average according to the distance vector average and the distance average.

Specifically, the deviation value δ is calculated by the following formula:

$\delta = \lambda \cdot \frac{|\bar{d}_k - \bar{d}|}{\bar{d}}$

where λ is a correction factor.

Step 5, if the deviation value meets a preset condition, determining the behavior data of the user corresponding to the distance vector average of the kth point, and taking that behavior data as the behavior data of the selected kth user.

Specifically, if the value of δ calculated by the formula $\delta = \lambda \cdot |\bar{d}_k - \bar{d}| / \bar{d}$ meets the preset condition, the behavior data of the user corresponding to the distance vector average $\bar{d}_k$ of the kth point is taken as the behavior data of the selected kth user.

Steps 1 to 5 are illustrated by the following example:

1. The distances from point P to the other points are 10, 262, 23, …, 17, respectively;

2. The average of these distances is calculated, giving a distance vector average of 56 for point P;

3. Steps 1 and 2 are repeated for the other points, giving distance vector averages of 32, 22, 23, …, 96, respectively;

4. The average of the distance vector averages of all points is calculated, giving a distance average of 88;

5. With λ = 1.0 and the preset condition δ > 0.2, δ = 1.0 × |56 − 88| / 88 = 0.36 meets the preset condition, so point P is a selected point, and the behavior data of the user corresponding to the distance vector average of point P is taken as the behavior data of the selected user.

Compared with the random selection method of the prior art, the method for selecting the behavior data of the k users in the embodiment of the invention makes the whole clustering algorithm less prone to inefficient computation, and because the behavior data of the k users is determined accurately, the clustering result converges noticeably faster in the subsequent processing of the clustering algorithm.
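The selection procedure of steps 1 to 5 can be sketched in Python as follows. This is a minimal sketch under several assumptions that the text does not fix: behavior data are numeric vectors, the distance is Euclidean, each point's distance vector average is taken over the other n − 1 points, the preset condition is δ > threshold, and the first k qualifying points (in sample order) are taken. The function names are hypothetical.

```python
import math


def deviation(d_k, d_bar, lam=1.0):
    """delta = lambda * |d_k - d_bar| / d_bar, as in step 4."""
    return lam * abs(d_k - d_bar) / d_bar


def mean_distances(points):
    """Average Euclidean distance from each point to all other points.

    Assumes at least two points.
    """
    n = len(points)
    return [
        sum(math.dist(p, q) for j, q in enumerate(points) if j != i) / (n - 1)
        for i, p in enumerate(points)
    ]


def select_initial_centers(points, k, lam=1.0, threshold=0.2):
    """Pick up to k points whose deviation exceeds the threshold (assumed condition)."""
    d_bar_k = mean_distances(points)          # steps 1 and 2
    d_bar = sum(d_bar_k) / len(d_bar_k)       # step 3
    selected = []
    for point, d_k in zip(points, d_bar_k):
        if deviation(d_k, d_bar, lam) > threshold:  # steps 4 and 5
            selected.append(point)
        if len(selected) == k:
            break
    return selected
```

With the figures from the worked example, `deviation(56, 88)` evaluates to about 0.36, which exceeds 0.2.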

In step S103, the dissimilarity degree between the behavior data of the other users in the user behavior data sample and the respective centers of the k clusters is respectively calculated, and the behavior data of the other users are respectively classified into the cluster with the lowest dissimilarity degree, so as to obtain a clustering result.

In the embodiment of the present invention, the detailed step of calculating the dissimilarity degree between the behavior data of the user and the center of each of the k clusters and classifying the behavior data of the user into the cluster with the lowest dissimilarity degree by the smart television includes:

Step 11, calculating the Euclidean distance between the behavior data of the user and the center of each of the k clusters.

Specifically, as shown in fig. 3, the user behavior data sample includes behavior data of the user a, the user B, the user C, the user D, and the user E, and behavior data of 2 users selected in step S102, the behavior data of the 2 users are respectively used as respective centers of the 2 clusters, and the dissimilarity degree between the behavior data of the user a, the user B, the user C, the user D, and the user E and the respective centers of the 2 clusters can be obtained by calculating the distances between the behavior data of the user a, the user B, the user C, the user D, and the user E and the respective centers of the 2 clusters.

The Euclidean distance is used to calculate the distances between the behavior data of user A, user B, user C, user D and user E and the respective centers of the 2 clusters, and the formula is as follows:

$d(x, y) = \sqrt{\sum_{i=1}^{n} (x(i) - y(i))^2}$

where x(i) is the ith coordinate of the first point and y(i) is the ith coordinate of the second point.

In n-dimensional Euclidean space, each point x can be represented as (x(1), x(2), …, x(n)), where each x(i) (i = 1, 2, …, n) is a real number called the ith coordinate of x, and d(x, y) represents the Euclidean distance between point x and point y = (y(1), y(2), …, y(n)).

Step 12, classifying the behavior data of the user into the cluster with the minimum Euclidean distance from the behavior data of the user.

Specifically, after the distances between the behavior data of the user a, the user B, the user C, the user D, and the user E and the respective centers of the 2 clusters are calculated, the behavior data of the users are classified into the cluster with the minimum euclidean distance therebetween. For example, as shown in fig. 3, if the distance between the behavior data of the user a and the user B and the center of the upper right cluster is small as calculated in step 11, the behavior data of the user a and the user B is classified into the upper right cluster, and the distance between the behavior data of the user C, the user D, and the user E and the center of the lower left cluster is small, the behavior data of the user C, the user D, and the user E is classified into the lower left cluster.
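Steps 11 and 12 amount to a nearest-center assignment, which might look like the sketch below. The coordinates used in the usage note are invented for illustration; the text gives no numeric behavior data for users A to E.

```python
import math


def assign_to_clusters(samples, centers):
    """Return, for each sample, the index of the center at minimum Euclidean distance."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(s, centers[c]))
        for s in samples
    ]
```

For hypothetical samples `[(8, 9), (9, 8), (1, 2), (2, 1), (2, 2)]` standing in for users A to E and centers `[(9, 9), (1, 1)]`, the first two samples fall into cluster 0 and the remaining three into cluster 1, mirroring the upper-right and lower-left grouping described above.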

In step S104, the respective centers of the k clusters are recalculated based on the clustering result, and new respective centers of the k clusters are obtained.

In the embodiment of the present invention, as shown in fig. 3, the new center of the cluster at the upper right corner and the new center of the cluster at the lower left corner are respectively calculated according to the clustering result. Specifically, the new center of each cluster is obtained by calculating the arithmetic mean of all the user behavior data in that cluster.
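The center update can be sketched as a per-dimension arithmetic mean. Raising an error for an empty cluster is an assumption of this sketch; the text does not say how that case is handled.

```python
def recompute_centers(samples, labels, k):
    """New center of each cluster = arithmetic mean of its member vectors."""
    centers = []
    for c in range(k):
        members = [s for s, label in zip(samples, labels) if label == c]
        if not members:
            # Assumption: an empty cluster is treated as an error.
            raise ValueError(f"cluster {c} is empty")
        centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return centers
```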

In step S105, the dissimilarity degree between the behavior data of all users in the user behavior data sample and the new center of each of the k clusters is calculated, the behavior data of all users are classified into the cluster with the lowest dissimilarity degree, so as to obtain a clustering result, and the step S104 is returned until the clustering result is not changed any more or the number of times of execution of the step S104 reaches a preset number of times.

In the embodiment of the present invention, the execution process of steps S104 and S105 is schematically shown in fig. 3, and details are not repeated. And when the clustering result is not changed any more or the number of times of execution of the step S104 reaches a preset number of times, taking the obtained clustering result as a final behavior data classification result of the user.
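Putting steps S103 to S105 together, the iteration with both stopping conditions might be sketched as follows. The default of 100 for `max_updates` is an arbitrary stand-in for the preset number, and keeping the old center of an empty cluster is an assumption of this sketch.

```python
import math


def assign(samples, centers):
    """Index of the nearest center (Euclidean) for each sample."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(s, centers[c]))
        for s in samples
    ]


def update(samples, labels, centers):
    """Arithmetic mean of each cluster; an empty cluster keeps its old center."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [s for s, label in zip(samples, labels) if label == c]
        if members:
            new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def k_means(samples, centers, max_updates=100):
    """Repeat S104 and S105 until the labels stop changing or the cap is reached."""
    labels = assign(samples, centers)                # S103
    for _ in range(max_updates):
        centers = update(samples, labels, centers)   # S104
        new_labels = assign(samples, centers)        # S105
        if new_labels == labels:
            break                                    # clustering result unchanged
        labels = new_labels
    return labels, centers
```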

In this embodiment, users with similar behavior data are classified into a cluster by clustering user behavior data samples, so as to form a similar user group. Because the users in the similar user group generally have the same preferences, the videos watched, websites browsed or articles purchased by users similar to the current user can be recommended to the current user, so that personalized services are better provided for the user and the user experience is improved. In particular, compared with the prior art, the behavior data of the k users is not selected at random, which makes the whole clustering algorithm less prone to inefficient computation, and because the behavior data of the k users is determined accurately, the clustering result converges noticeably faster in the subsequent processing of the clustering algorithm.

It will be understood by those skilled in the art that all or part of the steps in the method for implementing the embodiments described above may be implemented by using a program to instruct relevant hardware, and the corresponding program may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk or optical disk.

Example two

Fig. 4 shows an implementation flow of the analysis method for user behavior data according to the second embodiment of the present invention. In the whole process, the smart television first establishes a user behavior data sample, then clusters the established user behavior data sample so that users with similar behavior data are classified into a cluster to form a similar user group, and finally finds the previously undiscovered association relationships among the behavior data of the users in the similar user group in the same cluster, revealing the hidden association network contained in the behavior data. The detailed process is as follows:

In step S401, a user behavior data sample is created.

In step S402, behavior data of k users are selected from the user behavior data samples, and the behavior data of the k users are respectively used as respective centers of k clusters.

In step S403, the dissimilarity degree between the behavior data of the other users in the user behavior data sample and the respective centers of the k clusters is respectively calculated, and the behavior data of the other users are respectively classified into the cluster with the lowest dissimilarity degree, so as to obtain a clustering result.

In step S404, the respective centers of the k clusters are recalculated according to the clustering result, and new respective centers of the k clusters are obtained.

In step S405, the dissimilarity degree between the behavior data of all users in the user behavior data sample and the new center of each of the k clusters is calculated, the behavior data of all users are classified into the cluster with the lowest dissimilarity degree, so as to obtain a clustering result, and the step S404 is returned until the clustering result is not changed any more or the number of times of execution of the step S404 reaches a preset number of times.

In step S406, the behavior data of all users in a specified cluster in the clustering result is scanned.

In the embodiment of the invention, the smart television scans the behavior data of all users in a specified cluster in the clustering result. For example, the behavior data of the user included in the scanned designated cluster is shown in table 1:

User record	Viewed video IDs
R1	T1, T2, T5
R2	T2, T3
R3	T2, T4
R4	T1, T2, T4
R5	T1, T3
R6	T2, T3
R7	T1, T3
R8	T1, T2, T3, T5
R9	T1, T2, T3

Table 1

In step S407, a frequent 1 item set to a frequent N item set are generated according to the behavior data, and the support of each frequent item set is calculated, where there is only one item set in the frequent N item set.

In the embodiment of the present invention, according to the behavior data of the users in table 1, the occurrence frequency of each behavior of the users in the designated cluster can be calculated, and then the different frequent item sets and their support degrees are generated according to the occurrence frequency of each behavior. For example, for the behavior data in table 1, frequent 1 item sets, frequent 2 item sets, frequent 3 item sets and a frequent 4 item set may be generated. Each frequent 1 item set contains one item, each frequent 2 item set contains 2 items, and so on, so that each frequent N item set contains N items.

Specifically, the frequent 1 item sets, each followed by its support degree, are generated as follows:

[T1] 6
[T2] 7
[T3] 6
[T4] 2
[T5] 2

The frequent 2 item sets are as follows:

[T1, T2] 4
[T1, T3] 4
[T1, T5] 2
[T2, T3] 4
[T2, T4] 2
[T2, T5] 2

The frequent 3 item sets are as follows:

[T1, T2, T3] 2
[T1, T2, T5] 2

The frequent 4 item set is as follows:

[T1, T2, T3, T5] 1

If only one item set exists among the frequent k item sets, the frequent k+1 item sets are no longer generated.
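The support counts listed above can be reproduced from Table 1 with the sketch below. It counts by brute force over all combinations rather than using Apriori-style candidate pruning, and it assumes a minimum support count of 2. That threshold is not stated in the text; it is inferred because it matches the frequent 1, 2 and 3 item sets listed. Under this assumed threshold the listed 4 item set, whose support is 1, is not retained.

```python
from itertools import combinations

# Table 1: user record -> viewed video IDs
RECORDS = {
    "R1": {"T1", "T2", "T5"},
    "R2": {"T2", "T3"},
    "R3": {"T2", "T4"},
    "R4": {"T1", "T2", "T4"},
    "R5": {"T1", "T3"},
    "R6": {"T2", "T3"},
    "R7": {"T1", "T3"},
    "R8": {"T1", "T2", "T3", "T5"},
    "R9": {"T1", "T2", "T3"},
}


def support(itemset, records):
    """Number of records that contain every item of the item set."""
    return sum(1 for items in records.values() if set(itemset) <= items)


def frequent_itemsets(records, min_support=2):
    """Return {size: {itemset: support}} for item sets meeting min_support."""
    items = sorted(set().union(*records.values()))
    result = {}
    size = 1
    while True:
        level = {}
        for candidate in combinations(items, size):
            count = support(candidate, records)
            if count >= min_support:
                level[candidate] = count
        if not level:
            break  # no item set of this size is frequent
        result[size] = level
        size += 1
    return result
```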

In step S408, an association rule between the behavior data of the user is calculated according to the support degree of each item set in the frequent N item sets and the support degree of each item set from the frequent N-1 item set to the frequent 1 item set.

The support degree of each item set corresponds to the number of times the corresponding behavior occurs. For example, the item set [T1] among the frequent 1 item sets appears 6 times in the behavior data of the users shown in Table 1, and therefore the support degree of the item set [T1] is 6.

In the embodiment of the present invention, taking the frequent 3 item set [T1, T2, T5] as an example, its non-empty proper subsets are [T1, T2], [T1, T5], [T2, T5], [T1], [T2] and [T5]. The confidence of each rule from a subset to the remaining items of [T1, T2, T5] is the support degree of [T1, T2, T5] divided by the support degree of the subset:

[T1, T2] → [T5] 2/4 = 50%

[T1, T5] → [T2] 2/2 = 100%

[T2, T5] → [T1] 2/2 = 100%

[T1] → [T2, T5] 2/6 = 33%

[T2] → [T1, T5] 2/7 = 29%

[T5] → [T1, T2] 2/2 = 100%

If the preset minimum confidence threshold is 60%, the generated association rules are [T1, T5] → [T2], [T2, T5] → [T1] and [T5] → [T1, T2].

An association rule between two events indicates that the two events are likely to occur together. For example, the association rule generated from [T1, T5] and [T2] in the present embodiment indicates that when [T1, T5] occurs, the probability that [T2] also occurs is high.
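The confidence calculation for [T1, T2, T5] can be checked with the following sketch, which divides the support of the whole item set by the support of each non-empty proper subset and keeps the rules at or above the 60% threshold. The function names are hypothetical, and the records are those of Table 1.

```python
from itertools import combinations

# Viewed video IDs of records R1 to R9 in Table 1
RECORDS = [
    {"T1", "T2", "T5"}, {"T2", "T3"}, {"T2", "T4"},
    {"T1", "T2", "T4"}, {"T1", "T3"}, {"T2", "T3"},
    {"T1", "T3"}, {"T1", "T2", "T3", "T5"}, {"T1", "T2", "T3"},
]


def support(itemset, records):
    """Number of records that contain every item of the item set."""
    return sum(1 for items in records if set(itemset) <= items)


def association_rules(itemset, records, min_confidence=0.6):
    """Return (antecedent, consequent, confidence) for rules meeting the threshold."""
    total = support(itemset, records)
    rules = []
    for size in range(len(itemset) - 1, 0, -1):
        for antecedent in combinations(itemset, size):
            base = support(antecedent, records)
            if base == 0:
                continue  # antecedent never occurs, so no rule can be formed
            confidence = total / base
            if confidence >= min_confidence:
                consequent = tuple(i for i in itemset if i not in antecedent)
                rules.append((antecedent, consequent, confidence))
    return rules
```

Calling `association_rules(("T1", "T2", "T5"), RECORDS)` yields the three rules listed above, each with confidence 1.0.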

According to this embodiment, previously undiscovered associations among the behavior data of the users in the similar user group of the same cluster can be found, revealing the hidden association network contained in the behavior data. When a certain video is recommended to the user, other videos that form association rules with that video can also be recommended, which further improves the user experience.

Example three

Fig. 5 is a block diagram showing a specific configuration of an apparatus for analyzing user behavior data according to a third embodiment of the present invention, and only a part related to the third embodiment of the present invention is shown for convenience of description.

The apparatus may be a software unit, a hardware unit, or a unit combining software and hardware that is built into the smart television. The apparatus 5 includes: a behavior data sample establishing unit 51, a first cluster center determining unit 52, a first clustering result generating unit 53, a second cluster center determining unit 54, and a second clustering result generating unit 55.

The behavior data sample establishing unit 51 is configured to establish a user behavior data sample;

a first cluster center determining unit 52, configured to select behavior data of k users from the user behavior data samples, and use the behavior data of the k users as respective centers of the k clusters;

a first clustering result generating unit 53, configured to calculate the dissimilarity degrees between the behavior data of the other users in the user behavior data sample and the respective centers of the k clusters, and classify the behavior data of each of the other users into the cluster with the lowest dissimilarity degree to obtain a clustering result;

a second cluster center determining unit 54, configured to recalculate respective centers of the k clusters according to the clustering result, to obtain respective new centers of the k clusters;

and a second clustering result generating unit 55, configured to calculate the dissimilarity degrees between the behavior data of all users in the user behavior data sample and the respective new centers of the k clusters, classify the behavior data of each user into the cluster with the lowest dissimilarity degree to obtain a clustering result, and call the second cluster center determining unit 54 again until the clustering result no longer changes or the number of executions of step D reaches a preset number.
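The cooperation of units 52 to 55 can be illustrated with the following Python sketch. It is a simplified illustration under stated assumptions, not the apparatus itself: Euclidean distance stands in for the dissimilarity degree, the first k samples are used as the initial centers, each new center is the mean of its cluster members, and the names `dissimilarity`, `cluster`, and `max_rounds` are hypothetical.

```python
import math

def dissimilarity(a, b):
    # Assumption: Euclidean distance stands in for the dissimilarity degree.
    return math.dist(a, b)

def cluster(samples, k, max_rounds=100):
    """Assign each sample to one of k clusters; return (assignment, centers)."""
    # Assumed initialization: take the first k samples as the cluster centers.
    centers = [list(s) for s in samples[:k]]
    assignment = None
    for _ in range(max_rounds):  # max_rounds plays the role of the preset number
        # Put every sample into the cluster whose center is least dissimilar.
        new_assignment = [
            min(range(k), key=lambda c: dissimilarity(s, centers[c]))
            for s in samples
        ]
        # Stop when the clustering result no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recalculate each center as the mean of its members (an assumption).
        for c in range(k):
            members = [s for s, a in zip(samples, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers

# Hypothetical two-dimensional behavior vectors, for illustration only.
samples = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0)]
assignment, centers = cluster(samples, k=2)
print(assignment, centers)
```

The loop ends either when two successive assignments are identical or when the round limit is reached, which corresponds to the two termination conditions of unit 55.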

Specifically, the first cluster center determining unit 52 includes: a distance calculation module, a distance vector average value calculation module, a distance average value calculation module, a deviation value calculation module, and a cluster center determination module.

The distance calculation module is used for calculating the distance between the behavior data of the users in the user behavior data sample;

Specifically, the first clustering result generating unit 53 and the second clustering result generating unit 55 each include:

The apparatus for analyzing user behavior data provided in this embodiment of the present invention can be applied to the corresponding method of the first embodiment. For details, refer to the description of the first embodiment, which is not repeated here.

Example four

Fig. 6 is a block diagram showing a specific configuration of an apparatus for analyzing user behavior data according to a fourth embodiment of the present invention, and only a part related to the fourth embodiment of the present invention is shown for convenience of description. The apparatus may be a software unit, a hardware unit or a combination of software and hardware units built in the smart television, the apparatus 6 includes the behavior data sample establishing unit 51, the first cluster center determining unit 52, the first clustering result generating unit 53, the second cluster center determining unit 54 and the second clustering result generating unit 55 described in the third embodiment, and further includes:

the behavior data scanning unit 61 is configured to scan behavior data of all users in a specified cluster in the clustering result;

a frequent item set and support degree generating unit 62, configured to generate a frequent 1 item set to a frequent N item set according to the behavior data, and calculate a support degree of each item set in the frequent item set, where there is only one item set in the frequent N item set;

and the association rule generating unit 63 is configured to calculate an association rule between the behavior data of the user according to the support degree of each item set in the frequent N item sets and the support degree of each item set from the frequent N-1 item set to the frequent 1 item set.

The apparatus for analyzing user behavior data provided in this embodiment of the present invention can be applied to the corresponding method of the second embodiment. For details, refer to the description of the second embodiment, which is not repeated here.

It should be noted that, in the above apparatus embodiments, the included units are divided only according to functional logic, and the division is not limited to the above as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only intended to distinguish them from one another and do not limit the protection scope of the present invention.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A method for analyzing user behavior data, the method comprising:

step A, establishing a user behavior data sample;

step E, respectively calculating the dissimilarity degree of the behavior data of all the users in the user behavior data sample and the new centers of the k clusters, classifying the behavior data of all the users to the cluster with the lowest dissimilarity degree respectively to obtain a clustering result, and returning to the step D until the clustering result is not changed or the execution frequency of the step D reaches a preset frequency;

after the step E, the method further comprises the following steps:

and calculating to obtain an association rule between the behavior data of the user according to the support degree of the item set in the frequent N item set and the support degree of the item set from the frequent N-1 item set to the frequent 1 item set.

2. The method of claim 1, wherein step B comprises:

3. The method of claim 1, wherein calculating a degree of dissimilarity of the user's behavioral data to respective centers of the k clusters, and grouping the user's behavioral data into the lowest degree of dissimilarity cluster comprises:

4. The method of claim 2, wherein, if the deviation value satisfies a preset condition, calculating the behavior data of the user corresponding to the average value of the distance vectors at the k-th point, and taking that behavior data as the behavior data of the selected k-th user, is specifically:

wherein,is the average of the distance vectors at the k-th point,is the distance average, λ is the correction factor, and δ is the deviation value between the distance vector average and the distance average.

5. An apparatus for analyzing user behavior data, comprising:

a second clustering result generating unit, configured to calculate dissimilarity degrees of behavior data of all users in the user behavior data sample and respective new centers of the k clusters, and classify the behavior data of all users into a cluster with the lowest dissimilarity degree, respectively, to obtain a clustering result, and return to call the second cluster center determining unit until the clustering result is no longer changed or the number of times of execution of step D reaches a preset number of times;

the device further comprises:

and the association rule generating unit is used for calculating and obtaining the association rule between the behavior data of the user according to the support degree of the item set in the frequent N item set and the support degree of the item set from the frequent N-1 item set to the frequent 1 item set.

6. The apparatus of claim 5, wherein the first cluster center determining unit comprises:

7. The apparatus of claim 5, wherein the first clustering result generating unit and the second clustering result generating unit each comprise:

8. A smart television, characterized in that the smart television comprises the apparatus for analyzing user behavior data according to any one of claims 5 to 7.