CN105320702A

CN105320702A - Analysis method and device for user behavior data and smart television

Info

Publication number: CN105320702A
Application number: CN201410380588.8A
Authority: CN
Inventors: 李明烈
Original assignee: TCL Corp
Current assignee: TCL Corp
Priority date: 2014-08-04
Filing date: 2014-08-04
Publication date: 2016-02-10
Anticipated expiration: 2034-08-04
Also published as: CN105320702B

Abstract

The invention is applicable to the technical field of data processing and provides an analysis method and device for user behavior data and a smart television. The method first establishes user behavior data samples, then clusters the established samples, placing users with similar behavior data in the same cluster to form a group of similar users. Because users in such a group generally share the same preferences, videos watched, websites browsed, or items bought by users similar to the current user can be recommended to the current user, so that personalized service is better provided and the user experience is improved.

Description

An analysis method and device for user behavior data, and a smart television

Technical field

The invention belongs to the technical field of data processing, and in particular relates to an analysis method and device for user behavior data and a smart television.

Background technology

At present, the market share of smart televisions is rising year by year, the ways in which users watch and use smart televisions are becoming more personalized and diverse, and applications and tools based on smart televisions are flourishing.

However, existing smart television applications and tools cannot analyze user behavior data accurately, promptly, and efficiently in order to understand how users behave, and therefore cannot obtain the similarity between the users in a user group.

Summary of the invention

Embodiments of the present invention provide an analysis method and device for user behavior data and a smart television, intended to solve the problem that smart televisions provided by the prior art cannot obtain the similarity between the users in a user group from the users' behavior data.

In one aspect, an analysis method for user behavior data is provided, the method comprising:

Step A: establishing a user behavior data sample;

Step B: selecting the behavior data of k users from the user behavior data sample, and using the behavior data of the k users as the respective centers of k clusters;

Step C: calculating the dissimilarity between the behavior data of each remaining user in the user behavior data sample and the respective centers of the k clusters, and assigning the behavior data of each remaining user to the cluster with the minimum dissimilarity, to obtain a clustering result;

Step D: recalculating the respective centers of the k clusters according to the clustering result, to obtain the respective new centers of the k clusters;

Step E: calculating the dissimilarity between the behavior data of every user in the user behavior data sample and the respective new centers of the k clusters, assigning the behavior data of every user to the cluster with the minimum dissimilarity to obtain a clustering result, and returning to step D, until the clustering result no longer changes or the number of times step D has been performed reaches a preset number.

Further, step B comprises:

calculating the distances between the behavior data of the users in the user behavior data sample;

calculating the mean of the distances to obtain the distance vector mean of the distances between the users' behavior data, the distance vector mean being the distance vector mean of the k-th point;

calculating the mean of the distance vector means to obtain the distance average;

calculating the deviation between the distance vector mean and the distance average from the distance vector mean and the distance average;

if the deviation satisfies a preset condition, determining the behavior data of the user corresponding to the distance vector mean of the k-th point, and taking that behavior data as the behavior data of the selected k-th user.

Further, calculating the dissimilarity between a user's behavior data and the respective centers of the k clusters, and assigning the user's behavior data to the cluster with the minimum dissimilarity, comprises:

calculating the Euclidean distance between the user's behavior data and the respective center of each of the k clusters;

assigning the user's behavior data to the cluster whose center has the minimum Euclidean distance to the user's behavior data.

Further, after step E, the method also comprises:

scanning the behavior data of all users in one specified cluster of the clustering result;

generating frequent 1-itemsets through frequent N-itemsets from the behavior data, and calculating the support of each itemset among the frequent itemsets, wherein the frequent N-itemsets contain only one itemset;

calculating the association rules between the users' behavior data from the support of each itemset in the frequent N-itemsets and the support of each itemset in the frequent (N-1)-itemsets through the frequent 1-itemsets.

Further, if the deviation satisfies the preset condition, determining the behavior data of the user corresponding to the distance vector mean of the k-th point and taking it as the behavior data of the selected k-th user is specifically:

if the value of δ calculated by the formula $\delta = \lambda \left( \left| \bar{d_k} - \bar{D} \right| / \bar{D} \right)$ satisfies the preset condition, taking the distance vector mean $\bar{d_k}$ of the corresponding k-th point as the behavior data of the k-th user to be selected;

wherein $\bar{d_k}$ is the distance vector mean of the k-th point, $\bar{D}$ is the distance average, λ is a correction factor, and δ is the deviation between the distance vector mean and the distance average.

In another aspect, an analysis device for user behavior data is provided, the device comprising:

a behavior data sample establishing unit, configured to establish a user behavior data sample;

a first cluster center determining unit, configured to select the behavior data of k users from the user behavior data sample and use the behavior data of the k users as the respective centers of k clusters;

a first clustering result generating unit, configured to calculate the dissimilarity between the behavior data of each remaining user in the user behavior data sample and the respective centers of the k clusters, and assign the behavior data of each remaining user to the cluster with the minimum dissimilarity, to obtain a clustering result;

a second cluster center determining unit, configured to recalculate the respective centers of the k clusters according to the clustering result, to obtain the respective new centers of the k clusters;

a second clustering result generating unit, configured to calculate the dissimilarity between the behavior data of every user in the user behavior data sample and the respective new centers of the k clusters, assign the behavior data of every user to the cluster with the minimum dissimilarity to obtain a clustering result, and call the second cluster center determining unit again, until the clustering result no longer changes or the number of times step D has been performed reaches a preset number.

Further, the first cluster center determining unit comprises:

a distance calculation module, configured to calculate the distances between the behavior data of the users in the user behavior data sample;

a distance vector mean calculation module, configured to calculate the mean of the distances to obtain the distance vector mean of the distances between the users' behavior data, the distance vector mean being the distance vector mean of the k-th point;

a distance average calculation module, configured to calculate the mean of the distance vector means to obtain the distance average;

a deviation calculation module, configured to calculate the deviation between the distance vector mean and the distance average from the distance vector mean and the distance average;

a cluster center determination module, configured to, if the deviation satisfies a preset condition, determine the behavior data of the user corresponding to the distance vector mean of the k-th point and take that behavior data as the behavior data of the selected k-th user.

Further, the first clustering result generating unit and the second clustering result generating unit each comprise:

a Euclidean distance calculation module, configured to calculate the Euclidean distance between a user's behavior data and the respective center of each of the k clusters;

a user classification module, configured to assign the user's behavior data to the cluster whose center has the minimum Euclidean distance to the user's behavior data.

Further, the device also comprises:

a behavior data scanning unit, configured to scan the behavior data of all users in one specified cluster of the clustering result;

a frequent itemset and support generating unit, configured to generate frequent 1-itemsets through frequent N-itemsets from the behavior data and calculate the support of each itemset among the frequent itemsets, wherein the frequent N-itemsets contain only one itemset;

an association rule generating unit, configured to calculate the association rules between the users' behavior data from the support of each itemset in the frequent N-itemsets and the support of each itemset in the frequent (N-1)-itemsets through the frequent 1-itemsets.

In yet another aspect, a smart television is provided, the smart television comprising the analysis device for user behavior data described above.

In the embodiments of the present invention, the user behavior data sample is clustered, and users with similar behavior data are placed in the same cluster to form a group of similar users. Because users in such a group generally share the same preferences, videos watched, websites browsed, or items bought by users similar to the current user can be recommended to the current user, which provides better personalized service and improves the user experience.

Brief description of the drawings

Fig. 1 is a flowchart of the implementation of the analysis method for user behavior data provided by Embodiment one of the present invention;

Fig. 2 is a schematic structural diagram of the big data storage platform provided by Embodiment one of the present invention;

Fig. 3 is a schematic diagram of the clustering process for user behavior data provided by Embodiment one of the present invention;

Fig. 4 is a flowchart of the implementation of the analysis method for user behavior data provided by Embodiment two of the present invention;

Fig. 5 is a detailed structural block diagram of the analysis device for user behavior data provided by Embodiment three of the present invention;

Fig. 6 is a structural block diagram of the analysis device for user behavior data provided by Embodiment four of the present invention.

Embodiment

To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it.

In the embodiments of the present invention, a user behavior data sample is first established and then clustered, and users with similar behavior data are placed in the same cluster to form a group of similar users.

The implementation of the present invention is described in detail below with reference to specific embodiments:

Embodiment one

Fig. 1 shows the implementation flow of the analysis method for user behavior data provided by Embodiment one of the present invention. Throughout the flow, the smart television first establishes a user behavior data sample and then clusters the established sample, placing users with similar behavior data in the same cluster to form multiple groups of similar users. The details are as follows:

In step S101, a user behavior data sample is established.

In this embodiment of the present invention, the smart television first obtains the users' raw behavior data, then cleans, sorts, and formats the raw behavior data according to a data standard established in advance, forming new user behavior data samples that conform to the standard. Finally, it creates data storage labels and classified directories for these complete, standard-conforming samples and imports them into the big data storage platform.

The raw behavior data is disorganized and highly varied, and "dirty data" appears while it is being collected. A data standard therefore needs to be established in advance, and the raw behavior data is normalized against it.
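The cleaning and formatting performed in step S101 can be illustrated with a minimal sketch. The source does not specify the data standard, so the record fields `user_id` and `video_id`, the trimming and upper-casing rules, and the function name `clean_records` are all assumptions made purely for illustration.

```python
def clean_records(raw_records):
    """Normalize raw behavior records against a hypothetical data standard.

    Assumed standard (not from the source): every record needs a non-empty
    user_id and video_id, identifiers are trimmed, video IDs are upper-case,
    and exact duplicates are dropped.
    """
    cleaned = []
    seen = set()
    for record in raw_records:
        user = str(record.get("user_id") or "").strip()
        video = str(record.get("video_id") or "").strip().upper()
        if not user or not video:
            continue  # "dirty" record: a required field is missing
        key = (user, video)
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append({"user_id": user, "video_id": video})
    return cleaned


raw = [
    {"user_id": " R1 ", "video_id": "t1"},
    {"user_id": "R1", "video_id": "T1"},   # duplicate after normalization
    {"user_id": "", "video_id": "T2"},     # missing user
    {"video_id": "T3"},                    # missing user field entirely
    {"user_id": "R2", "video_id": "T2"},
]
samples = clean_records(raw)
```

A real data standard would likely cover more fields (timestamps, device identifiers, and so on); the sketch only shows the shape of the normalization pass.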

As shown in Fig. 2, the big data storage platform comprises a data storage service cluster, a metadata storage service cluster, and an application server cluster.

The data storage service cluster is a loosely coupled set of nodes that cooperate to provide service externally. It offers high performance, high availability, and load balancing, eliminates single points of failure and performance bottlenecks, and has scale-out (horizontal) expansion capability, so that capacity and performance can be expanded linearly. The high availability of the data storage service cluster improves the availability of the system and its applications.

The data storage service cluster provides transparent redundant processing through the data storage servers D_1_1, D_1_2, ..., D_2_n shown in Fig. 2, so that applications can run without interruption. These servers jointly provide a unified service to clients, and each server that provides service is called a node. When a node is unavailable or cannot handle a client request, the request is promptly forwarded to another available node; this process is invisible and completely transparent to the client. The purpose of the data storage service cluster is to improve system availability: when a single node fails, client demands can still be met.

Each data file stored on a data storage server has a certain number of replicas, and each replica is a complete copy of the original data. With rack awareness, the replicas in the big data storage platform are stored in different racks, which effectively improves file availability and avoids data being lost or unreachable on the nodes of one rack because of unpredictable factors such as network disconnection or machine failure.

Enabling rack awareness for replica storage also improves system performance. By choosing the storage nodes for replicas sensibly and coordinating with the routing protocol, data can be accessed from a nearby node, which reduces access latency. In addition, the replica mechanism distributes data requests across different nodes and network paths and uses other nodes to balance the load, which effectively addresses data hot spots and peaks in data access. For larger files, multiple replicas can be read in parallel to spread and balance node load, improving file read efficiency and further improving the I/O performance of the system.

In step S102, the behavior data of k users is selected from the user behavior data sample, and the behavior data of the k users is used as the respective centers of k clusters.

In this embodiment of the present invention, the smart television first obtains the user behavior data sample from the big data storage platform, then selects the behavior data of k users from the obtained sample and uses it as the respective centers of k clusters.

Specifically, to select the behavior data of the k users, this embodiment uses an algorithm based on an electronic program coordinate system along the time axis: k time points and the programs corresponding to those k time points are selected as the behavior data of the k users.

The behavior data of the k users is selected by the following steps:

Step 1: calculate the distances between the behavior data of the users in the user behavior data sample.

Specifically, calculate the distance $d_k$ between the behavior data of user i and user j, where $d_k$ satisfies the following formula:

$$d_k = d(\chi_i, \chi_j)$$

where $\chi_i$ and $\chi_j$ denote the behavior data of user i and user j respectively, k is a natural number greater than or equal to 1 and less than or equal to n, and n is the number of users in the user behavior data sample.

Step 2: calculate the mean of the distances to obtain the distance vector mean of the distances between the users' behavior data.

Specifically, $d_k$ is the distance between two pieces of behavior data. Averaging these distances gives the distance vector mean $\bar{d_k}$, which satisfies the following formula:

$$\bar{d_k} = \frac{\sum_{k=1}^{n} d_k}{n}$$

Step 3: calculate the mean of the distance vector means to obtain the distance average.

Specifically, the distance average $\bar{D}$ is calculated by the following formula:

$$\bar{D} = \frac{\sum_{k=1}^{n} \bar{d_k}}{n}$$

where $\bar{d_k}$ is the distance vector mean of the k-th point and $\bar{D}$ is the mean of the distance vector means of the n points.

Step 4: calculate the deviation between the distance vector mean and the distance average from the distance vector mean and the distance average.

Specifically, the deviation δ is calculated by the following formula:

$$\delta = \lambda \left( \left| \bar{d_k} - \bar{D} \right| / \bar{D} \right)$$

where λ is a correction factor.

Step 5: if the deviation satisfies a preset condition, determine the behavior data of the user corresponding to the distance vector mean of the k-th point, and take that behavior data as the behavior data of the selected k-th user.

Specifically, if the value of δ calculated by the formula $\delta = \lambda \left( \left| \bar{d_k} - \bar{D} \right| / \bar{D} \right)$ satisfies the preset condition, the distance vector mean $\bar{d_k}$ of the corresponding k-th point is taken as the behavior data of the k-th user to be selected.

The implementation of steps 1 to 5 is illustrated below:

1. The distances from point P to each of the other points are 10, 262, 23, ..., 17;

2. The mean of these distances, $\bar{d_P}$, is calculated (56, as used in item 5);

3. Items 1 and 2 are repeated for each of the other points, giving means of 32, 22, 23, ..., 96;

4. The mean of the results of item 3, $\bar{D}$, is calculated (88, as used in item 5);

5. Let λ = 1.0, and let the preset condition be that δ is greater than 0.2. Then δ = 1.0 × |56 - 88| / 88 = 0.36, which satisfies the condition, so point P is a selected point, and the behavior data corresponding to the distance vector mean of point P is taken as the behavior data of the user to be selected.
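Steps 1 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: behavior data is represented as numeric tuples, the distance is Euclidean, each point's mean is taken over its distances to the other points, and the preset condition is "δ greater than a threshold" as in the example above. The function names `deviation`, `deviation_values`, and `select_candidates` are hypothetical.

```python
import math


def deviation(mean_k, overall_mean, lam=1.0):
    """delta = lambda * |d_k_bar - D_bar| / D_bar (step 4)."""
    return lam * abs(mean_k - overall_mean) / overall_mean


def deviation_values(points, lam=1.0):
    """Return delta for every point (steps 1 to 4)."""
    mean_distances = []
    for i, p in enumerate(points):
        # Steps 1 and 2: distances from point i to every other point, then their mean.
        distances = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        mean_distances.append(sum(distances) / len(distances))
    # Step 3: mean of the per-point means.
    overall = sum(mean_distances) / len(mean_distances)
    return [deviation(m, overall, lam) for m in mean_distances]


def select_candidates(points, lam=1.0, threshold=0.2):
    """Step 5: indices of points whose delta exceeds the assumed threshold."""
    deltas = deviation_values(points, lam)
    return [i for i, delta in enumerate(deltas) if delta > threshold]


# The numbers from the worked example: |56 - 88| / 88 is roughly 0.36.
example_delta = deviation(56, 88, lam=1.0)
```

How exactly k points are picked when more or fewer than k points satisfy the condition is not stated in the source, so the sketch simply returns every point that passes.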

Compared with the random selection used in the prior art, the method of selecting the behavior data of the k users in this embodiment makes the whole clustering algorithm less likely to waste effort on inefficient iterations. Because the behavior data of the k users is determined accurately, the clustering result converges noticeably faster in the subsequent processing of the clustering algorithm.

In step S103, the dissimilarity between the behavior data of each remaining user in the user behavior data sample and the respective centers of the k clusters is calculated, and the behavior data of each remaining user is assigned to the cluster with the minimum dissimilarity, to obtain a clustering result.

In this embodiment of the present invention, the detailed steps by which the smart television calculates the dissimilarity between a user's behavior data and the respective centers of the k clusters, and assigns the user's behavior data to the cluster with the minimum dissimilarity, comprise:

Step 11: calculate the Euclidean distance between the user's behavior data and the respective center of each of the k clusters.

Specifically, as shown in Fig. 3, the user behavior data sample comprises the behavior data of user A, user B, user C, user D, and user E, together with the behavior data of 2 users selected in step S102, which serves as the respective centers of 2 clusters. The distances between the behavior data of users A, B, C, D, and E and the respective centers of the 2 clusters are calculated, and these distances give the dissimilarity between each user's behavior data and each of the 2 cluster centers.

The Euclidean distance is used to calculate the distances between the behavior data of users A, B, C, D, and E and the respective centers of the 2 clusters. The formula is as follows:

$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where $x_i$ denotes the i-th coordinate of the first point and $y_i$ denotes the i-th coordinate of the second point.

An n-dimensional Euclidean space is a point set in which each point can be expressed as (x(1), x(2), ..., x(n)), where each x(i) (i = 1, 2, ..., n) is a real number called the i-th coordinate of x, and d(x, y) denotes the Euclidean distance between point x and point y = (y(1), y(2), ..., y(n)).
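The Euclidean distance formula above translates directly into code. The following is a minimal sketch; the function name `euclidean_distance` and the use of plain tuples for behavior data are assumptions for illustration.

```python
import math


def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) for two points of equal dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


distance = euclidean_distance((0, 0), (3, 4))  # a 3-4-5 right triangle
```

In Python 3.8 and later, the standard library function `math.dist` computes the same quantity.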

Step 12: assign the user's behavior data to the cluster whose center has the minimum Euclidean distance to the user's behavior data.

Specifically, after the distances between the behavior data of users A, B, C, D, and E and the respective centers of the 2 clusters have been calculated, the behavior data of each user is assigned to the cluster whose center is nearest in Euclidean distance. For example, as shown in Fig. 3, step 11 finds that the behavior data of user A and user B is closer to the center of the upper-right cluster, so it is assigned to the upper-right cluster; the behavior data of user C, user D, and user E is closer to the center of the lower-left cluster, so it is assigned to the lower-left cluster.
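The assignment in step 12 can be sketched as follows. The coordinates given to users A to E and to the two centers are invented for illustration (Fig. 3 gives no numbers), and the function name `assign_to_clusters` is hypothetical; index 0 stands for the upper-right cluster and index 1 for the lower-left cluster.

```python
import math


def assign_to_clusters(samples, centers):
    """Map each user to the index of the nearest center (minimum Euclidean distance)."""
    assignment = {}
    for user, vector in samples.items():
        distances = [math.dist(vector, center) for center in centers]
        assignment[user] = distances.index(min(distances))
    return assignment


# Hypothetical two-dimensional behavior data for the five users of Fig. 3.
samples = {"A": (7, 9), "B": (9, 7), "C": (1, 2), "D": (2, 1), "E": (0, 0)}
centers = [(8, 8), (1, 1)]  # assumed: 0 = upper-right cluster, 1 = lower-left cluster
assignment = assign_to_clusters(samples, centers)
```

With these assumed coordinates, users A and B land in cluster 0 and users C, D, and E in cluster 1, mirroring the outcome described for Fig. 3.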

In step S104, the respective centers of the k clusters are recalculated according to the clustering result, to obtain the respective new centers of the k clusters.

In this embodiment of the present invention, as shown in Fig. 3, the new center of the upper-right cluster and the new center of the lower-left cluster are each calculated according to the clustering result. Specifically, the new center of each cluster is obtained by calculating the arithmetic mean of all user behavior data in that cluster.
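The recalculation in step S104 is a per-coordinate arithmetic mean over the members of each cluster. The sketch below assumes the same hypothetical coordinates for users A to E as an illustration; the function name `recompute_centers` is also an assumption.

```python
def recompute_centers(samples, assignment, k):
    """Return the arithmetic mean of the member vectors of each of the k clusters.

    A cluster with no members yields None, because its mean is undefined;
    the source does not say how such a case is handled.
    """
    dims = len(next(iter(samples.values())))
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for user, vector in samples.items():
        cluster = assignment[user]
        counts[cluster] += 1
        for d, value in enumerate(vector):
            sums[cluster][d] += value
    return [
        tuple(total / counts[c] for total in sums[c]) if counts[c] else None
        for c in range(k)
    ]


samples = {"A": (7, 9), "B": (9, 7), "C": (1, 2), "D": (2, 1), "E": (0, 0)}
assignment = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 1}
new_centers = recompute_centers(samples, assignment, k=2)
```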

In step S105, the dissimilarity between the behavior data of every user in the user behavior data sample and the respective new centers of the k clusters is calculated, the behavior data of every user is assigned to the cluster with the minimum dissimilarity to obtain a clustering result, and the flow returns to step S104, until the clustering result no longer changes or the number of times step S104 has been performed reaches a preset number.

In this embodiment of the present invention, the implementation of steps S104 and S105 is illustrated in Fig. 3 and is not described again here. When the clustering result no longer changes or the number of times step S104 has been performed reaches the preset number, the clustering result obtained is taken as the final classification of the users' behavior data.
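The loop formed by steps S103 to S105, with its two stopping conditions, can be sketched as below. This is a plain k-means iteration written under assumptions: points are numeric tuples, the initial centers are passed in (however they were selected in step S102), and `max_iterations` plays the role of the preset number of times. The function name `kmeans` and the sample coordinates are hypothetical.

```python
import math


def kmeans(points, initial_centers, max_iterations=100):
    """Alternate assignment and center recalculation until the labels stop
    changing or the iteration limit is reached."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iterations):
        # Assign every point to the nearest current center.
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # clustering result no longer changes
        labels = new_labels
        # Recalculate each center as the arithmetic mean of its members.
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


points = [(7, 9), (9, 7), (1, 2), (2, 1), (0, 0)]
labels, centers = kmeans(points, initial_centers=[(9, 7), (0, 0)])
```

An empty cluster keeps its previous center in this sketch; that choice is an assumption, since the source does not address the case.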

In this embodiment, the user behavior data sample is clustered, and users with similar behavior data are placed in the same cluster to form a group of similar users. Because users in such a group generally share the same preferences, videos watched, websites browsed, or items bought by users similar to the current user can be recommended to the current user, which provides better personalized service and improves the user experience. In particular, unlike the prior art, the behavior data of the k users is not selected at random, which makes the whole clustering algorithm less likely to waste effort on inefficient iterations. Because the behavior data of the k users is determined accurately, the clustering result converges noticeably faster in the subsequent processing of the clustering algorithm.

A person of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments can be carried out by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc.

Embodiment two

Fig. 4 shows the implementation flow of the analysis method for user behavior data provided by Embodiment two of the present invention. Throughout the flow, the smart television first establishes a user behavior data sample and then clusters the established sample, placing users with similar behavior data in the same cluster to form a group of similar users. Finally, it discovers previously undiscovered associations between the behavior data of the users in the group of similar users within the same cluster, revealing the hidden association network contained in the behavior data. The detailed process is as follows:

In step S401, a user behavior data sample is established.

In step S402, the behavior data of k users is selected from the user behavior data sample, and the behavior data of the k users is used as the respective centers of k clusters.

In step S403, the dissimilarity between the behavior data of each remaining user in the user behavior data sample and the respective centers of the k clusters is calculated, and the behavior data of each remaining user is assigned to the cluster with the minimum dissimilarity, to obtain a clustering result.

In step S404, the respective centers of the k clusters are recalculated according to the clustering result, to obtain the respective new centers of the k clusters.

In step S405, the dissimilarity between the behavior data of every user in the user behavior data sample and the respective new centers of the k clusters is calculated, the behavior data of every user is assigned to the cluster with the minimum dissimilarity to obtain a clustering result, and the flow returns to step S404, until the clustering result no longer changes or the number of times step S404 has been performed reaches a preset number.

In step S406, the behavior data of all users in one specified cluster of the clustering result is scanned.

In this embodiment of the present invention, the smart television scans the behavior data of all users in one specified cluster of the clustering result. For example, the behavior data of the users in the specified cluster obtained by scanning is shown in Table 1:

User record	Watched video IDs
R1	T1,T2,T5
R2	T2,T3
R3	T2,T4
R4	T1,T2,T4
R5	T1,T3
R6	T2,T3
R7	T1,T3
R8	T1,T2,T3,T5
R9	T1,T2,T3

Table 1

In step S407, frequent 1-itemsets through frequent N-itemsets are generated from the behavior data, and the support of each frequent itemset is calculated, wherein the frequent N-itemsets contain only one itemset.

In this embodiment of the present invention, the frequency of each behavior of the users in the specified cluster can be calculated from the user behavior data in Table 1, and the different frequent itemsets and the support of each frequent itemset are then generated from the number of times each behavior occurs. For example, frequent 1-itemsets, frequent 2-itemsets, frequent 3-itemsets, and frequent 4-itemsets can be generated from the behavior data in Table 1. Each itemset among the frequent 1-itemsets contains one item, each itemset among the frequent 2-itemsets contains two items, and so on, so that each itemset among the frequent N-itemsets contains N items.

Specifically, the generated frequent 1-itemsets, each followed by its support, are as follows:

[T1] 6

[T2] 7

[T3] 6

[T4] 2

[T5] 2

The frequent 2-itemsets are as follows:

[T1,T2] 4

[T1,T3] 4

[T1,T5] 2

[T2,T3] 4

[T2,T4] 2

[T2,T5] 2

The frequent 3-itemsets are as follows:

[T1,T2,T3] 2

[T1,T2,T5] 2

The frequent 4-itemsets are as follows:

[T1,T2,T3,T5] 1

When the frequent k-itemsets contain only one itemset, frequent (k+1)-itemsets are not generated.
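The level-by-level generation of frequent itemsets can be sketched with an Apriori-style loop over the records of Table 1. The source does not state a minimum support, so the threshold `min_support=2` is an assumption, as are the names `support_count` and `frequent_itemsets`. With that threshold the sketch reproduces the 1-, 2-, and 3-itemsets listed above; the 4-itemset [T1,T2,T3,T5] has a support of 1 and so appears only if the threshold is lowered to 1.

```python
from itertools import combinations

# User records from Table 1.
RECORDS = {
    "R1": {"T1", "T2", "T5"}, "R2": {"T2", "T3"}, "R3": {"T2", "T4"},
    "R4": {"T1", "T2", "T4"}, "R5": {"T1", "T3"}, "R6": {"T2", "T3"},
    "R7": {"T1", "T3"}, "R8": {"T1", "T2", "T3", "T5"}, "R9": {"T1", "T2", "T3"},
}


def support_count(itemset, records):
    """Number of records that contain every item of the itemset."""
    return sum(1 for items in records.values() if set(itemset) <= items)


def frequent_itemsets(records, min_support=2):
    """Return {k: {itemset_tuple: support}} for every level k that is non-empty."""
    result = {}
    k = 1
    candidates = [(item,) for item in sorted({i for r in records.values() for i in r})]
    while candidates:
        counted = {c: support_count(c, records) for c in candidates}
        level = {c: s for c, s in counted.items() if s >= min_support}
        if not level:
            break
        result[k] = level
        # Build (k+1)-candidates whose k-subsets are all frequent (Apriori pruning).
        alive = sorted({item for c in level for item in c})
        k += 1
        candidates = [
            c for c in combinations(alive, k)
            if all(sub in level for sub in combinations(c, k - 1))
        ]
    return result


itemsets = frequent_itemsets(RECORDS, min_support=2)
```

When a level contains a single itemset, no larger candidate can pass the pruning test, so the loop stops on its own, which matches the rule that frequent (k+1)-itemsets are not generated in that case.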

In step S408, the association rules between the users' behavior data are calculated from the support of each itemset in the frequent N-itemsets and the support of each itemset in the frequent (N-1)-itemsets through the frequent 1-itemsets.

The support of each itemset is the number of times the corresponding behavior occurs. For example, the itemset [T1] among the frequent 1-itemsets occurs 6 times in the user behavior data shown in Table 1, so the support of [T1] is 6.

In this embodiment of the present invention, take the frequent 3-itemset [T1, T2, T5] as an example. Its non-empty proper subsets are [T1, T2], [T1, T5], [T2, T5], [T1], [T2], and [T5]. The confidence of the rule from each subset to the remaining items of [T1, T2, T5] is the support of [T1, T2, T5] divided by the support of the subset:

[T1,T2] -> [T5]  2/4 = 50%

[T1,T5] -> [T2]  2/2 = 100%

[T2,T5] -> [T1]  2/2 = 100%

[T1] -> [T2,T5]  2/6 = 33%

[T2] -> [T1,T5]  2/7 = 29%

[T5] -> [T1,T2]  2/2 = 100%

If the preset minimum confidence threshold is 60%, the association rules produced are [T1, T5] -> [T2], [T2, T5] -> [T1], and [T5] -> [T1, T2].
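The confidence calculation for [T1, T2, T5] can be checked with a short sketch. The support values are copied from the frequent itemsets listed earlier; the function name `association_rules` and the representation of itemsets as frozensets are assumptions for illustration.

```python
from itertools import combinations

# Supports taken from the frequent itemsets listed for Table 1.
SUPPORT = {
    frozenset(["T1"]): 6, frozenset(["T2"]): 7, frozenset(["T5"]): 2,
    frozenset(["T1", "T2"]): 4, frozenset(["T1", "T5"]): 2,
    frozenset(["T2", "T5"]): 2, frozenset(["T1", "T2", "T5"]): 2,
}


def association_rules(itemset, support, min_confidence):
    """Return (antecedent, consequent, confidence) for every rule that meets
    the threshold, where confidence = support(itemset) / support(antecedent)."""
    full = frozenset(itemset)
    rules = []
    for size in range(1, len(full)):
        for antecedent in combinations(sorted(full), size):
            confidence = support[full] / support[frozenset(antecedent)]
            if confidence >= min_confidence:
                consequent = tuple(sorted(full - set(antecedent)))
                rules.append((antecedent, consequent, confidence))
    return rules


rules = association_rules(["T1", "T2", "T5"], SUPPORT, min_confidence=0.6)
```

With the 60% threshold, the three surviving rules are the ones named in the text above, each with a confidence of 100%.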

When two events produce an association rule, the probability that they occur together is relatively high. For example, [T1, T5] and [T2] produce an association rule in this embodiment, which means that when [T1, T5] occurs, the probability that [T2] also occurs is very high.

This embodiment can discover previously undiscovered associations between the behavior data of the users in the group of similar users within the same cluster, revealing the hidden association network contained in the behavior data. When a certain video is to be recommended to a user, other videos that form association rules with that video can be recommended as well, which further improves the user experience.
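One way the association rules could drive recommendations is sketched below. The source does not describe the recommendation logic in detail, so the matching rule used here (recommend the consequent of every rule whose antecedent the user has fully watched) and the function name `recommend` are assumptions.

```python
def recommend(watched, rules):
    """Return videos suggested by rules whose antecedent is contained in `watched`.

    `rules` is a list of (antecedent, consequent) tuples of video IDs.
    Videos already watched are skipped, and each suggestion appears once.
    """
    watched = set(watched)
    suggestions = []
    for antecedent, consequent in rules:
        if set(antecedent) <= watched:
            for video in consequent:
                if video not in watched and video not in suggestions:
                    suggestions.append(video)
    return suggestions


# The three rules produced for [T1, T2, T5] at a 60% confidence threshold.
RULES = [
    (("T1", "T5"), ("T2",)),
    (("T2", "T5"), ("T1",)),
    (("T5",), ("T1", "T2")),
]
suggested = recommend({"T5"}, RULES)
```

Under these assumptions, a user who has watched only T5 would be offered T1 and T2, since the rule [T5] -> [T1, T2] applies.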

Embodiment three

Fig. 5 shows the concrete structure block diagram of the analytical equipment of the user behavior data that the embodiment of the present invention three provides, and for convenience of explanation, illustrate only the part relevant to the embodiment of the present invention.

The device may be a software unit, a hardware unit, or a combined software and hardware unit built into a smart television. The device 5 comprises: a behavior data sample establishing unit 51, a first cluster center determining unit 52, a first clustering result generation unit 53, a second cluster center determining unit 54 and a second clustering result generation unit 55.

The behavior data sample establishing unit 51 is configured to establish a user behavior data sample;

The first cluster center determining unit 52 is configured to select the behavior data of k users from the user behavior data sample and to use the behavior data of the k users as the respective centers of k clusters;

The first clustering result generation unit 53 is configured to calculate the dissimilarity between the behavior data of each remaining user in the user behavior data sample and the respective centers of the k clusters, and to assign the behavior data of each remaining user to the cluster with the minimum dissimilarity, obtaining a clustering result;

The second cluster center determining unit 54 is configured to recalculate the respective centers of the k clusters according to the clustering result, obtaining the respective new centers of the k clusters;

The second clustering result generation unit 55 is configured to calculate the dissimilarity between the behavior data of every user in the user behavior data sample and the respective new centers of the k clusters, to assign the behavior data of every user to the cluster with the minimum dissimilarity, obtaining a clustering result, and then to invoke the second cluster center determining unit 54 again, until the clustering result no longer changes or the number of times step D has been executed reaches a preset number.
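The cooperation of units 52 to 55 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the device's actual implementation: Euclidean distance stands in for the dissimilarity measure, the first k samples stand in for the centers selected by unit 52, and the function names and sample data are hypothetical.

```python
import math


def dissimilarity(a, b):
    # Assumption: Euclidean distance stands in for the dissimilarity measure.
    return math.dist(a, b)


def assign(samples, centers):
    """Return, for each sample, the index of the center with minimum dissimilarity."""
    return [
        min(range(len(centers)), key=lambda j: dissimilarity(s, centers[j]))
        for s in samples
    ]


def recompute_centers(samples, labels, centers):
    """Mean of each cluster's members; an empty cluster keeps its old center."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [s for s, lab in zip(samples, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def cluster(samples, k, max_iterations=100):
    # Assumption: the first k samples play the role of the centers chosen by unit 52.
    centers = [tuple(s) for s in samples[:k]]
    labels = assign(samples, centers)  # role of unit 53
    for _ in range(max_iterations):  # preset limit on repetitions
        centers = recompute_centers(samples, labels, centers)  # role of unit 54
        new_labels = assign(samples, centers)  # role of unit 55
        if new_labels == labels:  # clustering result no longer changes
            break
        labels = new_labels
    return labels, centers


# Hypothetical two-dimensional behavior vectors, for illustration only.
samples = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0)]
labels, centers = cluster(samples, k=2)
print(labels, centers)
```

In this sketch the first assignment also covers the k samples chosen as centers; each has zero dissimilarity to its own center, so in the absence of duplicate samples the result is the same as assigning only the remaining users.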

Specifically, the first cluster center determining unit 52 comprises: a distance calculation module, a distance vector mean calculation module, a distance mean calculation module, a deviation value calculation module and a cluster center determination module.

The distance calculation module is configured to calculate the distances between the behavior data of the users in the user behavior data sample;

Specifically, the first clustering result generation unit 53 and the second clustering result generation unit 55 each comprise:

The device for analyzing user behavior data provided by this embodiment of the present invention can be applied in the corresponding method Embodiment One described above. For details, see the description of Embodiment One, which is not repeated here.

Embodiment four

Fig. 6 shows a structural block diagram of the device for analyzing user behavior data provided by Embodiment Four of the present invention. For ease of description, only the parts relevant to this embodiment are shown. The device may be a software unit, a hardware unit, or a combined software and hardware unit built into a smart television. In addition to the behavior data sample establishing unit 51, the first cluster center determining unit 52, the first clustering result generation unit 53, the second cluster center determining unit 54 and the second clustering result generation unit 55 described in Embodiment Three, the device 6 further comprises:

A behavior data scanning unit 61, configured to scan the behavior data of all users in a specified cluster of the clustering result;

A frequent itemset and support generation unit 62, configured to generate the frequent 1-itemsets through the frequent N-itemsets according to the behavior data and to calculate the support of each itemset in the frequent itemsets, wherein the frequent N-itemsets contain only one itemset;

An association rule generation unit 63, configured to calculate the association rules among the users' behavior data from the support of each itemset in the frequent N-itemsets and from the supports of each itemset in the frequent (N-1)-itemsets down to the frequent 1-itemsets.
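The work of units 61 and 62 can be illustrated with a level-wise frequent itemset search in Python. This is a sketch under assumptions, not the embodiment's exact procedure: the transactions below are invented for illustration and are not the data of Table 1, and the minimum support count `min_support` and the function name `frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search returning {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    current = [frozenset({i}) for i in items]  # candidate 1-itemsets
    result = {}
    while current:
        # Support of a candidate = number of transactions that contain it.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        if not frequent:
            break
        result.update(frequent)
        size = len(next(iter(frequent))) + 1
        # Next candidates: unions of two frequent itemsets that are one item larger.
        current = list(
            {a | b for a, b in combinations(frequent, 2) if len(a | b) == size}
        )
    return result


# Hypothetical behavior records of the users in one cluster (illustration only).
transactions = [
    {"T1", "T2", "T5"},
    {"T1", "T2"},
    {"T2", "T3"},
    {"T1", "T2", "T5"},
    {"T1", "T3"},
]
result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

For this invented data the search stops at a single frequent 3-itemset, [T1, T2, T5], which is the situation described above in which the frequent N-itemsets contain only one itemset. The resulting support counts are the input that unit 63 would need in order to compute confidences.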

The device for analyzing user behavior data provided by this embodiment of the present invention can be applied in the corresponding method Embodiment Two described above. For details, see the description of Embodiment Two, which is not repeated here.

It should be noted that, in the above device embodiments, the units are divided only according to functional logic, and the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for analyzing user behavior data, characterized in that the method comprises:

Step A: establishing a user behavior data sample;

2. The method of claim 1, characterized in that step B comprises:

3. The method of claim 1, characterized in that calculating the dissimilarity between a user's behavior data and the respective centers of the k clusters, and assigning the user's behavior data to the cluster with the minimum dissimilarity, comprises:

4. The method of claim 1, 2 or 3, characterized in that, after step E, the method further comprises:

Calculating the association rules among the users' behavior data from the support of the itemset in the frequent N-itemsets and from the supports of the itemsets in the frequent (N-1)-itemsets down to the frequent 1-itemsets.

5. The method of claim 2, characterized in that, if the deviation value satisfies a preset condition, calculating the user behavior data corresponding to the distance vector mean of the k-th point, and taking the user behavior data corresponding to the distance vector mean of the k-th point as the behavior data of the selected k-th user, is specifically:

6. A device for analyzing user behavior data, characterized in that the device comprises:

7. The device of claim 6, characterized in that the first cluster center determining unit comprises:

8. The device of claim 6, characterized in that the first clustering result generation unit and the second clustering result generation unit each comprise:

9. The device of claim 6, 7 or 8, characterized in that the device further comprises:

An association rule generation unit, configured to calculate the association rules among the users' behavior data from the support of the itemset in the frequent N-itemsets and from the supports of the itemsets in the frequent (N-1)-itemsets down to the frequent 1-itemsets.

10. A smart television, characterized in that the smart television comprises the device for analyzing user behavior data of any one of claims 6 to 9.