CN105320702A - Analysis method and device for user behavior data and smart television - Google Patents


Info

Publication number
CN105320702A
Authority
CN
China
Prior art keywords
user
behavioral data
bunch
mean value
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410380588.8A
Other languages
Chinese (zh)
Other versions
CN105320702B (en)
Inventor
李明烈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Corp
Original Assignee
TCL Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TCL Corp filed Critical TCL Corp
Priority to CN201410380588.8A priority Critical patent/CN105320702B/en
Publication of CN105320702A publication Critical patent/CN105320702A/en
Application granted granted Critical
Publication of CN105320702B publication Critical patent/CN105320702B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention is applicable to the technical field of data processing and provides an analysis method and device for user behavior data and a smart television. The method first establishes a user behavior data sample, then clusters the sample and assigns users with similar behavior data to the same cluster, forming a similar user group. Because the users in a similar user group generally share the same preferences, the videos watched, websites browsed, or items purchased by users similar to the current user can be recommended to the current user, so that personalized service is better provided and the user experience is improved.

Description

Analysis method and device for user behavior data, and smart television
Technical field
The invention belongs to the technical field of data processing, and in particular relates to an analysis method and device for user behavior data and a smart television.
Background art
At present, the market share of smart televisions is rising year by year, the ways users watch and use smart televisions are becoming more personalized and diverse, and applications and tools based on smart televisions are flourishing.
However, the applications and tools of existing smart televisions cannot analyze users' behavior data accurately, promptly and efficiently in order to understand how users behave and, in turn, to obtain the similarity between the users in a user group.
Summary of the invention
Embodiments of the present invention provide an analysis method and device for user behavior data and a smart television, intended to solve the problem that the smart televisions provided by the prior art cannot obtain the similarity between the users in a user group from the users' behavior data.
In one aspect, an analysis method for user behavior data is provided, the method comprising:
Step A: establishing a user behavior data sample;
Step B: selecting the behavior data of k users from the user behavior data sample, and using the behavior data of the k users as the respective centers of k clusters;
Step C: calculating the dissimilarity between the behavior data of each remaining user in the user behavior data sample and the respective centers of the k clusters, and assigning the behavior data of each remaining user to the cluster with the minimum dissimilarity, to obtain a clustering result;
Step D: recalculating the respective centers of the k clusters according to the clustering result, to obtain the respective new centers of the k clusters;
Step E: calculating the dissimilarity between the behavior data of every user in the user behavior data sample and the respective new centers of the k clusters, assigning the behavior data of every user to the cluster with the minimum dissimilarity to obtain a clustering result, and returning to step D, until the clustering result no longer changes or the number of times step D has been performed reaches a preset number.
Further, step B comprises:
calculating the distances between the behavior data of the users in the user behavior data sample;
calculating the mean of the distances to obtain the distance vector mean of the distances between the users' behavior data, the distance vector mean being the distance vector mean of the k-th point;
calculating the mean of the distance vector means to obtain the distance average;
calculating the deviation between the distance vector mean and the distance average from the distance vector mean and the distance average;
if the deviation meets a preset condition, determining the behavior data of the user corresponding to the distance vector mean of the k-th point, and using that behavior data as the behavior data of the selected k-th user.
Further, calculating the dissimilarity between a user's behavior data and the respective centers of the k clusters, and assigning the user's behavior data to the cluster with the minimum dissimilarity, comprises:
calculating the Euclidean distance between the user's behavior data and the respective centers of the k clusters;
assigning the user's behavior data to the cluster whose center has the minimum Euclidean distance to the user's behavior data.
Further, after step E, the method also comprises:
scanning the behavior data of all users in a specified cluster of the clustering result;
generating frequent 1-itemsets through frequent N-itemsets according to the behavior data, and calculating the support of each itemset among the frequent itemsets, wherein the frequent N-itemsets contain only one itemset;
calculating the association rules between the users' behavior data according to the support of each itemset in the frequent N-itemsets and the support of each itemset in the frequent (N-1)-itemsets through the frequent 1-itemsets.
Further, if the deviation meets the preset condition, determining the behavior data of the user corresponding to the distance vector mean of the k-th point and using it as the behavior data of the selected k-th user is specifically:
if the value of δ calculated by the formula $\delta = \lambda\left(|\overline{d_k} - \overline{D}| / \overline{D}\right)$ meets the preset condition, taking the behavior data of the user corresponding to the distance vector mean $\overline{d_k}$ of the k-th point as the behavior data of the k-th user to be selected;
wherein $\overline{d_k}$ is the distance vector mean of the k-th point, $\overline{D}$ is the distance average, λ is a correction factor, and δ is the deviation between the distance vector mean and the distance average.
In another aspect, an analysis device for user behavior data is provided, the device comprising:
a behavior data sample establishing unit, configured to establish a user behavior data sample;
a first cluster center determining unit, configured to select the behavior data of k users from the user behavior data sample and use the behavior data of the k users as the respective centers of k clusters;
a first clustering result generating unit, configured to calculate the dissimilarity between the behavior data of each remaining user in the user behavior data sample and the respective centers of the k clusters, and assign the behavior data of each remaining user to the cluster with the minimum dissimilarity, to obtain a clustering result;
a second cluster center determining unit, configured to recalculate the respective centers of the k clusters according to the clustering result, to obtain the respective new centers of the k clusters;
a second clustering result generating unit, configured to calculate the dissimilarity between the behavior data of every user in the user behavior data sample and the respective new centers of the k clusters, assign the behavior data of every user to the cluster with the minimum dissimilarity to obtain a clustering result, and invoke the second cluster center determining unit again, until the clustering result no longer changes or the number of times step D, performed by the second cluster center determining unit, has been executed reaches a preset number.
Further, the first cluster center determining unit comprises:
a distance calculation module, configured to calculate the distances between the behavior data of the users in the user behavior data sample;
a distance vector mean calculation module, configured to calculate the mean of the distances to obtain the distance vector mean of the distances between the users' behavior data, the distance vector mean being the distance vector mean of the k-th point;
a distance average calculation module, configured to calculate the mean of the distance vector means to obtain the distance average;
a deviation calculation module, configured to calculate the deviation between the distance vector mean and the distance average from the distance vector mean and the distance average;
a cluster center determination module, configured to, if the deviation meets a preset condition, determine the behavior data of the user corresponding to the distance vector mean of the k-th point and use that behavior data as the behavior data of the selected k-th user.
Further, the first clustering result generating unit and the second clustering result generating unit each comprise:
a Euclidean distance calculation module, configured to calculate the Euclidean distance between a user's behavior data and the respective centers of the k clusters;
a user classification module, configured to assign the user's behavior data to the cluster whose center has the minimum Euclidean distance to the user's behavior data.
Further, the device also comprises:
a behavior data scanning unit, configured to scan the behavior data of all users in a specified cluster of the clustering result;
a frequent itemset and support generating unit, configured to generate frequent 1-itemsets through frequent N-itemsets according to the behavior data and calculate the support of each itemset among the frequent itemsets, wherein the frequent N-itemsets contain only one itemset;
an association rule generating unit, configured to calculate the association rules between the users' behavior data according to the support of each itemset in the frequent N-itemsets and the support of each itemset in the frequent (N-1)-itemsets through the frequent 1-itemsets.
In yet another aspect, a smart television is provided, the smart television comprising the analysis device for user behavior data described above.
In the embodiments of the present invention, the user behavior data sample is clustered and users with similar behavior data are assigned to the same cluster, forming a similar user group. Because the users in a similar user group generally share the same preferences, the videos watched, websites browsed, or items purchased by users similar to the current user can be recommended to the current user, so that personalized service is better provided and the user experience is improved.
Brief description of the drawings
Fig. 1 is a flowchart of the analysis method for user behavior data provided by Embodiment one of the present invention;
Fig. 2 is a structural diagram of the big data storage platform provided by Embodiment one of the present invention;
Fig. 3 is a schematic diagram of the clustering process for user behavior data provided by Embodiment one of the present invention;
Fig. 4 is a flowchart of the analysis method for user behavior data provided by Embodiment two of the present invention;
Fig. 5 is a detailed structural block diagram of the analysis device for user behavior data provided by Embodiment three of the present invention;
Fig. 6 is a structural block diagram of the analysis device for user behavior data provided by Embodiment four of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
In the embodiments of the present invention, a user behavior data sample is first established, the established sample is then clustered, and users with similar behavior data are assigned to the same cluster, forming a similar user group.
The implementation of the present invention is described in detail below with reference to specific embodiments:
Embodiment one
Fig. 1 shows the flow of the analysis method for user behavior data provided by Embodiment one of the present invention. In the overall flow, the smart television first establishes a user behavior data sample, then clusters the established sample and assigns users with similar behavior data to the same cluster, forming multiple similar user groups. The details are as follows:
In step S101, a user behavior data sample is established.
In this embodiment, the smart television first obtains the users' raw behavior data, then cleans, organizes and formats the raw behavior data according to a data standard established in advance, forming a new user behavior data sample that conforms to the standard. Finally, data storage tags and partition directories are created for the complete, conforming user behavior data sample, and the sample is imported into the big data storage platform.
Raw behavior data is messy, highly varied and unordered, and "dirty data" appears while it is being collected. A data standard therefore needs to be established in advance and used to normalize the raw behavior data.
As shown in Fig. 2, the big data storage platform comprises a data storage service cluster, a metadata storage service cluster and an application server cluster.
The data storage service cluster is a loosely coupled set of multiple nodes that cooperate to provide service externally. It offers high performance, high availability and load balancing, eliminates single points of failure and performance bottlenecks, and has scale-out (horizontal) expansion capability, so that capacity and performance can grow linearly. The high availability of the data storage service cluster improves the availability of the system and its applications.
The data storage service cluster provides transparent redundant processing through the data storage servers D_1_1, D_1_2, ..., D_2_n shown in Fig. 2, so that applications run without interruption. These servers jointly provide a unified service to clients, and each server that provides service is called a node. When a node is unavailable or cannot process a client's request, the request is promptly forwarded to another available node; this process is invisible and completely transparent to the client. The purpose of the data storage service cluster is to improve system availability: when a single node fails, client demands continue to be met.
Each data file stored on a data storage server has a certain number of replicas, and each replica is a complete copy of the original data. With rack awareness, the replicas in the big data storage platform are stored on different racks. This effectively improves file availability and avoids data being lost or unobtainable at the nodes of a rack because of unpredictable factors such as network disconnection or machine failure.
Enabling rack awareness for replica storage also improves system performance. By selecting storage nodes for replicas sensibly and cooperating with the routing protocol, data can be accessed from a nearby node, which reduces access latency. In addition, the replica mechanism distributes data requests across different nodes and network paths and uses other nodes to balance load, which effectively relieves data hot spots and peaks in data access. For larger files, multiple replicas can also be read in parallel to spread and balance node load, improving file read efficiency and further improving the I/O performance of the system.
In step S102, the behavior data of k users is selected from the user behavior data sample, and the behavior data of the k users is used as the respective centers of k clusters.
In this embodiment, the smart television first obtains the user behavior data sample from the big data storage platform, then selects the behavior data of k users from the obtained sample and uses it as the respective centers of the k clusters.
Specifically, in this embodiment the behavior data of the k users is selected with an algorithm based on an electronic program coordinate system along a time axis: k time points and the programs corresponding to those k time points are chosen as the behavior data of the k users.
The behavior data of the k users is selected through the following steps:
Step 1: calculate the distances between the behavior data of the users in the user behavior data sample.
Specifically, the distance $d_k$ between the behavior data of user i and the behavior data of user j is calculated, where $d_k$ satisfies the following formula:
$d_k = d(\chi_i, \chi_j)$
where $\chi_i$ and $\chi_j$ denote the behavior data of user i and user j respectively, k is a natural number greater than or equal to 1 and less than or equal to n, and n is the number of users in the user behavior data sample.
Step 2: calculate the mean of the distances to obtain the distance vector mean of the distances between the users' behavior data.
Specifically, $d_k$ is the distance between two behavior data items. Averaging these distances gives the distance vector mean $\overline{d_k}$ of the distances between the users' behavior data, which satisfies the following formula:
$\overline{d_k} = \frac{\sum_{k=1}^{n} d_k}{n}$
Step 3: calculate the mean of the distance vector means to obtain the distance average.
Specifically, the distance average $\overline{D}$ is calculated by the following formula:
$\overline{D} = \frac{\sum_{k=1}^{n} \overline{d_k}}{n}$
where $\overline{d_k}$ is the distance vector mean of the k-th point and $\overline{D}$ is the mean of the distance vector means of the n points.
Step 4: calculate the deviation between the distance vector mean and the distance average from the distance vector mean and the distance average.
Specifically, the deviation δ is calculated by the following formula:
$\delta = \lambda\left(|\overline{d_k} - \overline{D}| / \overline{D}\right)$
where λ is a correction factor.
Step 5: if the deviation meets a preset condition, determine the behavior data of the user corresponding to the distance vector mean of the k-th point, and use that behavior data as the behavior data of the selected k-th user.
Specifically, if the value of δ calculated by the formula $\delta = \lambda\left(|\overline{d_k} - \overline{D}| / \overline{D}\right)$ meets the preset condition, the behavior data of the user corresponding to the distance vector mean $\overline{d_k}$ of the k-th point is taken as the behavior data of the k-th user to be selected.
An example of carrying out steps 1 to 5 follows:
1. The distances from point P to each of the other points are 10, 262, 23, ..., 17.
2. The mean of these distances is calculated: $\overline{d_P} = 56$ (the value used in step 5).
3. Steps 1 and 2 are repeated to calculate the distance vector means of the other points, which are 32, 22, 23, ..., 96.
4. The mean of the results of step 3 is calculated: $\overline{D} = 88$ (the value used in step 5).
5. Let λ = 1.0 and let the preset condition be δ > 0.2. Then δ = 1.0 × |56 - 88| / 88 ≈ 0.36, which meets the preset condition, so point P is selected, and the behavior data of the user corresponding to the distance vector mean of point P is used as the behavior data of the user to be selected.
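The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: behavior data is assumed to be a list of numeric vectors, the distance d is assumed to be the Euclidean distance, each point's mean is taken over its distances to the other points, and the names `mean_distances`, `deviation` and `select_candidates` are hypothetical.

```python
import math

def mean_distances(points):
    """Steps 1-2: for each point, the mean distance to every other point.

    Assumption: the mean is taken over the n-1 other points.
    """
    n = len(points)
    means = []
    for i, p in enumerate(points):
        total = sum(math.dist(p, q) for j, q in enumerate(points) if j != i)
        means.append(total / (n - 1))
    return means

def deviation(mean_k, overall_mean, lam=1.0):
    """Step 4: delta = lambda * |mean_k - overall_mean| / overall_mean."""
    return lam * abs(mean_k - overall_mean) / overall_mean

def select_candidates(points, lam=1.0, threshold=0.2):
    """Steps 3-5: indices of points whose deviation exceeds the threshold.

    Assumption: the preset condition is "delta > threshold", as in the
    worked example (delta > 0.2).
    """
    means = mean_distances(points)
    overall = sum(means) / len(means)  # step 3: distance average
    return [i for i, m in enumerate(means)
            if deviation(m, overall, lam) > threshold]

# The worked example: mean distance 56 for point P, distance average 88.
print(round(deviation(56, 88, lam=1.0), 2))
```

With the figures from the worked example, `deviation(56, 88)` evaluates to 32/88, which rounds to 0.36 and exceeds the 0.2 threshold, so point P would be kept as a candidate center.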
Compared with the random selection used in the prior art, the method of selecting the behavior data of the k users in this embodiment makes the whole clustering algorithm less likely to fall into inefficient computation. Because the behavior data of the k users is determined accurately, the clustering result converges noticeably faster in the subsequent processing of the clustering algorithm.
In step S103, the dissimilarity between the behavior data of each remaining user in the user behavior data sample and the respective centers of the k clusters is calculated, and the behavior data of each remaining user is assigned to the cluster with the minimum dissimilarity, to obtain a clustering result.
In this embodiment, the detailed steps by which the smart television calculates the dissimilarity between a user's behavior data and the respective centers of the k clusters, and assigns the user's behavior data to the cluster with the minimum dissimilarity, are:
Step 11: calculate the Euclidean distance between the user's behavior data and the respective centers of the k clusters.
Specifically, as shown in Fig. 3, the user behavior data sample comprises the behavior data of user A, user B, user C, user D and user E, together with the behavior data of the 2 users selected in step S102, which serves as the respective centers of 2 clusters. The distances between the behavior data of users A, B, C, D and E and the respective centers of these 2 clusters are calculated, and these distances give the dissimilarity between each user's behavior data and each cluster center.
The Euclidean distance is used to calculate the distance between the behavior data of users A, B, C, D and E and the respective centers of the 2 clusters. The formula is as follows:
$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
where $x_i$ denotes the i-th coordinate of the first point and $y_i$ denotes the i-th coordinate of the second point.
The n-dimensional Euclidean space is a point set in which each point can be expressed as $(x_1, x_2, \dots, x_n)$, where each $x_i$ (i = 1, 2, ..., n) is a real number called the i-th coordinate of x, and d(x, y) denotes the Euclidean distance between the point x and the point $y = (y_1, y_2, \dots, y_n)$.
Step 12: assign the user's behavior data to the cluster whose center has the minimum Euclidean distance to the user's behavior data.
Specifically, after the distances between the behavior data of users A, B, C, D and E and the respective centers of the 2 clusters have been calculated, each user's behavior data is assigned to the cluster whose center is nearest. For example, as shown in Fig. 3, step 11 finds that the behavior data of user A and user B is closer to the center of the upper-right cluster, so it is assigned to the upper-right cluster; the behavior data of user C, user D and user E is closer to the center of the lower-left cluster, so it is assigned to the lower-left cluster.
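Steps 11 and 12 can be illustrated with a short Python sketch. The coordinates below are invented purely for illustration (the patent gives no numeric values for users A to E), and `assign_to_nearest` is a hypothetical helper name.

```python
import math

def assign_to_nearest(points, centers):
    """Return, for each point, the index of the center with the
    smallest Euclidean distance (ties go to the lower index)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical 2-D behavior vectors for users A..E and two centers:
# index 0 plays the role of the upper-right cluster, index 1 the lower-left.
users = {"A": (8.0, 9.0), "B": (9.0, 8.0),
         "C": (1.0, 1.0), "D": (2.0, 1.0), "E": (1.0, 2.0)}
centers = [(9.0, 9.0), (1.5, 1.5)]

labels = assign_to_nearest(list(users.values()), centers)
print(dict(zip(users, labels)))
```

In this made-up layout, users A and B land in cluster 0 and users C, D and E in cluster 1, mirroring the grouping described for Fig. 3.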
In step S104, the respective centers of the k clusters are recalculated according to the clustering result, to obtain the respective new centers of the k clusters.
In this embodiment, as shown in Fig. 3, the new center of the upper-right cluster and the new center of the lower-left cluster are each calculated according to the clustering result. Specifically, the new center of each cluster is obtained by calculating the arithmetic mean of all the user behavior data in that cluster.
In step S105, the dissimilarity between the behavior data of every user in the user behavior data sample and the respective new centers of the k clusters is calculated, the behavior data of every user is assigned to the cluster with the minimum dissimilarity to obtain a clustering result, and the flow returns to step S104, until the clustering result no longer changes or the number of times step S104 has been performed reaches a preset number.
In this embodiment, the execution of steps S104 and S105 is illustrated in Fig. 3 and is not described again. When the clustering result no longer changes or the number of times step S104 has been performed reaches the preset number, the clustering result obtained is used as the final classification of the users' behavior data.
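The loop formed by steps S103 to S105 is the standard k-means iteration, and a compact Python sketch may help. It assumes numeric behavior vectors, Euclidean distance as the dissimilarity, and initial centers supplied by the caller (for example from step S102); the function name `kmeans` and the `max_iterations` parameter are illustrative choices, not names from the patent.

```python
import math

def kmeans(points, initial_centers, max_iterations=100):
    """Assign points to the nearest center, recompute each center as the
    arithmetic mean of its members, and repeat until the assignment stops
    changing or max_iterations is reached."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iterations):
        # Assignment (S103 / S105): nearest center by Euclidean distance.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:  # clustering result no longer changes
            break
        labels = new_labels
        # Update (S104): arithmetic mean of each cluster's members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels, centers

points = [(8.0, 9.0), (9.0, 8.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
labels, centers = kmeans(points, initial_centers=[(8.0, 9.0), (1.0, 1.0)])
print(labels, centers)
```

Keeping the previous center when a cluster becomes empty is one simple design choice; the patent does not say how that case is handled.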
In this embodiment, the user behavior data sample is clustered and users with similar behavior data are assigned to the same cluster, forming a similar user group. Because the users in a similar user group generally share the same preferences, the videos watched, websites browsed, or items purchased by users similar to the current user can be recommended to the current user, so that personalized service is better provided and the user experience is improved. In particular, unlike the prior art, the behavior data of the k users is not selected at random, which makes the whole clustering algorithm less likely to fall into inefficient computation. Because the behavior data of the k users is determined accurately, the clustering result converges noticeably faster in the subsequent processing of the clustering algorithm.
A person of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc.
Embodiment two
Fig. 4 shows the flow of the analysis method for user behavior data provided by Embodiment two of the present invention. In the overall flow, the smart television first establishes a user behavior data sample, then clusters the established sample and assigns users with similar behavior data to the same cluster, forming a similar user group. Finally, it finds previously undiscovered associations between the behavior data of the users in the similar user group of the same cluster, uncovering the hidden association network contained in the behavior data. The detailed process is as follows:
In step S401, a user behavior data sample is established.
In step S402, the behavior data of k users is selected from the user behavior data sample, and the behavior data of the k users is used as the respective centers of k clusters.
In step S403, the dissimilarity between the behavior data of each remaining user in the user behavior data sample and the respective centers of the k clusters is calculated, and the behavior data of each remaining user is assigned to the cluster with the minimum dissimilarity, to obtain a clustering result.
In step S404, the respective centers of the k clusters are recalculated according to the clustering result, to obtain the respective new centers of the k clusters.
In step S405, the dissimilarity between the behavior data of every user in the user behavior data sample and the respective new centers of the k clusters is calculated, the behavior data of every user is assigned to the cluster with the minimum dissimilarity to obtain a clustering result, and the flow returns to step S404, until the clustering result no longer changes or the number of times step S404 has been performed reaches a preset number.
In step S406, the behavior data of all users in a specified cluster of the clustering result is scanned.
In this embodiment, the smart television scans the behavior data of all users in a specified cluster of the clustering result. For example, the behavior data of the users in the specified cluster obtained by scanning is shown in Table 1:
User record    Viewed video IDs
R1             T1, T2, T5
R2             T2, T3
R3             T2, T4
R4             T1, T2, T4
R5             T1, T3
R6             T2, T3
R7             T1, T3
R8             T1, T2, T3, T5
R9             T1, T2, T3
Table 1
In step S407, frequent 1-itemsets through frequent N-itemsets are generated according to the behavior data, and the support of each frequent itemset is calculated, wherein the frequent N-itemsets contain only one itemset.
In this embodiment, the frequency of each behavior of the users in the specified cluster can be calculated from the behavior data in Table 1, and the different frequent itemsets and the support of each frequent itemset are then generated from the number of times each behavior occurs. For example, frequent 1-itemsets, frequent 2-itemsets, frequent 3-itemsets and frequent 4-itemsets can be generated from the behavior data in Table 1. Each itemset in the frequent 1-itemsets contains one item, each itemset in the frequent 2-itemsets contains two items, and so on, so that each itemset in the frequent N-itemsets contains N items.
Specifically, the frequent 1-itemsets generated, each followed by its support, are as follows:
[T1] 6
[T2] 7
[T3] 6
[T4] 2
[T5] 2
The frequent 2-itemsets are as follows:
[T1,T2] 4
[T1,T3] 4
[T1,T5] 2
[T2,T3] 4
[T2,T4] 2
[T2,T5] 2
The frequent 3-itemsets are as follows:
[T1,T2,T3] 2
[T1,T2,T5] 2
The frequent 4-itemsets are as follows:
[T1,T2,T3,T5] 1
If the frequent k-itemsets contain only one itemset, frequent (k+1)-itemsets are not generated.
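The support counts listed above can be reproduced from Table 1 with a short Python sketch. It enumerates candidate itemsets by brute force rather than with the level-wise pruning of Apriori, and it assumes a minimum support of 2, a value inferred from the 1-, 2- and 3-itemset lists rather than stated in the text. The names `TRANSACTIONS`, `support` and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations

# Table 1: the videos viewed in each user record R1..R9.
TRANSACTIONS = [
    {"T1", "T2", "T5"}, {"T2", "T3"}, {"T2", "T4"},
    {"T1", "T2", "T4"}, {"T1", "T3"}, {"T2", "T3"},
    {"T1", "T3"}, {"T1", "T2", "T3", "T5"}, {"T1", "T2", "T3"},
]

def support(itemset, transactions):
    """Number of records that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def frequent_itemsets(transactions, size, min_support):
    """All itemsets of the given size with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for candidate in combinations(items, size):
        count = support(candidate, transactions)
        if count >= min_support:
            result[candidate] = count
    return result

for size in (1, 2, 3):
    print(size, frequent_itemsets(TRANSACTIONS, size, min_support=2))
print(support(("T1", "T2", "T3", "T5"), TRANSACTIONS))
```

Under this assumed threshold the sketch yields exactly the 1-, 2- and 3-itemsets listed above. The 4-itemset [T1,T2,T3,T5] has support 1, as listed, so it is reported separately rather than passing the threshold of 2.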
In step S408, the association rules between the users' behavior data are calculated according to the support of each itemset in the frequent N-itemsets and the support of each itemset in the frequent (N-1)-itemsets through the frequent 1-itemsets.
The support of each itemset is the number of times the corresponding behavior occurs. For example, the itemset [T1] in the frequent 1-itemsets appears 6 times in the user behavior data shown in Table 1, so the support of itemset [T1] is 6.
In this embodiment, take the frequent 3-itemset [T1,T2,T5] as an example. Its non-empty proper subsets are [T1,T2], [T1,T5], [T2,T5], [T1], [T2] and [T5]. The confidence of each rule is the support of [T1,T2,T5] divided by the support of the subset on the left-hand side:
[T1,T2] → [T5]: 2/4 = 50%
[T1,T5] → [T2]: 2/2 = 100%
[T2,T5] → [T1]: 2/2 = 100%
[T1] → [T2,T5]: 2/6 = 33%
[T2] → [T1,T5]: 2/7 = 29%
[T5] → [T1,T2]: 2/2 = 100%
If the preset minimum confidence threshold is 60%, the association rules produced are [T1,T5] → [T2], [T2,T5] → [T1] and [T5] → [T1,T2].
When two events produce an association rule, the probability that they occur together is relatively high. For example, [T1,T5] and [T2] produce an association rule in this embodiment, which means that when [T1,T5] occurs, the probability that [T2] also occurs is very high.
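The confidence calculation above can be checked with a small Python sketch. It uses the records of Table 1 and computes confidence as support(itemset) / support(antecedent); the helper names `support` and `association_rules` are hypothetical and the code is an illustration, not the patent's implementation.

```python
from itertools import combinations

# Table 1: the videos viewed in each user record R1..R9.
TRANSACTIONS = [
    {"T1", "T2", "T5"}, {"T2", "T3"}, {"T2", "T4"},
    {"T1", "T2", "T4"}, {"T1", "T3"}, {"T2", "T3"},
    {"T1", "T3"}, {"T1", "T2", "T3", "T5"}, {"T1", "T2", "T3"},
]

def support(itemset, transactions):
    """Number of records that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def association_rules(itemset, transactions, min_confidence):
    """Rules antecedent -> consequent drawn from one frequent itemset,
    kept when support(itemset) / support(antecedent) >= min_confidence."""
    itemset = tuple(sorted(itemset))
    total = support(itemset, transactions)
    found = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(itemset, size):
            base = support(antecedent, transactions)
            if base == 0:
                continue  # antecedent never occurs; no rule to form
            consequent = tuple(i for i in itemset if i not in antecedent)
            confidence = total / base
            if confidence >= min_confidence:
                found.append((antecedent, consequent, confidence))
    return found

for lhs, rhs, conf in association_rules(("T1", "T2", "T5"), TRANSACTIONS, 0.6):
    print(lhs, "->", rhs, f"{conf:.0%}")
```

With the 60% threshold, the three rules kept are [T5] → [T1,T2], [T1,T5] → [T2] and [T2,T5] → [T1], each with 100% confidence, matching the list in the text.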
This embodiment can discover previously unnoticed associations among the behavior data of the users in the similar user group of one cluster, revealing the hidden association network contained in the behavior data. When a certain video is to be recommended to a user, other videos that form association rules with that video can also be recommended to the user, which further improves the user experience.
Embodiment three
Fig. 5 is a structural block diagram of the device for analyzing user behavior data provided by Embodiment three of the present invention. For ease of explanation, only the parts relevant to this embodiment are shown.
The device may be a software unit, a hardware unit, or a combined software and hardware unit built into a smart television. The device 5 comprises: a behavior data sample establishing unit 51, a first cluster center determining unit 52, a first cluster result generation unit 53, a second cluster center determining unit 54 and a second cluster result generation unit 55.
The behavior data sample establishing unit 51 is configured to establish user behavior data samples;
the first cluster center determining unit 52 is configured to select the behavior data of k users from the user behavior data samples and use the behavior data of the k users as the respective centers of k clusters;
the first cluster result generation unit 53 is configured to calculate the dissimilarity between the behavior data of each remaining user in the user behavior data samples and the respective centers of the k clusters, and to assign the behavior data of each remaining user to the cluster with the minimum dissimilarity, obtaining a clustering result;
the second cluster center determining unit 54 is configured to recalculate the respective centers of the k clusters according to the clustering result, obtaining the respective new centers of the k clusters;
the second cluster result generation unit 55 is configured to calculate the dissimilarity between the behavior data of every user in the user behavior data samples and the respective new centers of the k clusters, to assign the behavior data of every user to the cluster with the minimum dissimilarity to obtain a clustering result, and to return to calling the second cluster center determining unit until the clustering result no longer changes or the number of times step D has been executed reaches a preset number.
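The interaction of units 52 to 55 can be sketched as a standard k-means loop. This is a minimal sketch under stated assumptions: the new center of a cluster is assumed to be the arithmetic mean of its members (the text only says the centers are recalculated), `math.dist` (Euclidean distance) stands in for the dissimilarity, and the function names and the two-dimensional sample vectors are hypothetical.

```python
import math


def assign(samples, centers, dissimilarity):
    """For each sample, the index of the center with the minimum dissimilarity."""
    return [min(range(len(centers)), key=lambda j: dissimilarity(s, centers[j]))
            for s in samples]


def recompute_centers(samples, labels, old_centers):
    """Assumed rule: new center = mean of the members; empty clusters keep theirs."""
    centers = []
    for j, old in enumerate(old_centers):
        members = [s for s, label in zip(samples, labels) if label == j]
        if members:
            centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            centers.append(old)
    return centers


def cluster(samples, initial_centers, dissimilarity=math.dist, max_iterations=100):
    centers = list(initial_centers)                     # unit 52: k chosen users
    # Unit 53: assigning every sample here also covers the k chosen users,
    # which are at dissimilarity 0 from their own centers.
    labels = assign(samples, centers, dissimilarity)
    for _ in range(max_iterations):                     # preset number of rounds
        centers = recompute_centers(samples, labels, centers)   # unit 54
        new_labels = assign(samples, centers, dissimilarity)    # unit 55
        if new_labels == labels:                        # result no longer changes
            break
        labels = new_labels
    return labels, centers


# Hypothetical behavior vectors of six users, with two of them as initial centers.
samples = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = cluster(samples, initial_centers=[samples[0], samples[3]])
print(labels, centers)
```

Passing the dissimilarity as a parameter keeps the loop independent of the particular measure; the embodiment itself uses the Euclidean distance.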
Specifically, the first cluster center determining unit 52 comprises: a distance calculation module, a distance vector mean value calculation module, a distance average calculation module, a deviation value calculation module and a cluster center determination module.
The distance calculation module is configured to calculate the distances between the behavior data of the users in the user behavior data samples;
the distance vector mean value calculation module is configured to calculate the mean value of the distances, obtaining the distance vector mean value of the distances between the users' behavior data, the distance vector mean value being the distance vector mean value of the k-th point;
the distance average calculation module is configured to calculate the mean value of the distance vector mean values, obtaining a distance average;
the deviation value calculation module is configured to calculate the deviation value between the distance vector mean value and the distance average from the distance vector mean value and the distance average;
the cluster center determination module is configured to, if the deviation value meets a preset condition, determine the behavior data of the user corresponding to the distance vector mean value of the k-th point and use that user's behavior data as the behavior data of the selected k-th user.
Specifically, the first cluster result generation unit 53 and the second cluster result generation unit 55 each comprise:
a Euclidean distance calculation module, configured to calculate the Euclidean distance between the behavior data of a user and the respective centers of the k clusters;
a user classification module, configured to assign the behavior data of the user to the cluster whose center has the minimum Euclidean distance to the behavior data of the user.
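The two modules above can be illustrated with a short Python sketch. It assumes that each user's behavior data has already been encoded as a numeric vector of the same length as the cluster centers; the function names and the example vectors are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared per-dimension differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_cluster(user_vector, centers):
    """Index of the cluster whose center is closest to the user's vector."""
    distances = [euclidean_distance(user_vector, center) for center in centers]
    return distances.index(min(distances))


# Hypothetical centers of two clusters and one user's behavior vector.
centers = [(1.0, 1.0), (8.0, 8.0)]
user = (2.0, 1.0)
cluster_index = nearest_cluster(user, centers)
print(cluster_index, euclidean_distance(user, centers[cluster_index]))
```

If two centers are equally close, this sketch picks the one with the lower index; the text does not specify a tie-breaking rule.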
The device for analyzing user behavior data provided by this embodiment of the present invention can be applied in the corresponding method Embodiment one described above. For details, see the description of Embodiment one, which is not repeated here.
Embodiment four
Fig. 6 is a structural block diagram of the device for analyzing user behavior data provided by Embodiment four of the present invention. For ease of explanation, only the parts relevant to this embodiment are shown. The device may be a software unit, a hardware unit, or a combined software and hardware unit built into a smart television. The device 6 comprises the behavior data sample establishing unit 51, the first cluster center determining unit 52, the first cluster result generation unit 53, the second cluster center determining unit 54 and the second cluster result generation unit 55 described in Embodiment three, and further comprises:
a behavior data scanning unit 61, configured to scan the behavior data of all users in one specified cluster of the clustering result;
a frequent itemset and support generation unit 62, configured to generate the frequent 1-itemsets through the frequent N-itemsets from the behavior data and to calculate the support of each itemset in the frequent itemsets, wherein the frequent N-itemsets contain only one itemset;
an association rule generation unit 63, configured to calculate the association rules among the users' behavior data from the support of each itemset in the frequent N-itemsets and the support of each itemset in the frequent (N-1)-itemsets down to the frequent 1-itemsets.
The device for analyzing user behavior data provided by this embodiment of the present invention can be applied in the corresponding method Embodiment two described above. For details, see the description of Embodiment two, which is not repeated here.
It should be noted that the units included in the above system embodiments are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for analyzing user behavior data, characterized in that the method comprises:
Step A: establishing user behavior data samples;
Step B: selecting the behavior data of k users from the user behavior data samples, and using the behavior data of the k users as the respective centers of k clusters;
Step C: calculating the dissimilarity between the behavior data of each remaining user in the user behavior data samples and the respective centers of the k clusters, and assigning the behavior data of each remaining user to the cluster with the minimum dissimilarity, obtaining a clustering result;
Step D: recalculating the respective centers of the k clusters according to the clustering result, obtaining the respective new centers of the k clusters;
Step E: calculating the dissimilarity between the behavior data of every user in the user behavior data samples and the respective new centers of the k clusters, assigning the behavior data of every user to the cluster with the minimum dissimilarity to obtain a clustering result, and returning to Step D until the clustering result no longer changes or the number of times Step D has been executed reaches a preset number.
2. The method of claim 1, characterized in that Step B comprises:
calculating the distances between the behavior data of the users in the user behavior data samples;
calculating the mean value of the distances, obtaining the distance vector mean value of the distances between the users' behavior data, the distance vector mean value being the distance vector mean value of the k-th point;
calculating the mean value of the distance vector mean values, obtaining a distance average;
calculating the deviation value between the distance vector mean value and the distance average from the distance vector mean value and the distance average;
if the deviation value meets a preset condition, determining the behavior data of the user corresponding to the distance vector mean value of the k-th point, and using that user's behavior data as the behavior data of the selected k-th user.
3. The method of claim 1, characterized in that calculating the dissimilarity between the behavior data of a user and the respective centers of the k clusters and assigning the behavior data of the user to the cluster with the minimum dissimilarity comprises:
calculating the Euclidean distance between the behavior data of the user and the respective centers of the k clusters;
assigning the behavior data of the user to the cluster whose center has the minimum Euclidean distance to the behavior data of the user.
4. The method of claim 1, 2 or 3, characterized in that, after Step E, the method further comprises:
scanning the behavior data of all users in one specified cluster of the clustering result;
generating the frequent 1-itemsets through the frequent N-itemsets from the behavior data, and calculating the support of each itemset in the frequent itemsets, wherein the frequent N-itemsets contain only one itemset;
calculating the association rules among the users' behavior data from the support of the itemsets in the frequent N-itemsets and the support of the itemsets in the frequent (N-1)-itemsets down to the frequent 1-itemsets.
5. The method of claim 2, characterized in that the step of, if the deviation value meets a preset condition, determining the behavior data of the user corresponding to the distance vector mean value of the k-th point and using that user's behavior data as the behavior data of the selected k-th user is specifically:
if the value δ calculated by the formula [formula omitted] meets the preset condition, the distance vector mean value of the corresponding k-th point is taken as the behavior data of the k-th user to be selected;
wherein [symbol omitted] is the distance vector mean value of the k-th point, [symbol omitted] is the distance average, λ is a correction factor, and δ is the deviation value between the distance vector mean value and the distance average.
6. A device for analyzing user behavior data, characterized in that the device comprises:
a behavior data sample establishing unit, configured to establish user behavior data samples;
a first cluster center determining unit, configured to select the behavior data of k users from the user behavior data samples and use the behavior data of the k users as the respective centers of k clusters;
a first cluster result generation unit, configured to calculate the dissimilarity between the behavior data of each remaining user in the user behavior data samples and the respective centers of the k clusters, and to assign the behavior data of each remaining user to the cluster with the minimum dissimilarity, obtaining a clustering result;
a second cluster center determining unit, configured to recalculate the respective centers of the k clusters according to the clustering result, obtaining the respective new centers of the k clusters;
a second cluster result generation unit, configured to calculate the dissimilarity between the behavior data of every user in the user behavior data samples and the respective new centers of the k clusters, to assign the behavior data of every user to the cluster with the minimum dissimilarity to obtain a clustering result, and to return to calling the second cluster center determining unit until the clustering result no longer changes or the number of times step D has been executed reaches a preset number.
7. The device of claim 6, characterized in that the first cluster center determining unit comprises:
a distance calculation module, configured to calculate the distances between the behavior data of the users in the user behavior data samples;
a distance vector mean value calculation module, configured to calculate the mean value of the distances, obtaining the distance vector mean value of the distances between the users' behavior data, the distance vector mean value being the distance vector mean value of the k-th point;
a distance average calculation module, configured to calculate the mean value of the distance vector mean values, obtaining a distance average;
a deviation value calculation module, configured to calculate the deviation value between the distance vector mean value and the distance average from the distance vector mean value and the distance average;
a cluster center determination module, configured to, if the deviation value meets a preset condition, determine the behavior data of the user corresponding to the distance vector mean value of the k-th point and use that user's behavior data as the behavior data of the selected k-th user.
8. The device of claim 6, characterized in that the first cluster result generation unit and the second cluster result generation unit each comprise:
a Euclidean distance calculation module, configured to calculate the Euclidean distance between the behavior data of a user and the respective centers of the k clusters;
a user classification module, configured to assign the behavior data of the user to the cluster whose center has the minimum Euclidean distance to the behavior data of the user.
9. The device of claim 6, 7 or 8, characterized in that the device further comprises:
a behavior data scanning unit, configured to scan the behavior data of all users in one specified cluster of the clustering result;
a frequent itemset and support generation unit, configured to generate the frequent 1-itemsets through the frequent N-itemsets from the behavior data and to calculate the support of each itemset in the frequent itemsets, wherein the frequent N-itemsets contain only one itemset;
an association rule generation unit, configured to calculate the association rules among the users' behavior data from the support of the itemsets in the frequent N-itemsets and the support of the itemsets in the frequent (N-1)-itemsets down to the frequent 1-itemsets.
10. A smart television, characterized in that the smart television comprises the device for analyzing user behavior data of any one of claims 6 to 9.
CN201410380588.8A 2014-08-04 2014-08-04 A kind of analysis method of user behavior data, device and smart television Active CN105320702B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410380588.8A CN105320702B (en) 2014-08-04 2014-08-04 A kind of analysis method of user behavior data, device and smart television

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410380588.8A CN105320702B (en) 2014-08-04 2014-08-04 A kind of analysis method of user behavior data, device and smart television

Publications (2)

Publication Number Publication Date
CN105320702A true CN105320702A (en) 2016-02-10
CN105320702B CN105320702B (en) 2019-02-01

Family

ID=55248102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410380588.8A Active CN105320702B (en) 2014-08-04 2014-08-04 A kind of analysis method of user behavior data, device and smart television

Country Status (1)

Country Link
CN (1) CN105320702B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106412635A (en) * 2016-09-29 2017-02-15 北京赢点科技有限公司 Intelligent advertising method and system
CN107526735A (en) * 2016-06-20 2017-12-29 杭州海康威视数字技术股份有限公司 A kind of recognition methods of incidence relation and device
CN107623715A (en) * 2017-08-08 2018-01-23 阿里巴巴集团控股有限公司 A kind of identity information acquisition methods and device
CN109753994A (en) * 2018-12-11 2019-05-14 东软集团股份有限公司 User's portrait method, apparatus, computer readable storage medium and electronic equipment
CN109861953A (en) * 2018-05-14 2019-06-07 新华三信息安全技术有限公司 A kind of abnormal user recognition methods and device
CN110929145A (en) * 2019-10-17 2020-03-27 平安科技(深圳)有限公司 Public opinion analysis method, public opinion analysis device, computer device and storage medium
CN111159555A (en) * 2019-12-30 2020-05-15 北京每日优鲜电子商务有限公司 Commodity recommendation method, commodity recommendation device, server and storage medium
CN112783956A (en) * 2019-11-08 2021-05-11 北京沃东天骏信息技术有限公司 Information processing method and device
CN113378020A (en) * 2021-06-08 2021-09-10 深圳Tcl新技术有限公司 Acquisition method, device and computer readable storage medium for similar film watching users

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120136861A1 (en) * 2010-11-25 2012-05-31 Samsung Electronics Co., Ltd. Content-providing method and system
CN103353880A (en) * 2013-06-20 2013-10-16 兰州交通大学 Data mining method adopting dissimilarity degree clustering and association
CN103886003A (en) * 2013-09-22 2014-06-25 天津思博科科技发展有限公司 Collaborative filtering processor
CN103927347A (en) * 2014-04-01 2014-07-16 复旦大学 Collaborative filtering recommendation algorithm based on user behavior models and ant colony clustering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pan Ying et al.: "Cluster analysis of campus network user behavior based on the K-means algorithm", Computing Technology and Automation *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107526735A (en) * 2016-06-20 2017-12-29 杭州海康威视数字技术股份有限公司 A kind of recognition methods of incidence relation and device
CN107526735B (en) * 2016-06-20 2020-12-11 杭州海康威视数字技术股份有限公司 Method and device for identifying incidence relation
CN106412635A (en) * 2016-09-29 2017-02-15 北京赢点科技有限公司 Intelligent advertising method and system
CN106412635B (en) * 2016-09-29 2019-07-30 北京赢点科技有限公司 A kind of intelligence advertisement placement method and system
CN107623715A (en) * 2017-08-08 2018-01-23 阿里巴巴集团控股有限公司 A kind of identity information acquisition methods and device
CN107623715B (en) * 2017-08-08 2020-06-09 阿里巴巴集团控股有限公司 Identity information acquisition method and device
CN109861953B (en) * 2018-05-14 2020-08-21 新华三信息安全技术有限公司 Abnormal user identification method and device
CN109861953A (en) * 2018-05-14 2019-06-07 新华三信息安全技术有限公司 A kind of abnormal user recognition methods and device
US11671434B2 (en) 2018-05-14 2023-06-06 New H3C Security Technologies Co., Ltd. Abnormal user identification
CN109753994A (en) * 2018-12-11 2019-05-14 东软集团股份有限公司 User's portrait method, apparatus, computer readable storage medium and electronic equipment
CN110929145A (en) * 2019-10-17 2020-03-27 平安科技(深圳)有限公司 Public opinion analysis method, public opinion analysis device, computer device and storage medium
CN112783956A (en) * 2019-11-08 2021-05-11 北京沃东天骏信息技术有限公司 Information processing method and device
CN112783956B (en) * 2019-11-08 2024-03-05 北京沃东天骏信息技术有限公司 Information processing method and device
CN111159555A (en) * 2019-12-30 2020-05-15 北京每日优鲜电子商务有限公司 Commodity recommendation method, commodity recommendation device, server and storage medium
CN113378020A (en) * 2021-06-08 2021-09-10 深圳Tcl新技术有限公司 Acquisition method, device and computer readable storage medium for similar film watching users

Also Published As

Publication number Publication date
CN105320702B (en) 2019-02-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant