CN103425698B - Data processing system and method for mining television viewing patterns - Google Patents

Info

Publication number
CN103425698B
Authority
CN
China
Prior art keywords
user
viewing
history data
mining
viewing history
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210164390.7A
Other languages
Chinese (zh)
Other versions
CN103425698A (en)
Inventor
汪灏泓
董延平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TCL Corp
Original Assignee
TCL Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TCL Corp
Priority to CN201210164390.7A
Publication of CN103425698A
Application granted
Publication of CN103425698B
Active
Anticipated expiration

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention discloses a data processing system and method for mining television viewing patterns. The problem of mining the user viewing patterns hidden in a user's viewing history is mapped to an optimization problem that can be solved with rate-distortion theory. Given a user's viewing history, the invention finds the user's viewing patterns efficiently and ensures that the distortion between the data reconstructed from these patterns and the original data is minimal. The invention makes it easier for users to watch television programs, improves the service quality of television products, and helps promote those products.

Description

Data processing system and method for mining television viewing mode
Technical Field
The invention relates to mining users' television viewing patterns, and in particular to a data processing system and a data processing method for mining television viewing patterns based on rate-distortion theory.
Background
User habits play a crucial role in the provision of consumer goods and services: to promote a product or service, its users must be understood. Many prior-art solutions address this problem, but none is ideal because user behavior is strongly non-deterministic. Some solutions study implicit user feedback, for example by using a user's program selection history or viewing history to find viewer preferences, but this approach is too simple to reveal the deeper patterns hidden beneath user behavior.
In the television field, mining user viewing patterns also affects the service quality of products and the development of the television industry. The task is very difficult because of the diverse composition of a household, the overlapping viewing times of different family members, the subtle connections between the titles of viewed content, and unavoidable incidental interference.
User viewing history shows strong autocorrelation, especially for stable households, whose television viewing patterns are easier to explore. In Fig. 1, circles, triangles and squares represent key television program sequences and 'X' represents other, non-key programs; the points marked by circles, squares and triangles repeat periodically over long-term statistics. Although autocorrelation is used in many kinds of statistical data analysis, such as web access, music and audio, and image and video, no attempt has been made to apply it to learning from television viewing history. The difficulty in doing so is that autocorrelation can be masked by noise points (such as unexpected events), and multiple patterns in the same household often overlap. In other words, finding patterns in noisy data and separating overlapping patterns in interleaved user history logs are the challenges faced by those skilled in the art.
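The periodic repetition described above can be illustrated with a small autocorrelation sketch. It is only an illustration under stated assumptions: the binary series `watched` and the helper names `autocorr` and `best_lag` are hypothetical and do not come from the patent.

```python
def autocorr(series, lag):
    """Normalized autocorrelation of a numeric sequence at a given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def best_lag(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorr(series, lag))


# Hypothetical data: 1 = a key program was watched in that daily slot, 0 = not.
# Here the key program recurs every 2 days over 14 days.
watched = [1, 0] * 7
period = best_lag(watched, 6)
```

For this series the strongest lag is the 2-day period; real logs would be noisier and contain several overlapping periods, which is the difficulty the text describes.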
In addition, the large volume of user data on movies, television programs, news, music, games and the like is linked by complex interrelationships. These relationships must be considered when mining a user's television viewing patterns.
Accordingly, the prior art still needs to be improved and developed.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide, in view of the above defects of the prior art, a data processing system and method for mining television viewing patterns, in which the problem of mining television users' viewing patterns is converted into an optimization problem that is solved using rate-distortion theory, so as to mine the optimal user viewing behavior patterns.
The technical scheme adopted by the invention for solving the technical problem is as follows:
A data processing system for mining television viewing patterns, comprising:
the data collection module is used for collecting the watching history data of the television programs;
the data filtering module is used for filtering and identifying the watching history data of the user from the watching history data of the television programs collected by the data collecting module and filtering the noise data;
the data mining and feature classification module is used for mining and classifying data features of a content database storing user watching history and television program information, finding out the watching information of the user and forming a cluster set by the characteristic-classified user watching information;
the analysis module is used for analyzing the characteristics of the viewing history content of the user and the cluster set obtained from the data mining and characteristic classification module, selecting classification from the cluster set and obtaining a viewing history data set representing the history of the behavior content of the original user;
and the pattern mining module is used for mining a main user viewing pattern set from the viewing history data set acquired by the analysis module.
The data processing system for mining television viewing patterns, wherein the pattern mining module comprises:
the user viewing mode analysis module is used for carrying out mathematical modeling on the viewing history data set, generating a viewing history data set of the user and carrying out reconstruction on the viewing history data set;
and the comparison module is used for comparing the viewing history data set with the viewing history data set reconstructed by the user viewing mode analysis module and finding out the optimal user viewing mode combination.
A data processing method for mining television viewing modes comprises the following steps:
A. the data collection module collects the watching history data of the television programs in advance, filters and identifies the watching history data of the user from the watching history data through the data filtering module, and filters noise data;
B. the data mining and feature classification module is used for mining and classifying data features of a content database in which user watching history and television program information are stored, finding out the watching information of the user, and forming a cluster set by the characteristic-classified user watching information;
C. the analysis module selects classification from the user viewing history content characteristics and the cluster set obtained from the last step to obtain a viewing history data set representing the original user behavior content history;
D. and the mode mining module is used for mining a main user viewing mode set from the viewing history data set and outputting the main user viewing mode set.
The data processing method for mining the television viewing mode, wherein the step D specifically comprises the following steps:
d1, defining the viewing history data set as follows:
specifying $N$ as the length of a user-specified time period, with $\{V_i\}$ $(i = 0, 1, \ldots, N-1)$ being the viewing history data set, where $V_i$ is the viewing history data at the $i$-th time point;
defining a user viewing mode as $P(c, s, n, m, p)$, wherein $c$ represents a content category; $s$ represents the start time point of the user viewing mode, and $s \in \{0, 1, \ldots, N-1\}$; $n$ is the period length of the user viewing behavior, and $n \in \{1, 2, \ldots, N/2\}$; $m$ represents the number of period repetitions, and $m \in \{1, \ldots, N/n\}$; $p$ represents the pattern string length, and $p \in \{1, \ldots, n\}$;
defining $\{P_r(c, s, n, m, p)\}$ $(r = 0, 1, \ldots, R-1)$ as the result set of user viewing modes, wherein $R$ represents the number of representative modes found from the historical data, and reconstructing a viewing history data set from the result set, the reconstructed set being denoted here $\{\hat{V}_i\}$;
d2, comparing the viewing history data set $\{V_i\}$ with the reconstructed viewing history data set $\{\hat{V}_i\}$, finding the smallest difference between them, and determining the optimal user viewing modes from that smallest difference.
The data processing method for mining the television viewing mode, wherein comparing the viewing history data set $\{V_i\}$ with the reconstructed viewing history data set $\{\hat{V}_i\}$ specifically comprises:
defining the difference between the viewing history data set $\{V_i\}$ and the reconstructed viewing history data set $\{\hat{V}_i\}$ as the distortion $D$, and defining an expression that quantifies the distortion [expression not reproduced in this text].
The data processing method for mining the television viewing mode, wherein the distortion between the viewing history data set $\{V_i\}$ and the reconstructed viewing history data set $\{\hat{V}_i\}$ is minimized as $\min D$ subject to $R \le R_{\mathrm{Threshold}}$, where $R_{\mathrm{Threshold}}$ is the limit on the number of modes.
The data processing method for mining the television viewing mode, wherein step D is further implemented using the Lagrange multiplier method, defining the Lagrange function $J_\lambda(u) = D(u) + \lambda R(u)$,
wherein $\lambda$ is the Lagrange multiplier;
determining a $\lambda^*$ such that $u^* = \arg\min_u J_{\lambda^*}(u)$ and $R(u^*) = R_{\mathrm{Threshold}}$, in which case $u^*$ is the optimal solution of $\min D$ subject to $R \le R_{\mathrm{Threshold}}$.
The data processing method for mining television viewing modes, wherein the step D further comprises:
defining $U$ as the set of decision point vectors, with $u_k$ denoting the vector set of the $k$-th decision point, $u_k = \{(c, s, n, m, p)_{k,o}\}$ ($o$ stands for the total number of modes), and defining a cost function $G_k(u_{k-q}, \ldots, u_k)$ that denotes the minimum cost up to the $k$-th term, with $G_N(u_{N-q}, \ldots, u_N)$ representing the minimum cost up to the last term;
according to given $q+1$ decision vector sets $u_{k-q-1}, \ldots, u_{k-1}$, calculating the value of the cost function $G_{k-1}(u_{k-q-1}, \ldots, u_{k-1})$ up to the $u_{k-1}$ term, the cost function value up to the $u_k$ term being independent of $u_1, u_2, \ldots, u_{k-q-1}$, and determining the optimal number of user viewing modes.
The data processing method for mining the television viewing mode is characterized in that a K-means algorithm is adopted to classify the data characteristics of the content database.
The data processing method for mining the television viewing mode is characterized in that the viewing history content characteristics of a user are analyzed by a principal component analysis method.
According to the data processing method for mining television viewing patterns, the problem of mining user viewing patterns is converted into an optimization problem and solved with rate-distortion theory, so that the optimal television viewing patterns are mined. The user's viewing pattern can then be presented intelligently while the user watches television, which makes it easier to watch television programs, improves the service quality of television products, and helps promote them.
Drawings
FIG. 1 is a graph of statistical viewing history using symbolic displays in the prior art.
Fig. 2 is a schematic structural diagram of a data processing system for mining television viewing patterns provided by the present invention.
Fig. 3 is a schematic structural diagram of a pattern mining module in the data processing system for mining television viewing patterns according to the present invention.
FIG. 4 is a system workflow diagram of the present invention.
Fig. 5 is a specific flowchart of the data processing method for mining tv viewing modes according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.
The method exploits the self-similarity of a user's viewing history, converts the mining of the user's television viewing patterns into a rate-distortion optimization problem, and uses rate-distortion theory to find a solution. User viewing patterns are found such that the difference between the statistics of the original history and the statistics reconstructed from the representative viewing patterns is minimized. This new data mining process for viewing patterns comprises content classification, category selection, and user viewing pattern recognition. An efficient dynamic programming algorithm is also provided to solve the resulting optimization problem. The approach can readily be applied to other statistical data problems with self-similar features, such as television and movie program recommendation, advertisement and service recommendation, user/visitor discovery and identification, personalization of home and mobile devices, and social activities.
The invention converts the problem of mining user viewing patterns into an optimization problem: a mathematical model is built for the user's viewing history content, the model is solved as an optimization problem, and the optimal user viewing patterns are thereby mined.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a data processing system for mining television viewing patterns provided by the present invention, including:
the data collection module 10 is used for collecting the watching history data of the television programs;
a data filtering module 20, configured to filter and identify viewing history data of the user from the viewing history data of the television program collected by the data collecting module 10, and filter noise data, for example, data with a large difference from a normal viewing record of the user;
the data mining and feature classification module 30 is used for performing data mining and data feature classification on a content database in which user viewing history and television program information are stored, finding out viewing information of users, and forming a cluster set by using the user viewing information after feature classification;
the analysis module 40 is used for analyzing the characteristics of the viewing history content of the user and the cluster set obtained from the data mining and characteristic classification module, selecting classification from the cluster set, and obtaining a viewing history data set representing the history of the behavior content of the original user;
a pattern mining module 50 for mining a main set of user viewing patterns from the viewing history data set obtained by the analysis module 40.
As shown in fig. 3, the pattern mining module 50 further includes:
a user viewing pattern analysis module 51, configured to mathematically model the viewing history data set, generate a viewing history data set of a user, and reconstruct the viewing history data set;
a comparison module 52 for comparing the viewing history data set with the viewing history data set reconstructed by the user viewing pattern analysis module 51, and determining an optimal user viewing pattern set by finding the minimum difference between the two viewing history data sets.
Based on the data processing system for mining the television viewing mode, the invention also provides a data processing method for mining the television viewing mode, and fig. 4 is a system work flow chart of the invention, which mainly comprises the following steps:
after the program is started, data collection and filtration are carried out;
content mining and classification are carried out on data in a content database;
defining a selection principle of a category;
identifying a user viewing mode;
outputting a user viewing mode.
With reference to the above system workflow, Fig. 5 shows a specific flowchart of the method of the present invention, which mainly includes the following steps:
step S10, the data collection module collects the watching history data of the TV program in advance, and filters and identifies the watching history data of the user through the data filtering module, and filters the noise data;
step S20, the data mining and feature classification module searches the user watching information by mining and classifying the data features of the content database storing the user watching history and the television program information, and forms a cluster set with the user watching information after feature classification;
step S30, the analysis module selects classification from the user viewing history content characteristics and the cluster set obtained from the previous step, and obtains a viewing history data set representing the original user behavior content history;
and step S40, the pattern mining module mines a main user viewing pattern set from the viewing history data set and outputs the main user viewing pattern set.
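Steps S10 to S40 can be read as a simple pipeline. The following is a minimal sketch under stated assumptions, not the patented implementation: every function name, the record layout `(time_index, title, minutes_watched)`, the 5-minute noise rule, and the frequency-based category selection are hypothetical stand-ins for the modules described above.

```python
from collections import Counter


def collect(raw_log):
    """S10 (collection): gather records as (time_index, title, minutes_watched)."""
    return list(raw_log)


def filter_noise(records, min_minutes=5):
    """S10 (filtering): drop very short views, treated here as noise (assumed rule)."""
    return [r for r in records if r[2] >= min_minutes]


def classify(records, category_of):
    """S20: map each record to a content category (stand-in for clustering)."""
    return [(t, category_of[title]) for t, title, _ in records]


def select_categories(classified, top_k=1):
    """S30: keep the most frequent categories (stand-in for PCA/PFA selection)."""
    counts = Counter(c for _, c in classified)
    keep = {c for c, _ in counts.most_common(top_k)}
    return [(t, c) for t, c in classified if c in keep]


def mine_patterns(history):
    """S40: placeholder miner that lists the time points of each kept category."""
    out = {}
    for t, c in history:
        out.setdefault(c, []).append(t)
    return out


def run_pipeline(raw_log, category_of):
    records = filter_noise(collect(raw_log))
    return mine_patterns(select_categories(classify(records, category_of)))


# Hypothetical input data.
raw_log = [(0, "A", 30), (1, "B", 2), (2, "A", 45), (3, "C", 20)]
category_of = {"A": "drama", "B": "news", "C": "sports"}
result = run_pipeline(raw_log, category_of)
```

Each stage takes the previous stage's output, mirroring the module chain 10 to 50 in Fig. 2; any stage can be replaced by a real implementation without changing the others.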
The above steps are specifically described below with reference to specific examples.
In step S10, the user's viewing history data is collected and filtered mainly to select key data points for subsequent user viewing pattern recognition, such as programs that the user likes to watch. These data include, on a daily, weekly or monthly basis, the types of programs the user watches at fixed time points, the viewing duration, the number of viewings, and so on.
In step S20, the content database stores the user's viewing history and the television program information. During data mining and classification, the K-means algorithm or another algorithm is used to find useful user viewing information; the user's viewing history is represented as a set of clusters that comprehensively describe the categories, which simplifies subsequent processing.
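The clustering in step S20 can be sketched with a plain K-means loop. This is a minimal illustration, assuming each program is described by a hypothetical feature tuple (start hour, duration in hours); the function name `kmeans`, the features, and the choice of the first `k` points as initial centroids are assumptions, not details from the patent.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical program features: (start hour, duration in hours).
programs = [(18.0, 1.0), (8.0, 0.5), (18.5, 1.0), (7.5, 0.5), (19.0, 1.5)]
labels, centroids = kmeans(programs, k=2)
```

With these sample points the evening programs and the morning programs fall into separate clusters; a production system would more likely use a library implementation with several random restarts.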
In step S30, the characteristics of the user's viewing history content and the cluster set formed in step S20 are analyzed, and categories are automatically selected so as to obtain the data that best represents the original user viewing content history, expressed as a data set. The content characteristics may be analyzed by principal component analysis (PCA), principal feature analysis (PFA), or a similar method.
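As a rough illustration of the principal component analysis mentioned in step S30, the sketch below finds the leading principal direction of a small feature table by power iteration. The data (weekly hours per category) and the name `first_principal_component` are hypothetical, and the patent does not specify how PCA output is turned into a category selection; picking the feature with the largest loading is an assumption made here.

```python
def first_principal_component(rows, iterations=100):
    """Leading eigenvector of the sample covariance matrix, by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break  # degenerate case: no variance along the current direction
        v = [x / norm for x in w]
    return v


# Hypothetical weekly viewing hours per category: (drama, news, sports).
rows = [[1.0, 1.0, 0.0], [2.0, 1.1, 0.0], [3.0, 0.9, 0.0], [4.0, 1.0, 0.0]]
component = first_principal_component(rows)
dominant_feature = max(range(len(component)), key=lambda j: abs(component[j]))
```

Here most of the variance lies in the first column, so the first category dominates the leading component.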
Step S40 mines the viewing history data set for the main user viewing patterns. The number of patterns is closely related to the number of categories, and pattern recognition is both the key point and the main difficulty of the present invention. Because the robustness of the pattern recognition algorithm must be guaranteed, the main difficulties are as follows:
1) a typical family consists of several members, so the family's viewing history is made up of the viewing histories of all its members, and the history of each individual is difficult to extract separately;
2) current television user identification technology is not fully applied in remote control and television systems, so the users of a television cannot be distinguished unless they explicitly log in;
3) a user may have a wide range of interests, and these interests may shift slowly over time, which also makes it difficult to identify user patterns;
4) subtle associations exist among programs, and some patterns are hidden beneath these associations and are difficult to discover;
5) many unexpected events, such as a major news event, change the user's short-term daily viewing pattern, and such changes must be distinguished from noise.
To reduce processing complexity, the present invention divides the full time span into multiple time bands, because users of different age groups mainly watch television at different times; for example, most children watch television from late afternoon to early evening, while working parents watch around midnight. The time-band division can be determined per household and differs between households, so a division suitable for a given household can easily be found. Time-band division thus simplifies a complex problem, allows the pattern recognition data to be formalized, and converts the pattern recognition problem into an optimization problem.
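The time-band division can be sketched as a lookup from a timestamp to a household-specific band. The band names and boundaries below are hypothetical examples; the patent only states that the division is chosen per household.

```python
from datetime import datetime

# Hypothetical household-specific bands: (name, start_hour, end_hour), end exclusive.
BANDS = [("morning", 6, 12), ("afternoon", 12, 17), ("evening", 17, 22)]


def time_band(ts, bands=BANDS, default="late_night"):
    """Return the name of the band that contains the timestamp's hour."""
    for name, start, end in bands:
        if start <= ts.hour < end:
            return name
    return default


def split_by_band(records, bands=BANDS):
    """Group (timestamp, title) records by time band."""
    grouped = {}
    for ts, title in records:
        grouped.setdefault(time_band(ts, bands), []).append(title)
    return grouped


# Hypothetical viewing records.
records = [
    (datetime(2012, 2, 6, 18, 0), "A"),
    (datetime(2012, 2, 6, 23, 30), "B"),
    (datetime(2012, 2, 7, 18, 0), "C"),
]
grouped = split_by_band(records)
```

Pattern mining can then be run on each band separately, which keeps the viewing histories of different family members from mixing as much.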
The following is a detailed description of the formulation of pattern recognition data and the process of converting the pattern recognition problem into an optimization solution problem.
The present invention mines a user's viewing patterns from a segment of the user's historical data. Let $N$ denote the length of a user-specified time period and $\{V_i\}$ $(i = 0, 1, \ldots, N-1)$ the viewing history data set, where $V_i$ is the data at the $i$-th time point. $V_i$ may be empty, meaning that nothing was watched in that period; otherwise the record set is $V_i = \{T_{ij}\}$ $(j = 0, 1, \ldots, M-1)$, where $M$ is the total number of programs viewed at the $i$-th time point, and $C(T_{ij})$ denotes the category to which $T_{ij}$ belongs. Useful user viewing pattern data is ultimately mined from these data.
A user viewing pattern can be defined mathematically as $P(c, s, n, m, p)$, where $c$ is the content category; $s$ is the pattern start time point, $s \in \{0, 1, \ldots, N-1\}$; $n$ is the period length of the behavior, $n \in \{1, 2, \ldots, N/2\}$; $m$ is the number of period repetitions, $m \in \{1, 2, \ldots, N/n\}$; and $p$ is the pattern string length, $p \in \{1, \ldots, n\}$. For example, $P_0(\text{love}, 20120206180000, 1, 3, 1)$ indicates that, starting from 2012/2/6, a love-type program appears at 18:00 once every day and has recurred 3 times, with a length of 1 day. $P_1(P_0, 20120206180000, 2, 2, 5)$ indicates that, starting from 2012/2/6, the $P_0$ type appears at 18:00 once every 2 days and has recurred 2 times, with a length of 5 days. From this periodic rule it can be inferred that there is a higher probability of watching love-type programs at 18:00 on Mondays, Wednesdays and Fridays of each week.
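The five-parameter pattern can be represented as a small record type. This is a minimal sketch under two assumptions that the text does not confirm: time points are integer indices (for example, day numbers) rather than timestamps, and occurrence $k$ of a pattern covers the $p$ consecutive time points starting at $s + k \cdot n$. Nested patterns such as $P_1$, whose category is itself a pattern, are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    c: str  # content category
    s: int  # start time point (integer index, an assumption of this sketch)
    n: int  # period length
    m: int  # number of period repetitions
    p: int  # pattern string length

    def time_points(self):
        """Time points covered, assuming occurrence k spans s + k*n .. s + k*n + p - 1."""
        return [self.s + k * self.n + j for k in range(self.m) for j in range(self.p)]


daily = Pattern("love", 0, 1, 3, 1)       # every day, three times
every_other = Pattern("love", 0, 2, 3, 1)  # every second day, three times
```

If day 0 is a Monday, `every_other.time_points()` lands on days 0, 2 and 4, that is, Monday, Wednesday and Friday.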
Similar to the K-means algorithm, the size of the output set is specified in advance: the result set is defined as $\{P_r(c, s, n, m, p)\}$ $(r = 0, 1, \ldots, R-1)$, where $R$ is the number of representative patterns to be found from the history data.
Similarity to the original data is then compared. Let $\{\hat{V}_i\}$ denote the viewing history data set reconstructed from $\{P_r(c, s, n, m, p)\}$ $(r = 0, 1, \ldots, R-1)$. The difference between the reconstructed data set and the original viewing history data set is the rate distortion, referred to here as the distortion $D$, which is quantified by expression (1) and the accompanying expressions (2) and (3) [expressions not reproduced in this text].
From expression (3) it follows that a mismatch between the data in the viewing history data set generated from the viewing patterns and the data in the original viewing history data set may be counted several times, whereas ideally it would be counted only once. This does not affect the correctness of the algorithm: the optimal, most representative set of user viewing patterns can still be found and the distortion minimized.
Because the reconstructed set $\{\hat{V}_i\}$ is generated from the set of user viewing patterns $\{P_r(c, s, n, m, p)\}$, each $\hat{V}_i$ may be empty or may consist of data from several viewing patterns, and $R$ is the number of patterns in the set.
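Reconstruction and distortion can be sketched as follows. Because the patent's distortion expressions are not available in this text, the measure used here is an assumption: it counts the (time point, category) entries that differ between the original and the reconstructed sets. The function names and the integer time indices are likewise hypothetical.

```python
def reconstruct(patterns, N):
    """Rebuild per-time-point category sets from (c, s, n, m, p) tuples."""
    rebuilt = [set() for _ in range(N)]
    for c, s, n, m, p in patterns:
        for k in range(m):
            for j in range(p):
                t = s + k * n + j
                if 0 <= t < N:
                    rebuilt[t].add(c)
    return rebuilt


def distortion(original, rebuilt):
    """Assumed measure: number of (time point, category) entries that differ."""
    return sum(len(v ^ v_hat) for v, v_hat in zip(original, rebuilt))


# Hypothetical original history over N = 6 time points.
original = [{"love"}, {"news"}, {"love"}, set(), {"love"}, set()]
one_pattern = [("love", 0, 2, 3, 1)]
two_patterns = one_pattern + [("news", 1, 1, 1, 1)]
d_one = distortion(original, reconstruct(one_pattern, 6))
d_two = distortion(original, reconstruct(two_patterns, 6))
```

With one pattern the isolated news entry is left unexplained; adding a second pattern removes that remaining mismatch.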
The central task of the present invention is to find the most representative user viewing patterns that satisfy the constraint, such that the distortion between the viewing history data set reconstructed from these patterns and the original viewing history data set is minimal. Assuming that each selection is optimal, that is, has the least distortion relative to the original data, then the larger $R$ is, the more patterns are found and the smaller the distortion $D$ relative to the original data set. The problem can therefore be converted into an optimization problem constrained on $R$, whose result satisfies expression (4):
$$\min D \quad \text{subject to} \quad R \le R_{\mathrm{Threshold}} \qquad (4)$$
where $R_{\mathrm{Threshold}}$ is the limit on the number of patterns.
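The constrained problem in expression (4) can be illustrated on a toy instance by exhaustive search over small candidate sets. This is only a sketch under assumptions: the candidate patterns are given in advance, the distortion is the assumed symmetric-difference count over (time point, category) pairs, and all names are hypothetical. It is not the efficient solver the patent goes on to describe.

```python
from itertools import combinations


def covered(patterns):
    """Set of (time point, category) pairs generated by (c, s, n, m, p) tuples."""
    return {
        (s + k * n + j, c)
        for c, s, n, m, p in patterns
        for k in range(m)
        for j in range(p)
    }


def distortion(original, patterns):
    """Assumed measure: size of the symmetric difference with the original pairs."""
    return len(original ^ covered(patterns))


def best_pattern_set(original, candidates, r_threshold):
    """Try every subset with R <= r_threshold and keep the lowest distortion."""
    best = (distortion(original, ()), ())
    for r in range(1, r_threshold + 1):
        for subset in combinations(candidates, r):
            d = distortion(original, subset)
            if d < best[0]:
                best = (d, subset)
    return best


# Hypothetical history and candidate patterns.
original = {(0, "love"), (2, "love"), (4, "love"), (1, "news"), (3, "kids")}
candidates = [("love", 0, 2, 3, 1), ("news", 1, 1, 1, 1), ("kids", 3, 1, 1, 1)]
results = [best_pattern_set(original, candidates, r)[0] for r in (0, 1, 2, 3)]
```

The minimum distortion falls from 5 to 2, 1 and 0 as the threshold grows from 0 to 3, which matches the observation that a larger $R$ gives a smaller $D$.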
For the problem in expression (4), the invention uses the Lagrange multiplier method to handle the constraint on the number of patterns, and the whole problem can then be solved by a shortest-path algorithm from graph theory. Let $U$ be the set of all possible decision point vectors, and let $u_k$ denote the vector set of the $k$-th decision point, $u_k = \{(c, s, n, m, p)_{k,o}\}$ ($o$ represents the total number of modes). Expression (5) is the Lagrange function:
$$J_\lambda(u) = D(u) + \lambda R(u) \qquad (5)$$
where $\lambda$ is the Lagrange multiplier. If there exists a $\lambda^*$ such that $u^* = \arg\min_u J_{\lambda^*}(u)$ and $R(u^*) = R_{\mathrm{Threshold}}$, then $u^*$ is the optimal solution of expression (4). Solving expression (4) is therefore equivalent to minimizing $J_\lambda(u)$, which can be done by finding an appropriate Lagrange multiplier.
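The search for a suitable multiplier can be sketched with bisection. The sketch assumes the Lagrangian has the usual form $J = D + \lambda R$ and that each candidate solution is summarized by a hypothetical `(D, R)` pair; the function names and the sample values are illustrative only.

```python
def argmin_lagrangian(solutions, lam):
    """Pick the (D, R) pair that minimizes J = D + lam * R."""
    return min(solutions, key=lambda dr: dr[0] + lam * dr[1])


def solve_with_lambda(solutions, r_threshold, lo=0.0, hi=100.0, steps=40):
    """Bisection on lambda: raise it while the unconstrained optimum uses too many patterns."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if argmin_lagrangian(solutions, mid)[1] > r_threshold:
            lo = mid  # too many patterns: penalize R more strongly
        else:
            hi = mid  # constraint satisfied: try a weaker penalty
    return hi, argmin_lagrangian(solutions, hi)


# Hypothetical (distortion, pattern count) pairs with a convex trade-off.
solutions = [(6, 0), (3, 1), (1, 2), (0, 3)]
lam, chosen = solve_with_lambda(solutions, r_threshold=2)
```

For these values the multiplier settles just above 1, where the unconstrained minimizer of $J_\lambda$ uses exactly two patterns. When the trade-off is not convex, some pattern counts are never selected by any multiplier, which is a known limitation of this relaxation.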
Because the formation of each pattern depends on the objects in the time periods near an object of the same type, the invention assumes, without loss of generality, that whether a typed object belongs to a known type of pattern depends only on the patterns containing the $q$ typed objects that precede it. To solve the optimization problem quantitatively, the invention defines a cost function $G_k(u_{k-q}, \ldots, u_k)$, which represents the minimum cost up to the $k$-th term; $G_N(u_{N-q}, \ldots, u_N)$ then represents the minimum cost up to the last term. The optimization problem of expression (5) can thus be converted into solving expression (6):
$$\min_{u_{N-q}, \ldots, u_N} G_N(u_{N-q}, \ldots, u_N) \qquad (6)$$
The efficiency of the algorithm can be greatly improved by defining a dependency set. Suppose that, for given $q+1$ decision vector sets $u_{k-q-1}, \ldots, u_{k-1}$, the cost function value $G_{k-1}(u_{k-q-1}, \ldots, u_{k-1})$ up to the $u_{k-1}$ term has already been calculated; the cost function value up to the $u_k$ term is then independent of $u_1, u_2, \ldots, u_{k-q-1}$, and so on. This can be written as expression (7) [expression not reproduced in this text],
where $r_k(u_{k-q}, \ldots, u_k)$ is the number of new user viewing patterns in $u_k$, that is, patterns not included among the user viewing patterns of $u_{k-q}, \ldots, u_{k-1}$, and $R_k(u_{k-q}, \ldots, u_k)$ is the total number of user viewing patterns up to the $k$-th term. Expression (7) shows that user patterns can be superimposed: for example, $P_r(c, s, n, m, p)$ can be viewed as the superposition of two user viewing patterns $P_r(c, s, n, 1, p)$ and $P_r(c, s+n, n, m-1, p)$, and its distortion relative to the original record set is the sum of the distortions of the two sub-patterns, while the value of $R$ is unaffected.
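The superposition property can be checked on a small case. The sketch below splits a pattern tuple `(c, s, n, m, p)` into its first period and the remainder, assuming integer time indices and that occurrence $k$ covers the $p$ time points starting at $s + k \cdot n$; the helper names are hypothetical.

```python
def time_points(pattern):
    """Time points covered by a (c, s, n, m, p) tuple under the assumed layout."""
    c, s, n, m, p = pattern
    return {s + k * n + j for k in range(m) for j in range(p)}


def split_first_period(pattern):
    """Split P(c, s, n, m, p) into P(c, s, n, 1, p) and P(c, s + n, n, m - 1, p)."""
    c, s, n, m, p = pattern
    if m < 2:
        raise ValueError("need at least two repetitions to split")
    return (c, s, n, 1, p), (c, s + n, n, m - 1, p)


whole = ("love", 0, 2, 3, 1)
head, tail = split_first_period(whole)
```

The two sub-patterns cover disjoint time points whose union equals the coverage of the whole pattern, which is why their distortions can simply be added.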
The value calculated at the next optimization iteration does not depend on how the earlier part of the process was carried out, and this property means that dynamic programming can be used to solve the problem.
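A dynamic programming solver of this kind can be sketched for the simplest case $q = 1$. This is a toy model, not the patent's expression (7): each decision is a single category label (or `None`), the per-step distortion is 1 when the label differs from the observation, and starting a new label costs $\lambda$. All names and cost definitions are assumptions made for illustration.

```python
INF = float("inf")


def dp_min_cost(observed, labels, lam):
    """Viterbi-style DP with q = 1: each step's cost depends only on the previous decision."""
    # G[u] = minimum cost of any decision sequence ending in u; start from "no pattern".
    G = {u: (0.0 if u is None else INF) for u in labels}
    history = []
    for obs in observed:
        new_G, back = {}, {}
        for u in labels:
            d = 0.0 if u == obs else 1.0  # assumed per-step distortion

            def step(v):
                r = 1.0 if (u is not None and u != v) else 0.0  # new pattern started
                return G[v] + d + lam * r

            v_best = min(labels, key=step)
            new_G[u], back[u] = step(v_best), v_best
        G = new_G
        history.append(back)
    # Trace the optimal decisions back from the cheapest final state.
    u = min(labels, key=lambda x: G[x])
    cost, path = G[u], [u]
    for back in reversed(history[1:]):
        u = back[u]
        path.append(u)
    path.reverse()
    return cost, path


labels = [None, "love", "news"]
observed = ["love", "love", "news", "love", "love"]
smooth = dp_min_cost(observed, labels, lam=1.5)
exact = dp_min_cost(observed, labels, lam=0.25)
```

A large multiplier favors a single pattern and accepts one mismatch, while a small one follows every observation; the work per step is proportional to $|U|^2$, in line with the $|U|^{q+1}$ factor for $q = 1$.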
Preferably, the problem can also be solved by a shortest-path algorithm on a directed acyclic graph in graph theory. The time complexity of the algorithm is $O(N \cdot |U|^{q+1})$, where $|U|$ is the cardinality of $U$; the complexity is thus exponential in $q$. In general $q$ is small (typically no more than 14, since for a real-time commercial system short-term user behavior is more relevant to the user's current habits), so the algorithm is far more efficient than the exhaustive method, whose time complexity is exponential.
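The gap between the two complexities can be made concrete with a short calculation. The sketch assumes that exhaustive search enumerates every sequence of $N$ decisions, that is, $|U|^N$ candidates; the text only calls it exponential, so this count and the sample sizes are assumptions.

```python
def dp_evaluations(N, U_size, q):
    """Number of cost evaluations for the dynamic program: N * |U|^(q+1)."""
    return N * U_size ** (q + 1)


def exhaustive_evaluations(N, U_size):
    """Assumed count for exhaustive search: every sequence of N decisions."""
    return U_size ** N


# Hypothetical sizes: 30 time points, 4 possible decisions per point, q = 2.
dp_count = dp_evaluations(30, 4, 2)
brute_count = exhaustive_evaluations(30, 4)
```

For these sizes the dynamic program needs 1,920 evaluations against roughly $1.15 \times 10^{18}$ for exhaustive enumeration.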
The data processing system and method for mining television viewing patterns map the problem of mining the user viewing patterns hidden in a user's viewing history to an optimization problem that can be solved with rate-distortion theory. Given a user's viewing history, the invention finds the user's viewing patterns very efficiently and ensures that the distortion between the data reconstructed from these patterns and the original data is minimal.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (9)

1. A data processing system for mining television viewing patterns, comprising:
the data collection module is used for collecting the watching history data of the television programs;
the data filtering module is used for filtering and identifying the watching history data of the user from the watching history data of the television programs collected by the data collecting module and filtering the noise data;
the data mining and feature classification module is used for mining and classifying data features of a content database storing user watching history and television program information, finding out the watching information of the user and forming a cluster set by the characteristic-classified user watching information;
the analysis module is used for analyzing the characteristics of the viewing history content of the user and the cluster set obtained from the data mining and characteristic classification module, selecting classification from the cluster set and obtaining a viewing history data set representing the history of the behavior content of the original user;
the mode mining module is used for mining a user viewing mode set from the viewing history data set acquired by the analysis module;
the number of modes in the mode set is related to the number of categories; the viewing history data set is partitioned by a time-band division method, in which a complete time span is divided into a plurality of time bands, the division of the time bands being determined per family and differing between families;
wherein the pattern mining module comprises:
the user viewing mode analysis module is used for carrying out mathematical modeling on the viewing history data set, generating a viewing history data set of the user and carrying out reconstruction on the viewing history data set;
and the comparison module is used for comparing the viewing history data set with the viewing history data set reconstructed by the user viewing mode analysis module and finding out the optimal user viewing mode combination.
2. A data processing method for mining television viewing patterns, characterized by comprising the following steps:
A. collecting, by the data collection module, viewing history data of television programs in advance, and identifying, by the data filtering module, the user's viewing history data within the collected data while filtering out noise data;
B. mining and classifying, by the data mining and feature classification module, the data features of a content database that stores user viewing history and television program information, finding the user's viewing information, and forming a cluster set from the feature-classified user viewing information;
C. selecting, by the analysis module, classifications from the cluster set obtained in the previous step according to the characteristics of the user's viewing history content, to obtain a viewing history data set that represents the original user's behavior content history;
D. mining, by the mode mining module, a set of user viewing modes from the viewing history data set obtained by the analysis module, wherein the number of modes in the mode set is related to the number of classifications, the viewing history data set is classified by a time band division method in which a complete time span is divided into a plurality of time bands, and the division of the time bands is determined per household, so that different households have different time band divisions; and reconstructing, by the mode mining module, the viewing history data set, and determining and outputting an optimal set of user viewing modes by comparing the viewing history data set with the reconstructed viewing history data set.
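The time band division in step D can be illustrated with a minimal Python sketch. The household names, the band boundaries, and the helper names `time_band` and `band_histogram` are hypothetical; the claim only states that each family (household) has its own division of a complete time span into bands, here assumed to be a 24-hour day.

```python
from bisect import bisect_right
from typing import Dict, List

# Hypothetical per-household band boundaries, in hours since midnight.
# Each list holds the start hour of every band after the first one,
# which always starts at hour 0.
HOUSEHOLD_BANDS: Dict[str, List[int]] = {
    "household_a": [6, 12, 18, 22],  # five bands
    "household_b": [9, 17],          # three bands
}


def time_band(household: str, hour: int) -> int:
    """Return the index of the time band that `hour` (0-23) falls into."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in 0..23")
    return bisect_right(HOUSEHOLD_BANDS[household], hour)


def band_histogram(household: str, viewing_hours: List[int]) -> List[int]:
    """Count viewing events per time band for one household."""
    counts = [0] * (len(HOUSEHOLD_BANDS[household]) + 1)
    for hour in viewing_hours:
        counts[time_band(household, hour)] += 1
    return counts
```

Because the boundaries are stored per household, the same viewing hour can land in different bands for different households, which is the behavior the claim describes.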
3. The data processing method for mining television viewing patterns according to claim 2, wherein the step D specifically comprises:
d1, defining the viewing history data set as follows:
letting $N$ be the length of a user-specific time period and $\{V_i\}$ ($i = 0, 1, \dots, N-1$) be the viewing history data set, where $V_i$ is the viewing history data at the $i$-th time point;
defining a user viewing mode as $P(c, s, n, m, p)$, wherein $c$ represents a content category; $s$ represents the start time point of the user viewing mode, with $s \in \{0, 1, \dots, N-1\}$; $n$ is the length of the period with which the user's viewing behavior occurs, with $n \in \{1, 2, \dots, N/2\}$; $m$ represents the number of cycle repetitions, with $m \in \{1, \dots, N/n\}$; and $p$ represents the pattern string length, with $p \in \{1, \dots, n\}$;
defining $\{P_r(c, s, n, m, p)\}$ ($r = 0, 1, \dots, R-1$) as a result set of user viewing patterns, and reconstructing a viewing history data set from the result set, the result being defined as $\{\tilde{V}_i\}$, wherein $R$ represents the number of representative patterns found from the historical data;
d2, comparing the viewing history data set $\{V_i\}$ with the reconstructed viewing history data set $\{\tilde{V}_i\}$, finding the smallest difference between them, and determining the optimal user viewing pattern from that smallest difference.
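As an illustration of how a result set of patterns P(c, s, n, m, p) might be turned back into a reconstructed viewing history, here is a minimal Python sketch. The claim does not spell out the reconstruction rule, so the semantics used here are an assumption: a pattern marks category c as watched for p consecutive time points at the start of each of m periods of length n, beginning at time point s. The names `Pattern`, `covered_points`, and `reconstruct` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Pattern:
    c: str  # content category
    s: int  # start time point, 0 <= s < N
    n: int  # period length
    m: int  # number of period repetitions
    p: int  # pattern string length, 1 <= p <= n


def covered_points(pat: Pattern, N: int) -> List[int]:
    """Time points the pattern marks as 'watching category c' (assumed semantics)."""
    points = []
    for j in range(pat.m):          # each repetition of the period
        for t in range(pat.p):      # the first p points of that period
            i = pat.s + j * pat.n + t
            if i < N:               # ignore points beyond the time period
                points.append(i)
    return points


def reconstruct(patterns: List[Pattern], N: int) -> List[Set[str]]:
    """Build the reconstructed history: one set of categories per time point."""
    recon: List[Set[str]] = [set() for _ in range(N)]
    for pat in patterns:
        for i in covered_points(pat, N):
            recon[i].add(pat.c)
    return recon
```

For example, `Pattern("news", 1, 3, 2, 2)` over N = 8 time points covers points 1, 2, 4 and 5 under this reading.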
4. The method of claim 3, wherein comparing the viewing history data set $\{V_i\}$ with the reconstructed viewing history data set $\{\tilde{V}_i\}$ specifically comprises:
defining the difference between the viewing history data set $\{V_i\}$ and the reconstructed viewing history data set $\{\tilde{V}_i\}$ as the degree of distortion, and defining an expression that quantifies the degree of distortion, wherein
$$D_i(V_i, \tilde{V}_i) = \sum_{r=0}^{R-1} \mathrm{diff}\bigl(V_i, P_r(c, s, n, m, p)\bigr).$$
5. The method of claim 4, wherein the degree of distortion between the viewing history data set $\{V_i\}$ and the reconstructed viewing history data set $\{\tilde{V}_i\}$ is minimized subject to $R \le R_{Threshold}$, where $R_{Threshold}$ is a limit on the number of modes.
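The distortion measure can be sketched in Python under stated assumptions. The text does not define diff, so the version below is a hypothetical choice: diff is 1 when a pattern's prediction for its own category at time point i disagrees with the observed viewing, and 0 otherwise. Summing the per-point values over all time points to obtain a total is likewise an assumption, as are the pattern semantics (category c is watched for the first p points of each of m periods of length n, starting at s).

```python
from typing import List, Set, Tuple

# A pattern as a plain tuple (c, s, n, m, p), matching P(c, s, n, m, p).
PatternT = Tuple[str, int, int, int, int]


def covers(pat: PatternT, i: int) -> bool:
    """True if the pattern predicts viewing of its category at time point i."""
    _, s, n, m, p = pat
    if i < s:
        return False
    j, t = divmod(i - s, n)  # j: which repetition, t: offset inside the period
    return j < m and t < p


def diff(v_i: Set[str], pat: PatternT, i: int) -> int:
    """Hypothetical diff: 1 if prediction and observation disagree, else 0."""
    predicted = covers(pat, i)
    observed = pat[0] in v_i
    return int(predicted != observed)


def distortion_at(i: int, history: List[Set[str]], patterns: List[PatternT]) -> int:
    """D_i: sum of diff over all patterns r = 0..R-1 at time point i."""
    return sum(diff(history[i], pat, i) for pat in patterns)


def total_distortion(history: List[Set[str]], patterns: List[PatternT]) -> int:
    """Assumed aggregate: sum of D_i over all time points."""
    return sum(distortion_at(i, history, patterns) for i in range(len(history)))
```

With this particular diff, only categories claimed by some pattern contribute to the distortion; a fuller measure would probably also penalize observed viewing that no pattern explains.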
6. The method as claimed in claim 4, wherein step D further comprises using the Lagrange multiplier method, defining a Lagrangian function $J_\lambda(u)$, wherein $\lambda$ is the Lagrange multiplier; and
determining a $\lambda^*$ such that $u^* = \arg\min_u J_{\lambda^*}(u)$ and $R(u^*) = R_{Threshold}$, in which case $u^*$ is the optimal solution of the distortion minimization subject to $R \le R_{Threshold}$.
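The search for the multiplier can be sketched as a bisection over lambda. This sketch assumes the usual rate-distortion form J_λ(u) = D(u) + λ·R(u), which is not spelled out in the text here, and it assumes the candidate solutions are given as a small list of hypothetical operating points (R, D), where R is the number of patterns kept and D the resulting distortion. The function names and the bracketing bounds are illustrative choices.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]  # (R, D): number of patterns, resulting distortion


def best_for_lambda(points: List[Point], lam: float) -> Point:
    """Return the operating point minimising J = D + lam * R (assumed form)."""
    return min(points, key=lambda rd: rd[1] + lam * rd[0])


def solve(points: List[Point], r_threshold: int,
          lo: float = 0.0, hi: float = 1e6, iters: int = 60) -> Optional[Point]:
    """Bisect on lambda until the unconstrained minimiser meets R <= r_threshold."""
    best: Optional[Point] = None
    for _ in range(iters):
        mid = (lo + hi) / 2
        r, d = best_for_lambda(points, mid)
        if r > r_threshold:
            lo = mid        # too many patterns: penalise the rate more
        else:
            best = (r, d)   # feasible: remember it and try a smaller penalty
            hi = mid
    return best
```

A Lagrangian search of this kind only reaches operating points on the lower convex hull of the (R, D) set, so it may return a solution with fewer patterns than the threshold allows even when a better non-hull point exists.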
7. The method for processing data mining television viewing patterns according to claim 6, wherein said step D further comprises:
defining a decision point vector $U$, where $u_k$ denotes the decision vector set at the $k$-th decision point, $u_k = \{(c, s, n, m, p)_k, o\}$ ($o$ being the total number of modes), and defining a cost function $G_k(u_{k-q}, \dots, u_k)$ that denotes the minimum cost up to the $k$-th term, with $G_N(u_{N-q}, \dots, u_N)$ denoting the minimum cost required for the last term;
given the $q+1$ decision vector sets $u_{k-q-1}, \dots, u_{k-1}$, computing the value of the cost function $G_{k-1}(u_{k-q-1}, \dots, u_{k-1})$ up to the $u_{k-1}$ term and, the cost function value for the $u_k$ term being independent of $u_1, u_2, \dots, u_{k-q-1}$, determining the optimal number of user viewing modes.
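The independence property in this claim is what makes a dynamic programming (Viterbi-style) search possible: the minimum cost up to stage k can be built from the minimum cost up to stage k-1 without revisiting earlier decisions. The following Python sketch shows the idea for the simplest case q = 1, where the cost of a stage depends only on the current and the previous decision. The function `min_cost_path` and the `stage_cost` callback are hypothetical, and decisions are reduced to integers for brevity rather than full (c, s, n, m, p) vectors.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def min_cost_path(
    num_stages: int,
    choices: Sequence[int],
    stage_cost: Callable[[int, Optional[int], int], float],
) -> Tuple[float, List[int]]:
    """Minimise the summed stage costs when each stage depends on one predecessor.

    stage_cost(k, prev, cur) is the cost of choosing `cur` at stage k after
    `prev` at stage k-1; `prev` is None at stage 0. Requires num_stages >= 1.
    """
    # G[u]: minimum cost of any decision sequence that ends with u at this stage.
    G: Dict[int, float] = {u: stage_cost(0, None, u) for u in choices}
    back: List[Dict[int, int]] = []
    for k in range(1, num_stages):
        new_G: Dict[int, float] = {}
        pointers: Dict[int, int] = {}
        for cur in choices:
            # Only the previous decision matters, not the whole history.
            prev = min(choices, key=lambda u: G[u] + stage_cost(k, u, cur))
            new_G[cur] = G[prev] + stage_cost(k, prev, cur)
            pointers[cur] = prev
        G = new_G
        back.append(pointers)
    # Trace the optimal decisions backwards from the cheapest final state.
    last = min(choices, key=lambda u: G[u])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return G[last], path
```

With |choices| options per stage, this visits on the order of num_stages × |choices|² combinations instead of the |choices|^num_stages sequences an exhaustive search would enumerate.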
8. The method of claim 2, wherein the content database is subjected to data feature classification using a K-means algorithm.
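For readers unfamiliar with K-means, a minimal pure-Python sketch of the standard Lloyd iteration follows. The feature vectors, the deterministic initialization from the first k points, and the function name `kmeans` are assumptions made for illustration; the claim does not say which features or which initialization are used.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iters: int = 50) -> Tuple[List[Point], List[int]]:
    """Cluster `points` into k groups; returns (centroids, label per point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialisation (an illustrative choice): the first k points.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels if labels is not None else [0] * len(points)
```

A feature vector here could be, for instance, hours watched per content category for one program or user record; each resulting cluster would then correspond to one entry of the cluster set.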
9. The data processing method for mining TV viewing patterns according to claim 2, wherein the user viewing history content characteristics are analyzed by a principal component analysis method.
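Principal component analysis can be illustrated with a short pure-Python sketch that finds the first principal component by power iteration on the sample covariance matrix. The function name, the all-ones starting vector, and the idea of using rows of per-category viewing features are assumptions for illustration; the claim does not specify how the analysis is carried out.

```python
import math
from typing import List, Sequence


def first_principal_component(rows: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Return a unit vector along the direction of greatest variance in `rows`.

    Requires at least two rows. Uses power iteration on the covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    # Centre each feature column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d  # illustrative starting vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: no variance along the current vector
        v = [x / norm for x in w]
    return v
```

Projecting each centered row onto the returned vector gives a one-dimensional summary of the viewing features; further components could be obtained by deflating the covariance matrix and repeating the iteration.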

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210164390.7A CN103425698B (en) 2012-05-23 2012-05-23 Excavate the data handling system and method for television viewing mode

Publications (2)

Publication Number Publication Date
CN103425698A (en) 2013-12-04
CN103425698B (en) 2017-10-24

Family

ID=49650454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210164390.7A Active CN103425698B (en) 2012-05-23 2012-05-23 Excavate the data handling system and method for television viewing mode

Country Status (1)

Country Link
CN (1) CN103425698B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107071578B (en) * 2017-05-24 2019-11-22 中国科学技术大学 IPTV program commending method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101408944A (en) * 2008-11-17 2009-04-15 深圳市天威视讯股份有限公司 Method for extracting hidden user characteristics based on MDS algorithm
CN102184235A (en) * 2011-05-13 2011-09-14 广州星海传媒有限公司 Set top box-based digital television program recommending method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6801917B2 (en) * 2001-11-13 2004-10-05 Koninklijke Philips Electronics N.V. Method and apparatus for partitioning a plurality of items into groups of similar items in a recommender of such items

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant