CN106127594B - User room interest degree calculation method and system based on principal component analysis - Google Patents

Info

Publication number
CN106127594B
Authority
CN
China
Prior art keywords
principal component
index
matrix
calculating
contribution rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610514089.2A
Other languages
Chinese (zh)
Other versions
CN106127594A (en)
Inventor
程晓歌
吴瑞诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Urumqi Bangbangjun Technology Co ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd filed Critical Wuhan Douyu Network Technology Co Ltd
Priority to CN201610514089.2A priority Critical patent/CN106127594B/en
Publication of CN106127594A publication Critical patent/CN106127594A/en
Application granted granted Critical
Publication of CN106127594B publication Critical patent/CN106127594B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities

Abstract

The invention discloses a method and a system for calculating a user's room interestingness based on principal component analysis. The method comprises the following steps: S1, constructing an interestingness calculation index system; S2, constructing an initialization matrix X and standardizing it to obtain a standardized matrix Z; calculating the correlation coefficient matrix R of the standardized matrix Z and its eigenvalues λ; arranging the eigenvalues λ from large to small, where each eigenvalue corresponds to one principal component; S3, calculating the variance contribution rate and the cumulative variance contribution rate of each principal component; S4, selecting b principal components and multiplying their eigenvectors by the standardized data to obtain the linear expressions of the principal components; S5, normalizing the comprehensive weights of the n main evaluation indexes to obtain the weight value of each main evaluation index, and performing a weighted calculation to obtain each user's interestingness score for different rooms. The method and system help to accurately judge user preferences, accurately recommend rooms the user may be interested in, and improve the user experience.

Description

User room interest degree calculation method and system based on principal component analysis
Technical Field
The invention relates to the field of live broadcast platform user data analysis, in particular to a method and a system for calculating user room interestingness based on principal component analysis.
Background
When using a live broadcast platform, a user usually enters a large number of different rooms to watch live streams. The same user behaves differently in different rooms, and the watching duration, the watching frequency and whether gifts are given all reflect the user's degree of interest in a room. To accurately judge the user's preferences, recommend rooms the user may be interested in, and improve the user experience, it is necessary to obtain the user's degree of interest in each room. However, there is currently no effective method for calculating a user's degree of interest in a room.
Disclosure of Invention
The invention aims to overcome the defects of the background art and provides a method and a system for calculating the room interestingness of a user based on principal component analysis.
The invention provides a method for calculating the room interestingness of a user based on principal component analysis, which comprises the following steps:
S1, extracting a plurality of behavior indexes from the behavior information of users toward rooms, analyzing the behavior indexes, treating behavior indexes whose correlation coefficients have absolute values greater than a correlation threshold as mutually substitutable, and screening out n representative behavior indexes as main evaluation indexes, where n is a positive integer greater than or equal to 3, to construct an interestingness calculation index system;
S2, acquiring room behavior information generated by a plurality of users within a certain time period, constructing an initialization matrix X, and standardizing the initialization matrix X to obtain a standardized matrix Z; calculating the correlation coefficient matrix R of the standardized matrix Z, and calculating the eigenvalues $\lambda$ of the correlation coefficient matrix R; arranging the eigenvalues from large to small to obtain $\lambda_1, \lambda_2, \ldots, \lambda_n$, with corresponding eigenvectors $e_1, e_2, \ldots, e_n$, where each eigenvector e satisfies $\|e\| = 1$, each eigenvalue corresponds to one principal component, and each eigenvector contains n elements;
S3, calculating the variance contribution rate v of each principal component, $v_a = \lambda_a / (\lambda_1 + \lambda_2 + \cdots + \lambda_n)$, where a is a positive integer, $1 \le a \le n$, $v_a$ denotes the variance contribution rate of the a-th principal component, and $\lambda_a$ denotes the eigenvalue corresponding to the a-th principal component; calculating the cumulative variance contribution rate of the principal components, where the cumulative variance contribution rate of the a-th principal component is the sum of the variance contribution rates of the 1st through a-th principal components;
S4, taking the number of eigenvalues that satisfy the conditions as the number of principal components finally selected, according to the principle that the eigenvalue is greater than 1 and the cumulative variance contribution rate is greater than a specified threshold; if b eigenvalues satisfy the conditions, b principal components are selected, where $\lambda_1, \lambda_2, \ldots, \lambda_b$ are the eigenvalues corresponding to the b principal components and $e_1, e_2, \ldots, e_b$ are the corresponding eigenvectors; multiplying the eigenvectors of the b principal components by the standardized data to obtain the linear expressions of the principal components;
S5, taking the variance contribution rates of the principal components as weights, computing a weighted average of the coefficients of each main evaluation index across the principal component linear expressions to obtain the comprehensive weight of each main evaluation index; normalizing the comprehensive weights of all n main evaluation indexes to obtain the weight value $w'_j$ of each main evaluation index, where $w'_j$ is the comprehensive weight of the j-th main evaluation index divided by the sum of the comprehensive weights of all main evaluation indexes; and performing a weighted calculation with the obtained weight values to obtain each user's interestingness score for different rooms.
Based on the above technical solution, in step S2, the process of normalizing the initialization matrix X is as follows:
In the initialization matrix X, $x_{ij}$ denotes the data of the j-th behavior index of the i-th user and $x_j$ denotes the data of the j-th behavior index, where i and j are positive integers and $1 \le j \le n$; the initialization matrix X is standardized using the maximum function max and the minimum function min, with the specific formula $z_{ij} = (x_{ij} - \min(x_j)) / (\max(x_j) - \min(x_j))$, where $z_{ij}$ denotes one element of the standardized matrix Z.
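The min-max standardization above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `min_max_normalize`, the sample data, and the handling of a constant column (mapped to 0.0) are not specified in the source.

```python
def min_max_normalize(X):
    """Column-wise min-max scaling:
    z_ij = (x_ij - min(x_j)) / (max(x_j) - min(x_j))."""
    n_cols = len(X[0])
    col_min = [min(row[j] for row in X) for j in range(n_cols)]
    col_max = [max(row[j] for row in X) for j in range(n_cols)]
    Z = []
    for row in X:
        z_row = []
        for j, x in enumerate(row):
            span = col_max[j] - col_min[j]
            # Assumption: a constant column is mapped to 0.0 to avoid dividing by zero.
            z_row.append((x - col_min[j]) / span if span else 0.0)
        Z.append(z_row)
    return Z


# Hypothetical data: rows are users, columns are two behavior indexes.
X = [[120.0, 3.0], [30.0, 1.0], [75.0, 5.0]]
Z = min_max_normalize(X)
print(Z)
```

Each column of Z then lies in the range [0, 1], so indexes measured in different units become comparable.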
On the basis of the above technical solution, in step S4, the linear expression of the principal component is as follows:
$Y_c = e_{c1} Z_1 + e_{c2} Z_2 + \cdots + e_{cn} Z_n$
where c is a positive integer, $1 \le c \le b$, $Y_c$ denotes the c-th principal component, $e_{c1}$ is the coefficient of the 1st index in the c-th principal component linear expression, i.e., the 1st element of the c-th eigenvector, $e_{cn}$ is the coefficient of the n-th index in the c-th principal component linear expression, i.e., the n-th element of the c-th eigenvector, $Z_1$ denotes the standardized value of the 1st index, and $Z_n$ denotes the standardized value of the n-th index.
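As a small illustration, the linear expression is a dot product between one eigenvector and one user's standardized index values. The function name and the numbers below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def principal_component_value(e_c, z):
    """Y_c = e_c1*Z_1 + e_c2*Z_2 + ... + e_cn*Z_n for one user's standardized values z."""
    if len(e_c) != len(z):
        raise ValueError("eigenvector and index vector must have the same length")
    return sum(e * z_j for e, z_j in zip(e_c, z))


e_1 = [0.6, 0.8, 0.0]   # hypothetical unit-length eigenvector (0.36 + 0.64 = 1)
z = [0.5, 0.25, 1.0]    # hypothetical standardized index values of one user
y_1 = principal_component_value(e_1, z)
print(y_1)  # 0.6*0.5 + 0.8*0.25 + 0.0*1.0
```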
Based on the above technical solution, in step S5, the formula for calculating the comprehensive weight is as follows:
$w_j = v_1 e_{1j}/(v_1 + v_2 + \cdots + v_b) + v_2 e_{2j}/(v_1 + v_2 + \cdots + v_b) + \cdots + v_b e_{bj}/(v_1 + v_2 + \cdots + v_b)$,
where $w_j$ denotes the comprehensive weight of the j-th main evaluation index, $v_1$ denotes the variance contribution rate of the 1st principal component, and $v_b$ denotes the variance contribution rate of the b-th principal component.
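The comprehensive weight formula can be sketched as follows. The function name `composite_weights` and the sample values are assumptions made for illustration; `v` holds the variance contribution rates of the b selected principal components and `E` holds their eigenvectors.

```python
def composite_weights(v, E):
    """w_j = sum over c of v_c * e_cj / (v_1 + ... + v_b).

    v: variance contribution rates of the b selected principal components
    E: list of b eigenvectors, each with n elements
    """
    total = sum(v)
    n = len(E[0])
    return [sum(v_c * e_c[j] for v_c, e_c in zip(v, E)) / total for j in range(n)]


# Hypothetical values: b = 2 components, n = 3 indexes.
v = [0.5, 0.3]
E = [[0.6, 0.8, 0.0],
     [0.0, 0.6, 0.8]]
w = composite_weights(v, E)
print(w)
```

An index that loads heavily on a component with a high variance contribution rate therefore receives a larger comprehensive weight.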
Based on the above technical solution, in step S5, the formula for calculating the interestingness score is as follows:
$S = w'_1 Z_1 + w'_2 Z_2 + \cdots + w'_n Z_n$, where S denotes the interestingness score, $w'_1$ denotes the weight value of the 1st index, $Z_1$ denotes the standardized value of the 1st index, $w'_n$ denotes the weight value of the n-th index, and $Z_n$ denotes the standardized value of the n-th index.
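A minimal sketch of the last step, assuming the weights are normalized by dividing each comprehensive weight by their sum, is given below. The function names and numbers are hypothetical.

```python
def normalize_weights(w):
    """w'_j = w_j / (w_1 + ... + w_n), so the weight values sum to 1."""
    total = sum(w)
    if total == 0:
        raise ValueError("comprehensive weights sum to zero and cannot be normalized")
    return [w_j / total for w_j in w]


def interest_score(w_norm, z):
    """S = w'_1*Z_1 + w'_2*Z_2 + ... + w'_n*Z_n for one user-room pair."""
    return sum(w_j * z_j for w_j, z_j in zip(w_norm, z))


w = [0.375, 0.725, 0.3]   # hypothetical comprehensive weights
w_norm = normalize_weights(w)
score = interest_score(w_norm, [1.0, 0.5, 0.0])   # hypothetical standardized values
print(w_norm, score)
```

Because eigenvectors are only defined up to sign, comprehensive weights can in principle be negative; how that case is treated is not stated in the source, and this sketch does not handle it specially.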
The invention also provides a system for calculating the room interestingness of the user based on the principal component analysis, which comprises: a system building unit, a principal component analysis unit, a weight analysis unit, wherein,
the system building unit is used for: extracting a plurality of behavior indexes from the behavior information of users toward rooms, analyzing the behavior indexes, treating behavior indexes whose correlation coefficients have absolute values greater than a correlation threshold as mutually substitutable, and screening out n representative behavior indexes as main evaluation indexes, where n is a positive integer greater than or equal to 3, to construct an interestingness calculation index system;
the principal component analysis unit is used for: acquiring room behavior information generated by a plurality of users within a certain time period, constructing an initialization matrix X, and standardizing the initialization matrix X to obtain a standardized matrix Z; calculating the correlation coefficient matrix R of the standardized matrix Z, and calculating the eigenvalues $\lambda$ of the correlation coefficient matrix R; arranging the eigenvalues from large to small to obtain $\lambda_1, \lambda_2, \ldots, \lambda_n$, with corresponding eigenvectors $e_1, e_2, \ldots, e_n$, where each eigenvector e satisfies $\|e\| = 1$, each eigenvalue corresponds to one principal component, and each eigenvector contains n elements;
calculating the variance contribution rate v of each principal component, $v_a = \lambda_a / (\lambda_1 + \lambda_2 + \cdots + \lambda_n)$, where a is a positive integer, $1 \le a \le n$, $v_a$ denotes the variance contribution rate of the a-th principal component, and $\lambda_a$ denotes the eigenvalue corresponding to the a-th principal component; calculating the cumulative variance contribution rate of the principal components, where the cumulative variance contribution rate of the a-th principal component is the sum of the variance contribution rates of the 1st through a-th principal components;
taking the number of eigenvalues that satisfy the conditions as the number of principal components finally selected, according to the principle that the eigenvalue is greater than 1 and the cumulative variance contribution rate is greater than a specified threshold; if b eigenvalues satisfy the conditions, b principal components are selected, where $\lambda_1, \lambda_2, \ldots, \lambda_b$ are the eigenvalues corresponding to the b principal components and $e_1, e_2, \ldots, e_b$ are the corresponding eigenvectors; multiplying the eigenvectors of the b principal components by the standardized data to obtain the linear expressions of the principal components;
the weight analysis unit is used for: taking the variance contribution rates of the principal components as weights, computing a weighted average of the coefficients of each main evaluation index across the principal component linear expressions to obtain the comprehensive weight of each main evaluation index; normalizing the comprehensive weights of all n main evaluation indexes to obtain the weight value $w'_j$ of each main evaluation index, where $w'_j$ is the comprehensive weight of the j-th main evaluation index divided by the sum of the comprehensive weights of all main evaluation indexes; and performing a weighted calculation with the obtained weight values to obtain each user's interestingness score for different rooms.
On the basis of the above technical solution, the process of normalizing the initialization matrix X is as follows:
In the initialization matrix X, $x_{ij}$ denotes the data of the j-th behavior index of the i-th user and $x_j$ denotes the data of the j-th behavior index, where i and j are positive integers and $1 \le j \le n$; the initialization matrix X is standardized using the maximum function max and the minimum function min, with the specific formula $z_{ij} = (x_{ij} - \min(x_j)) / (\max(x_j) - \min(x_j))$, where $z_{ij}$ denotes one element of the standardized matrix Z.
On the basis of the technical scheme, the linear expression of the principal component is as follows:
$Y_c = e_{c1} Z_1 + e_{c2} Z_2 + \cdots + e_{cn} Z_n$
where c is a positive integer, $1 \le c \le b$, $Y_c$ denotes the c-th principal component, $e_{c1}$ is the coefficient of the 1st index in the c-th principal component linear expression, i.e., the 1st element of the c-th eigenvector, $e_{cn}$ is the coefficient of the n-th index in the c-th principal component linear expression, i.e., the n-th element of the c-th eigenvector, $Z_1$ denotes the standardized value of the 1st index, and $Z_n$ denotes the standardized value of the n-th index.
On the basis of the technical scheme, the formula for calculating the comprehensive weight is as follows:
$w_j = v_1 e_{1j}/(v_1 + v_2 + \cdots + v_b) + v_2 e_{2j}/(v_1 + v_2 + \cdots + v_b) + \cdots + v_b e_{bj}/(v_1 + v_2 + \cdots + v_b)$,
where $w_j$ denotes the comprehensive weight of the j-th main evaluation index, $v_1$ denotes the variance contribution rate of the 1st principal component, and $v_b$ denotes the variance contribution rate of the b-th principal component.
On the basis of the technical scheme, the interestingness score is calculated according to the following formula:
$S = w'_1 Z_1 + w'_2 Z_2 + \cdots + w'_n Z_n$, where S denotes the interestingness score, $w'_1$ denotes the weight value of the 1st index, $Z_1$ denotes the standardized value of the 1st index, $w'_n$ denotes the weight value of the n-th index, and $Z_n$ denotes the standardized value of the n-th index.
Compared with the prior art, the invention has the following advantages:
According to the invention, different behaviors of a user toward a room are analyzed, an index evaluation system is constructed, the weights of the evaluation indexes are determined, and the user's degree of interest in a room is measured quantitatively, which helps to accurately judge the user's preferences; using the interestingness score, a ranking of the user's interest in the rooms watched can be obtained, rooms the user may be interested in can be accurately recommended, and the user experience is improved.
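As a simple illustration of how such a ranking might be derived from the scores, the sketch below sorts one user's rooms by interestingness score. The room identifiers, score values, function name, and the `top_k` parameter are all hypothetical.

```python
def rank_rooms(scores, top_k=3):
    """Return the room ids with the highest interestingness scores, best first.

    scores: dict mapping room id -> interestingness score for one user
    """
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical scores of one user for four rooms.
scores = {"room_a": 0.42, "room_b": 0.87, "room_c": 0.15, "room_d": 0.63}
top = rank_rooms(scores, top_k=2)
print(top)
```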
Drawings
Fig. 1 is a flowchart of a method for calculating a user room interest level in an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
Referring to fig. 1, an embodiment of the present invention provides a method for calculating a user room interestingness based on principal component analysis, including the following steps:
S1, extracting a plurality of behavior indexes from the behavior information of users toward rooms, analyzing the behavior indexes, treating behavior indexes whose correlation coefficients have absolute values greater than a correlation threshold as mutually substitutable, and screening out n representative behavior indexes as main evaluation indexes to construct an interestingness calculation index system; in practical application, the correlation threshold is generally greater than 0.7, preferably 0.8, and n is generally a positive integer greater than or equal to 3;
the specific process of step S1 is as follows:
constructing first-level indexes from the behavior information of users toward rooms, the first-level indexes comprising gift-giving behavior, watching behavior, paying behavior and bullet screen behavior; subdividing the first-level indexes to refine the behavior information indexes of users toward rooms and obtain initial evaluation indexes; for example, the first-level index 'watching behavior' includes two initial evaluation indexes, effective watching duration and effective watching days; performing correlation analysis on the initial evaluation indexes, treating highly correlated indexes whose correlation coefficients have absolute values greater than 0.8 as mutually substitutable, screening out n representative behavior indexes as main evaluation indexes, and determining the interestingness calculation index system;
in this embodiment, the above analysis yields n = 6, i.e., there are 6 main evaluation indexes: effective watching duration, effective watching days, number of bullet screens sent, number of free gifts, number of paid gifts, and whether the user follows (pays attention to) the room.
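The screening in step S1 could be sketched as below. The source does not say how a representative is chosen among mutually substitutable indexes, so the greedy rule used here (keep an index only if it is not highly correlated with any index already kept) is an assumption, as are the function names and the sample columns.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    # Assumption: a constant column is treated as uncorrelated with everything.
    return cov / (norm_a * norm_b) if norm_a and norm_b else 0.0


def screen_indexes(columns, threshold=0.8):
    """Greedy screening: keep an index unless |r| with an already kept index exceeds threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical initial evaluation indexes observed for five users.
columns = {
    "watch_minutes": [10, 20, 30, 40, 50],
    "watch_seconds": [600, 1200, 1800, 2400, 3000],  # redundant with watch_minutes
    "gifts": [3, 0, 4, 1, 2],
}
kept = screen_indexes(columns)
print(kept)
```

With the threshold of 0.8 named in the embodiment, `watch_seconds` is dropped because it is perfectly correlated with `watch_minutes`, while `gifts` is kept.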
S2, acquiring room behavior information generated by a plurality of users within a one-week period, constructing an initialization matrix X, and standardizing the initialization matrix X to obtain a standardized matrix Z; the initialization matrix X is standardized as follows:
in the initialization matrix X, $x_{ij}$ denotes the data of the j-th behavior index of the i-th user and $x_j$ denotes the data of the j-th behavior index, where i and j are positive integers and $1 \le j \le n$; the initialization matrix X is standardized using the maximum function max and the minimum function min, with the specific formula $z_{ij} = (x_{ij} - \min(x_j)) / (\max(x_j) - \min(x_j))$, where $z_{ij}$ denotes one element of the standardized matrix Z;
calculating the correlation coefficient matrix R of the standardized matrix Z, and calculating the eigenvalues $\lambda$ of the correlation coefficient matrix R; arranging the eigenvalues from large to small to obtain $\lambda_1, \lambda_2, \ldots, \lambda_6$, with corresponding eigenvectors $e_1, e_2, \ldots, e_6$, where each eigenvector e satisfies $\|e\| = 1$, each eigenvalue corresponds to one principal component, and each eigenvector contains 6 elements; for example, the vector $e_1$ comprises $e_{11}, e_{12}, e_{13}, \ldots, e_{16}$;
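The correlation matrix and its eigen-decomposition can be sketched in plain Python as follows. The patent does not name an eigenvalue algorithm; the cyclic Jacobi rotation method is used here only so that the sketch is self-contained, and in practice a library routine such as `numpy.linalg.eigh` would be the usual choice. Function names and sample data are hypothetical, and the sketch assumes no column of Z is constant.

```python
import math


def correlation_matrix(Z):
    """Pearson correlation matrix R of the columns of Z (rows are users)."""
    m, n = len(Z), len(Z[0])
    means = [sum(row[j] for row in Z) / m for j in range(n)]
    D = [[row[j] - means[j] for j in range(n)] for row in Z]   # centered data
    norms = [math.sqrt(sum(d[j] ** 2 for d in D)) for j in range(n)]
    return [[sum(d[i] * d[j] for d in D) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def jacobi_eigen(R, tol=1e-20, max_sweeps=50):
    """Eigenvalues (descending) and unit eigenvectors of a symmetric matrix R."""
    n = len(R)
    A = [row[:] for row in R]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                diff = A[q][q] - A[p][p]
                # Rotation angle that zeroes the off-diagonal entry A[p][q].
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, V):          # M <- M J (rotate columns p and q)
                    for k in range(n):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):        # A <- J^T A (rotate rows p and q)
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    order = sorted(range(n), key=lambda i: A[i][i], reverse=True)
    eigenvalues = [A[i][i] for i in order]
    eigenvectors = [[V[k][i] for k in range(n)] for i in order]   # one list per eigenvector
    return eigenvalues, eigenvectors


# Hypothetical standardized data: 4 users, 3 indexes.
Z = [[0.0, 0.1, 1.0], [0.5, 0.4, 0.2], [1.0, 1.0, 0.6], [0.2, 0.3, 0.0]]
R = correlation_matrix(Z)
eigenvalues, eigenvectors = jacobi_eigen(R)
print(eigenvalues)
```

Since R is a correlation matrix with ones on its diagonal, the eigenvalues sum to the number of indexes, which is a convenient sanity check.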
S3, calculating the variance contribution rate v of each principal component, $v_a = \lambda_a / (\lambda_1 + \lambda_2 + \cdots + \lambda_6)$, where a is a positive integer, $1 \le a \le n$, $v_a$ denotes the variance contribution rate of the a-th principal component, and $\lambda_a$ denotes the eigenvalue corresponding to the a-th principal component; calculating the cumulative variance contribution rate of the principal components, where the cumulative variance contribution rate of the a-th principal component is the sum of the variance contribution rates of the 1st through a-th principal components; for example, the cumulative variance contribution rate of the second principal component is the variance contribution rate of the first principal component plus the variance contribution rate of the second principal component;
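Step S3 reduces to a division and a running sum, as the following sketch shows. The eigenvalues are hypothetical (chosen to sum to 6, as the eigenvalues of a 6 by 6 correlation matrix would), and the function name is an assumption.

```python
from itertools import accumulate


def variance_contribution(eigenvalues):
    """Return (rates, cumulative), where rates[a] = lambda_a / sum of all eigenvalues
    and cumulative[a] is the running sum of rates up to and including component a."""
    total = sum(eigenvalues)
    rates = [lam / total for lam in eigenvalues]
    cumulative = list(accumulate(rates))
    return rates, cumulative


# Hypothetical eigenvalues, already sorted from large to small.
eigenvalues = [2.4, 1.5, 1.2, 0.45, 0.3, 0.15]
rates, cumulative = variance_contribution(eigenvalues)
print(rates)
print(cumulative)
```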
S4, taking the number of eigenvalues that satisfy the conditions as the number of principal components finally selected, according to the principle that the eigenvalue is greater than 1 and the cumulative variance contribution rate is greater than a specified threshold; in practical application, the threshold value is specified to be 3, and the invention finally extracts 3 principal components according to this rule, where $\lambda_1, \lambda_2, \lambda_3$ are the eigenvalues corresponding to the 3 principal components and $e_1, e_2, e_3$ are the corresponding eigenvectors; the eigenvectors of the 3 principal components are multiplied by the standardized data to obtain the linear expressions of the principal components, for example:
$Y_1 = e_{11} Z_1 + e_{12} Z_2 + \cdots + e_{16} Z_6$
$Y_2 = e_{21} Z_1 + e_{22} Z_2 + \cdots + e_{26} Z_6$
$Y_3 = e_{31} Z_1 + e_{32} Z_2 + \cdots + e_{36} Z_6$
where $Y_1$ denotes the 1st principal component, $e_{11}$ is the coefficient of the 1st index in the 1st principal component linear expression, i.e., the 1st element of the eigenvector $e_1$, $e_{16}$ is the coefficient of the 6th index in the 1st principal component linear expression, i.e., the 6th element of the eigenvector $e_1$, $Z_1$ denotes the standardized value of the 1st index, and so on.
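The selection rule in step S4 is stated briefly, so the sketch below implements just one possible reading: count the eigenvalues greater than 1, then check whether the cumulative variance contribution rate of those components exceeds the threshold. The function name, the hypothetical eigenvalues, and the illustrative threshold of 0.8 are assumptions and are not taken from the source.

```python
def select_components(eigenvalues, cum_threshold=0.8):
    """One possible reading of the rule, for eigenvalues sorted from large to small.

    Returns (b, cumulative_rate, meets_threshold):
      b               number of eigenvalues greater than 1
      cumulative_rate cumulative variance contribution rate of the first b components
      meets_threshold whether cumulative_rate exceeds cum_threshold
    """
    total = sum(eigenvalues)
    b = sum(1 for lam in eigenvalues if lam > 1.0)
    cumulative_rate = sum(eigenvalues[:b]) / total
    return b, cumulative_rate, cumulative_rate > cum_threshold


# Hypothetical eigenvalues of a 6 x 6 correlation matrix, sorted descending.
eigenvalues = [2.4, 1.5, 1.2, 0.45, 0.3, 0.15]
b, cumulative_rate, ok = select_components(eigenvalues)
print(b, cumulative_rate, ok)
```

With these hypothetical eigenvalues, three components are selected, which matches the number of components extracted in the embodiment.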
S5, defining the variance contribution rate as the weight of different principal components, wherein the greater the variance contribution rate of the principal component, the greater the importance of the principal component; taking the variance contribution rate of the principal component as the weight, carrying out weighted average on the coefficients of the main evaluation indexes in each principal component linear expression, and calculating the comprehensive weight of each main evaluation index; the calculation formula is as follows:
$w_1 = v_1 e_{11}/(v_1 + v_2 + v_3) + v_2 e_{21}/(v_1 + v_2 + v_3) + v_3 e_{31}/(v_1 + v_2 + v_3)$;
$w_2 = v_1 e_{12}/(v_1 + v_2 + v_3) + v_2 e_{22}/(v_1 + v_2 + v_3) + v_3 e_{32}/(v_1 + v_2 + v_3)$;
……;
$w_6 = v_1 e_{16}/(v_1 + v_2 + v_3) + v_2 e_{26}/(v_1 + v_2 + v_3) + v_3 e_{36}/(v_1 + v_2 + v_3)$;
where $w_1$ denotes the comprehensive weight of the 1st main evaluation index; by analogy, the comprehensive weight of every main evaluation index can be obtained;
normalizing the comprehensive weights of all 6 main evaluation indexes to obtain the weight value $w'_j$ of each main evaluation index, where $w'_j$ is the comprehensive weight of the j-th main evaluation index divided by the sum of the comprehensive weights of all main evaluation indexes; performing a weighted calculation with the obtained weight values to obtain each user's interestingness score for different rooms, the interestingness score being calculated as follows:
$S = w'_1 Z_1 + w'_2 Z_2 + \cdots + w'_6 Z_6$, where S denotes the interestingness score.
The embodiment of the invention also provides a system for calculating the room interest degree of a user based on principal component analysis, which comprises: a system building unit, a principal component analysis unit, a weight analysis unit, wherein,
the system building unit is used for: extracting a plurality of behavior indexes from the behavior information of users toward rooms, analyzing the behavior indexes, treating behavior indexes whose correlation coefficients have absolute values greater than a correlation threshold as mutually substitutable, and screening out n representative behavior indexes as main evaluation indexes to construct an interestingness calculation index system; in practical application, the correlation threshold is generally greater than 0.7, preferably 0.8, and n is generally a positive integer greater than or equal to 3;
the specific process is as follows:
constructing first-level indexes from the behavior information of users toward rooms, the first-level indexes comprising gift-giving behavior, watching behavior, paying behavior and bullet screen behavior; subdividing the first-level indexes to refine the behavior information indexes of users toward rooms and obtain initial evaluation indexes; for example, the first-level index 'watching behavior' includes two initial evaluation indexes, effective watching duration and effective watching days; performing correlation analysis on the initial evaluation indexes, treating highly correlated indexes whose correlation coefficients have absolute values greater than 0.8 as mutually substitutable, screening out n representative behavior indexes as main evaluation indexes, and determining the interestingness calculation index system; in this embodiment, the above analysis yields n = 6, i.e., there are 6 main evaluation indexes: effective watching duration, effective watching days, number of bullet screens sent, number of free gifts, number of paid gifts, and whether the user follows (pays attention to) the room.
The principal component analysis unit is used for: acquiring room behavior information generated by a plurality of users within a one-week period, constructing an initialization matrix X, and standardizing the initialization matrix X to obtain a standardized matrix Z, the initialization matrix X being standardized as follows:
in the initialization matrix X, $x_{ij}$ denotes the data of the j-th behavior index of the i-th user and $x_j$ denotes the data of the j-th behavior index, where i and j are positive integers and $1 \le j \le n$; the initialization matrix X is standardized using the maximum function max and the minimum function min, with the specific formula $z_{ij} = (x_{ij} - \min(x_j)) / (\max(x_j) - \min(x_j))$, where $z_{ij}$ denotes one element of the standardized matrix Z;
calculating the correlation coefficient matrix R of the standardized matrix Z, and calculating the eigenvalues $\lambda$ of the correlation coefficient matrix R; arranging the eigenvalues from large to small to obtain $\lambda_1, \lambda_2, \ldots, \lambda_6$, with corresponding eigenvectors $e_1, e_2, \ldots, e_6$, where each eigenvector e satisfies $\|e\| = 1$, each eigenvalue corresponds to one principal component, and each eigenvector contains 6 elements; for example, the vector $e_1$ comprises $e_{11}, e_{12}, e_{13}, \ldots, e_{16}$;
calculating the variance contribution rate v of each principal component, $v_a = \lambda_a / (\lambda_1 + \lambda_2 + \cdots + \lambda_6)$, where a is a positive integer, $1 \le a \le n$, $v_a$ denotes the variance contribution rate of the a-th principal component, and $\lambda_a$ denotes the eigenvalue corresponding to the a-th principal component; calculating the cumulative variance contribution rate of the principal components, where the cumulative variance contribution rate of the a-th principal component is the sum of the variance contribution rates of the 1st through a-th principal components; for example, the cumulative variance contribution rate of the second principal component is the variance contribution rate of the first principal component plus the variance contribution rate of the second principal component;
taking the number of eigenvalues that satisfy the conditions as the number of principal components finally selected, according to the principle that the eigenvalue is greater than 1 and the cumulative variance contribution rate is greater than a specified threshold; in practical application, the threshold value is specified to be 3, and the invention finally extracts 3 principal components according to this rule, where $\lambda_1, \lambda_2, \lambda_3$ are the eigenvalues corresponding to the 3 principal components and $e_1, e_2, e_3$ are the corresponding eigenvectors; the eigenvectors of the 3 principal components are multiplied by the standardized data to obtain the linear expressions of the principal components, for example:
$Y_1 = e_{11} Z_1 + e_{12} Z_2 + \cdots + e_{16} Z_6$
$Y_2 = e_{21} Z_1 + e_{22} Z_2 + \cdots + e_{26} Z_6$
$Y_3 = e_{31} Z_1 + e_{32} Z_2 + \cdots + e_{36} Z_6$
where $Y_1$ denotes the 1st principal component, $e_{11}$ is the coefficient of the 1st index in the 1st principal component linear expression, i.e., the 1st element of the eigenvector $e_1$, $e_{16}$ is the coefficient of the 6th index in the 1st principal component linear expression, i.e., the 6th element of the eigenvector $e_1$, $Z_1$ denotes the standardized value of the 1st index, and so on.
The weight analysis unit is used for: taking the variance contribution rate of the principal component as the weight, carrying out weighted average on the coefficients of the main evaluation indexes in each principal component linear expression, and calculating the comprehensive weight of each main evaluation index; the calculation formula is as follows:
$w_1 = v_1 e_{11}/(v_1 + v_2 + v_3) + v_2 e_{21}/(v_1 + v_2 + v_3) + v_3 e_{31}/(v_1 + v_2 + v_3)$;
$w_2 = v_1 e_{12}/(v_1 + v_2 + v_3) + v_2 e_{22}/(v_1 + v_2 + v_3) + v_3 e_{32}/(v_1 + v_2 + v_3)$;
……;
$w_6 = v_1 e_{16}/(v_1 + v_2 + v_3) + v_2 e_{26}/(v_1 + v_2 + v_3) + v_3 e_{36}/(v_1 + v_2 + v_3)$;
where $w_1$ denotes the comprehensive weight of the 1st main evaluation index; by analogy, the comprehensive weight of every main evaluation index can be obtained;
normalizing the comprehensive weights of all 6 main evaluation indexes to obtain the weight value $w'_j$ of each main evaluation index, where $w'_j$ is the comprehensive weight of the j-th main evaluation index divided by the sum of the comprehensive weights of all main evaluation indexes; performing a weighted calculation with the obtained weight values to obtain each user's interestingness score for different rooms, the interestingness score being calculated as follows:
$S = w'_1 Z_1 + w'_2 Z_2 + \cdots + w'_6 Z_6$, where S denotes the interestingness score.
Various modifications and variations of the embodiments of the present invention may be made by those skilled in the art, and they are also within the scope of the present invention, provided they are within the scope of the claims of the present invention and their equivalents.
What is not described in detail in the specification is prior art that is well known to those skilled in the art.

Claims (8)

1. A user room interestingness calculation method based on principal component analysis is characterized by comprising the following steps:
s1, extracting a plurality of behavior indexes according to the behavior information of the user to the room, analyzing the behavior indexes, mutually replacing the behavior indexes with correlation coefficient absolute values larger than a correlation threshold value, screening 6 representative behavior indexes as main evaluation indexes, and constructing an interestingness calculation index system, wherein the 6 main evaluation indexes are respectively as follows: effective watching duration, effective watching days, number of transmitted barrages, number of free gifts, number of paid gifts and whether to pay attention;
S2, acquiring room behavior information generated by a plurality of users within a certain time period, constructing an initialization matrix X, and carrying out standardization processing on the initialization matrix X to obtain a standardized matrix Z; calculating a correlation coefficient matrix R of the standardized matrix Z, and calculating the characteristic values $\lambda$ of the correlation coefficient matrix R; arranging the characteristic values from large to small to obtain $\lambda_1, \lambda_2, \dots, \lambda_n$, the corresponding feature vectors being $e_1, e_2, \dots, e_n$, wherein each feature vector $e$ satisfies $|e| = 1$, each characteristic value corresponds to one principal component, and each feature vector contains n elements;
S3, calculating the variance contribution rate $v$ of each principal component: $v_a = \lambda_a/(\lambda_1+\lambda_2+\dots+\lambda_n)$, where a is a positive integer, $1 \le a \le n$, $v_a$ represents the variance contribution rate of the a-th principal component, and $\lambda_a$ represents the characteristic value corresponding to the a-th principal component; calculating the cumulative variance contribution rate of the principal components, wherein the cumulative variance contribution rate of the a-th principal component is the sum of the variance contribution rate of the 1st principal component and the variance contribution rate of the (a-1)th principal component;
S4, extracting, according to the principle that the characteristic value is greater than 1 and the cumulative variance contribution rate is greater than a specified threshold, the number of characteristic values meeting the conditions as the number of finally selected principal components; if the number of characteristic values meeting the conditions is b, b principal components are selected, $\lambda_1, \lambda_2, \dots, \lambda_b$ being the characteristic values respectively corresponding to the b principal components and $e_1, e_2, \dots, e_b$ being the feature vectors respectively corresponding to the b principal components; multiplying the feature vectors of the b principal components by the standardized data to obtain the linear expressions of the principal components;
S5, taking the variance contribution rates of the principal components as weights, carrying out a weighted average of the coefficients of each main evaluation index in the principal component linear expressions, and calculating the comprehensive weight of each main evaluation index; normalizing the comprehensive weights of all n main evaluation indexes to obtain the weight value $w'$ of each main evaluation index, where $w'_j$ = comprehensive weight $w_j$ of the jth main evaluation index / sum of the comprehensive weights of all the main evaluation indexes, j is a positive integer, and $1 \le j \le n$; and performing a weighted calculation according to the obtained weight values to obtain the interestingness score S of each user for different rooms: $S = w'_1*Z_1 + w'_2*Z_2 + \dots + w'_n*Z_n$, where $Z_1$ denotes the value of the 1st index after normalization and $Z_n$ denotes the value of the nth index after normalization; and recommending rooms of interest to the user according to the interestingness score.
2. The method for calculating room interest of a user according to claim 1, wherein in step S2, the initialization matrix X is normalized as follows:
the elements of the initialization matrix X include $x_{ij}$ and $x_j$, where $x_{ij}$ represents the data of the jth behavior index of the ith user, $x_j$ represents the data of the jth behavior index, and i is a positive integer; the initialization matrix X is standardized by adopting a maximum function max and a minimum function min, the specific formula being: $z_{ij} = (x_{ij} - \min(x_j))/(\max(x_j) - \min(x_j))$, where $z_{ij}$ represents one element of the standardized matrix Z.
3. The method for calculating user room interest level based on principal component analysis of claim 1, wherein in step S4, the linear expression of the principal component is as follows:
$Y_c = e_{c1}*Z_1 + e_{c2}*Z_2 + \dots + e_{cn}*Z_n$
wherein c is a positive integer, $1 \le c \le b$, $Y_c$ denotes the c-th principal component, $e_{c1}$ represents the coefficient of the 1st index in the c-th principal component linear expression, namely the 1st element of the c-th feature vector, $e_{cn}$ represents the coefficient of the nth index in the c-th principal component linear expression, namely the nth element of the c-th feature vector, $Z_1$ denotes the normalized value of the 1st index, and $Z_n$ denotes the normalized value of the nth index.
4. The method for calculating user room interest level based on principal component analysis of claim 1, wherein in step S5, the formula for calculating the comprehensive weight is as follows:
$w_j = v_1*e_{1j}/(v_1+v_2+\dots+v_b) + v_2*e_{2j}/(v_1+v_2+\dots+v_b) + \dots + v_b*e_{bj}/(v_1+v_2+\dots+v_b)$,
wherein $v_1$ represents the variance contribution rate of the 1st principal component, and $v_b$ represents the variance contribution rate of the b-th principal component.
5. A user room interest level calculation system based on principal component analysis, the system comprising: a system building unit, a principal component analysis unit, a weight analysis unit, wherein,
the system building unit is used for: extracting a plurality of behavior indexes according to the behavior information of a user on a room, analyzing the behavior indexes, treating behavior indexes whose correlation coefficient has an absolute value larger than a correlation threshold as mutually replaceable, screening out 6 representative behavior indexes as main evaluation indexes, and constructing an interestingness calculation index system, wherein the 6 main evaluation indexes are: effective watching duration, number of effective watching days, number of barrages (bullet-screen comments) sent, number of free gifts, number of paid gifts, and whether the user follows the room;
the principal component analysis unit is configured to: acquire room behavior information generated by a plurality of users within a certain time period, construct an initialization matrix X, and carry out standardization processing on the initialization matrix X to obtain a standardized matrix Z; calculate a correlation coefficient matrix R of the standardized matrix Z, and calculate the characteristic values $\lambda$ of the correlation coefficient matrix R; arrange the characteristic values from large to small to obtain $\lambda_1, \lambda_2, \dots, \lambda_n$, the corresponding feature vectors being $e_1, e_2, \dots, e_n$, wherein each feature vector $e$ satisfies $|e| = 1$, each characteristic value corresponds to one principal component, and each feature vector contains n elements;
calculate the variance contribution rate $v$ of each principal component: $v_a = \lambda_a/(\lambda_1+\lambda_2+\dots+\lambda_n)$, where a is a positive integer, $1 \le a \le n$, $v_a$ represents the variance contribution rate of the a-th principal component, and $\lambda_a$ represents the characteristic value corresponding to the a-th principal component; calculate the cumulative variance contribution rate of the principal components, wherein the cumulative variance contribution rate of the a-th principal component is the sum of the variance contribution rate of the 1st principal component and the variance contribution rate of the (a-1)th principal component;
extract, according to the principle that the characteristic value is greater than 1 and the cumulative variance contribution rate is greater than a specified threshold, the number of characteristic values meeting the conditions as the number of finally selected principal components; if the number of characteristic values meeting the conditions is b, b principal components are selected, $\lambda_1, \lambda_2, \dots, \lambda_b$ being the characteristic values respectively corresponding to the b principal components and $e_1, e_2, \dots, e_b$ being the feature vectors respectively corresponding to the b principal components; multiply the feature vectors of the b principal components by the standardized data to obtain the linear expressions of the principal components;
the weight analysis unit is configured to: take the variance contribution rates of the principal components as weights, carry out a weighted average of the coefficients of each main evaluation index in the principal component linear expressions, and calculate the comprehensive weight of each main evaluation index; normalize the comprehensive weights of all n main evaluation indexes to obtain the weight value $w'$ of each main evaluation index, where $w'_j$ = comprehensive weight $w_j$ of the jth main evaluation index / sum of the comprehensive weights of all the main evaluation indexes, j is a positive integer, and $1 \le j \le n$; and perform a weighted calculation according to the obtained weight values to obtain the interestingness score S of each user for different rooms: $S = w'_1*Z_1 + w'_2*Z_2 + \dots + w'_n*Z_n$, where $Z_1$ denotes the value of the 1st index after normalization and $Z_n$ denotes the value of the nth index after normalization; and recommend rooms of interest to the user according to the interestingness score.
6. The principal component analysis-based room interestingness computing system for users according to claim 5, wherein the initialization matrix X is normalized as follows:
the elements of the initialization matrix X include $x_{ij}$ and $x_j$, where $x_{ij}$ represents the data of the jth behavior index of the ith user, $x_j$ represents the data of the jth behavior index, and i is a positive integer; the initialization matrix X is standardized by adopting a maximum function max and a minimum function min, the specific formula being: $z_{ij} = (x_{ij} - \min(x_j))/(\max(x_j) - \min(x_j))$, where $z_{ij}$ represents one element of the standardized matrix Z.
7. The principal component analysis-based room interestingness computing system of a user according to claim 5, wherein the linear expression of the principal component is as follows:
$Y_c = e_{c1}*Z_1 + e_{c2}*Z_2 + \dots + e_{cn}*Z_n$
wherein c is a positive integer, $1 \le c \le b$, $Y_c$ denotes the c-th principal component, $e_{c1}$ represents the coefficient of the 1st index in the c-th principal component linear expression, namely the 1st element of the c-th feature vector, $e_{cn}$ represents the coefficient of the nth index in the c-th principal component linear expression, namely the nth element of the c-th feature vector, $Z_1$ denotes the normalized value of the 1st index, and $Z_n$ denotes the normalized value of the nth index.
8. The principal component analysis-based room interestingness calculation system for a user according to claim 5, wherein the formula for calculating the composite weight is as follows:
$w_j = v_1*e_{1j}/(v_1+v_2+\dots+v_b) + v_2*e_{2j}/(v_1+v_2+\dots+v_b) + \dots + v_b*e_{bj}/(v_1+v_2+\dots+v_b)$,
wherein $v_1$ represents the variance contribution rate of the 1st principal component, and $v_b$ represents the variance contribution rate of the b-th principal component.
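The steps recited in the claims (min-max standardization, correlation matrix, characteristic values and feature vectors, variance contribution rates, component selection, comprehensive weights and score) can be sketched end to end as follows. This is a hedged illustration, not the claimed implementation, and it makes several assumptions that the claims do not state: NumPy is used for the eigen-decomposition; the input data are random made-up numbers; every column is assumed to vary (a constant column would cause a division by zero); the cumulative variance contribution rate is taken as the conventional running sum; the selection rule is read as "b = number of characteristic values greater than 1, then check the cumulative rate against the threshold"; and, because the sign of a feature vector is arbitrary, each vector is flipped so that its coefficients sum to a non-negative value. All function names are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """z_ij = (x_ij - min(x_j)) / (max(x_j) - min(x_j)), column by column."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)


def pca_on_correlation(Z):
    """Characteristic values (descending), unit feature vectors, and rates v_a."""
    R = np.corrcoef(Z, rowvar=False)          # n x n correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    E = eigvecs[:, order].T                   # row c is feature vector e_c
    # Assumed sign convention: make each row's coefficient sum non-negative.
    signs = np.where(E.sum(axis=1) < 0, -1.0, 1.0)
    E = E * signs[:, None]
    v = eigvals / eigvals.sum()               # v_a = lambda_a / sum(lambda)
    return eigvals, E, v


def choose_b(eigvals, v, threshold):
    """One possible reading of the selection rule (an assumption)."""
    b = int(np.sum(eigvals > 1.0))
    meets = bool(np.cumsum(v)[b - 1] > threshold) if b > 0 else False
    return b, meets


def interest_scores(Z, E, v, b):
    """Comprehensive weights w_j, normalized weights w'_j, and scores S."""
    w = v[:b] @ E[:b] / v[:b].sum()           # w_j = sum_c v_c*e_cj / sum_c v_c
    w_norm = w / w.sum()
    return w_norm, Z @ w_norm                 # one score per row of Z


# Made-up data: 50 user-room rows, 6 behavior indexes.
rng = np.random.default_rng(0)
X = rng.random((50, 6)) * np.array([3600.0, 30.0, 200.0, 50.0, 10.0, 1.0])

Z = min_max_normalize(X)
eigvals, E, v = pca_on_correlation(Z)
b, meets = choose_b(eigvals, v, threshold=0.8)
w_norm, S = interest_scores(Z, E, v, b)
```

Rooms could then be ranked per user by descending `S`; how ties and negative weights are handled is left open here because the claims do not specify it.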

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610514089.2A CN106127594B (en) 2016-06-30 2016-06-30 User room interest degree calculation method and system based on principal component analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610514089.2A CN106127594B (en) 2016-06-30 2016-06-30 User room interest degree calculation method and system based on principal component analysis

Publications (2)

Publication Number Publication Date
CN106127594A CN106127594A (en) 2016-11-16
CN106127594B true CN106127594B (en) 2021-09-07

Family

ID=57468842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610514089.2A Active CN106127594B (en) 2016-06-30 2016-06-30 User room interest degree calculation method and system based on principal component analysis

Country Status (1)

Country Link
CN (1) CN106127594B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108259938B (en) * 2016-12-28 2020-10-16 武汉斗鱼网络科技有限公司 Method and device for evaluating interest degree of user in live broadcast room
TW201837814A (en) * 2017-03-22 2018-10-16 國立臺灣師範大學 Method and system for forecasting product sales on model-free prediction basis
CN106933211B (en) * 2017-04-18 2019-04-09 中南大学 A kind of method and apparatus that identification industrial process dynamic adjusts section
CN108681820B (en) * 2018-05-21 2021-07-30 成都信息工程大学 Analysis method for increasing influence of information security mechanism on system performance
CN109165974A (en) * 2018-08-06 2019-01-08 深圳乐信软件技术有限公司 A kind of commercial product recommending model training method, device, equipment and storage medium
CN110598949B (en) * 2019-09-20 2021-10-15 腾讯科技(深圳)有限公司 User interest degree analysis method and device, electronic equipment and storage medium
CN112862279A (en) * 2021-01-26 2021-05-28 上海应用技术大学 Method for evaluating pavement condition of expressway lane
CN116894165B (en) * 2023-09-11 2023-12-08 阳谷新太平洋电缆有限公司 Cable aging state assessment method based on data analysis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101127875A (en) * 2007-09-13 2008-02-20 深圳市融合视讯科技有限公司 An audience interaction method for broadcasting video stream media program
CN101136816A (en) * 2006-08-31 2008-03-05 腾讯科技(深圳)有限公司 Method and system for performing declaration to program contents in network living broadcast

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136816A (en) * 2006-08-31 2008-03-05 腾讯科技(深圳)有限公司 Method and system for performing declaration to program contents in network living broadcast
CN101127875A (en) * 2007-09-13 2008-02-20 深圳市融合视讯科技有限公司 An audience interaction method for broadcasting video stream media program

Also Published As

Publication number Publication date
CN106127594A (en) 2016-11-16

Similar Documents

Publication Publication Date Title
CN106127594B (en) User room interest degree calculation method and system based on principal component analysis
Kabaklarli et al. High-technology exports and economic growth: panel data analysis for selected OECD countries
Ghadiyaram et al. Massive online crowdsourced study of subjective and objective picture quality
Molnar et al. Pitfalls to avoid when interpreting machine learning models
US20230336637A1 (en) Method and apparatus for moderating abnormal users, electronic device, and storage medium
CN111242310B (en) Feature validity evaluation method and device, electronic equipment and storage medium
JP6547070B2 (en) Method, device and computer storage medium for push information coarse selection sorting
US10956716B2 (en) Method for building a computer-implemented tool for assessment of qualitative features from face images
CN107862551B (en) Method and device for predicting network application promotion effect and terminal equipment
Neelamegham et al. Modeling and forecasting the sales of technology products
CN113221104A (en) User abnormal behavior detection method and user behavior reconstruction model training method
CN112040254A (en) Risk control method and device, storage medium and computer equipment
CN107885754B (en) Method and device for extracting credit variable from transaction data based on LDA model
CN112486784A (en) Method, apparatus and medium for diagnosing and optimizing data analysis system
Bailey et al. Bioassessment of stream ecosystems enduring a decade of simulated degradation: lessons for the real world
Zhu et al. Gaussian mixture model based prediction method of movie rating
CN111881007B (en) Operation behavior judgment method, device, equipment and computer readable storage medium
CN107644042B (en) Software program click rate pre-estimation sorting method and server
CN110502639B (en) Information recommendation method and device based on problem contribution degree and computer equipment
Hartung et al. Are ordinal rating scales better than percent ratings? a statistical and “psychological” view
CN110634006B (en) Advertisement click rate prediction method, device, equipment and readable storage medium
CN115860856A (en) Data processing method and device, electronic equipment and storage medium
Corani et al. Structural risk minimization: a robust method for density‐dependence detection and model selection
CN113205363A (en) Service index monitoring method and device based on big data
CN115221663A (en) Data processing method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240302

Address after: 830000, Room 17A, Building 17, Block A, Times Square Community, No. 59 Guangming Road, Tianshan District, Urumqi, Xinjiang Uygur Autonomous Region BD00244

Patentee after: Urumqi Bangbangjun Technology Co.,Ltd.

Country or region after: China

Address before: 430000 East Lake Development Zone, Wuhan City, Hubei Province, No. 1 Software Park East Road 4.1 Phase B1 Building 11 Building

Patentee before: WUHAN DOUYU NETWORK TECHNOLOGY Co.,Ltd.

Country or region before: China