Disclosure of Invention
The invention aims to overcome the defects of the background art and provides a method and a system for calculating the room interestingness of a user based on principal component analysis.
The invention provides a method for calculating the room interestingness of a user based on principal component analysis, which comprises the following steps:
S1, extracting a plurality of behavior indexes from the behavior information of users toward rooms and analyzing the behavior indexes; treating behavior indexes whose correlation coefficients have absolute values greater than a correlation threshold as mutually replaceable, so that one index can stand in for the others; screening out n representative behavior indexes as main evaluation indexes, where n is a positive integer greater than or equal to 3; and constructing an interestingness calculation index system;
S2, acquiring room behavior information generated by a plurality of users within a certain time period, constructing an initialization matrix X, and standardizing the initialization matrix X to obtain a standardized matrix Z; calculating the correlation coefficient matrix R of the standardized matrix Z, and calculating the eigenvalues $\lambda$ of the correlation coefficient matrix R; arranging the eigenvalues in descending order to obtain $\lambda_1, \lambda_2, \ldots, \lambda_n$, with corresponding eigenvectors $e_1, e_2, \ldots, e_n$, where each eigenvector e satisfies $\|e\| = 1$, each eigenvalue corresponds to one principal component, and each eigenvector contains n elements;
S3, calculating the variance contribution rate v of each principal component, $v_a = \lambda_a / (\lambda_1 + \lambda_2 + \cdots + \lambda_n)$, where a is a positive integer, $1 \le a \le n$, $v_a$ denotes the variance contribution rate of the a-th principal component, and $\lambda_a$ denotes the eigenvalue corresponding to the a-th principal component; calculating the cumulative variance contribution rate of the principal components, where the cumulative variance contribution rate of the a-th principal component is the sum of the variance contribution rates of the 1st through the a-th principal components;
S4, taking the number of eigenvalues that satisfy the conditions as the number of principal components finally selected, according to the principle that the eigenvalue is greater than 1 and the cumulative variance contribution rate is greater than a specified threshold; if b eigenvalues satisfy the conditions, b principal components are selected, $\lambda_1, \lambda_2, \ldots, \lambda_b$ being the eigenvalues corresponding to the b principal components and $e_1, e_2, \ldots, e_b$ being the corresponding eigenvectors; multiplying the eigenvectors of the b principal components by the standardized data to obtain the linear expressions of the principal components;
S5, taking the variance contribution rates of the principal components as weights, computing a weighted average of the coefficients of each main evaluation index across the principal component linear expressions, and thereby calculating the comprehensive weight of each main evaluation index; normalizing the comprehensive weights of all n main evaluation indexes to obtain the weight value w′ of each main evaluation index, where $w'_j$ = (comprehensive weight of the j-th main evaluation index) / (sum of the comprehensive weights of all main evaluation indexes); and performing a weighted calculation with the obtained weight values to obtain the interestingness score of each user for different rooms.
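The quantities defined in steps S3 and S5 can be written compactly as below. This is only a restatement under stated assumptions: the symbol $V_a$ for the cumulative rate is introduced here for illustration, the cumulative sum is read as running from the 1st to the a-th component, and the normalization in S5 is assumed to divide each comprehensive weight by the sum of all comprehensive weights.

```latex
v_a = \frac{\lambda_a}{\sum_{k=1}^{n} \lambda_k}, \qquad
V_a = \sum_{k=1}^{a} v_k, \qquad
w'_j = \frac{w_j}{\sum_{k=1}^{n} w_k}, \qquad 1 \le a \le n,\ 1 \le j \le n .
```

Under this reading, $V_n = 1$ and the normalized weights $w'_j$ sum to 1.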
Based on the above technical solution, in step S2, the process of normalizing the initialization matrix X is as follows:
the elements of the initialization matrix X include $x_{ij}$ and $x_j$, where $x_{ij}$ denotes the data of the j-th behavior index of the i-th user and $x_j$ denotes the data of the j-th behavior index, i and j being positive integers with $1 \le j \le n$; the initialization matrix X is standardized using the maximum function max and the minimum function min according to the formula $z_{ij} = (x_{ij} - \min(x_j)) / (\max(x_j) - \min(x_j))$, where $z_{ij}$ denotes an element of the standardized matrix Z.
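As a minimal sketch of this min-max standardization, the following Python snippet applies the formula column by column with NumPy. The function name `min_max_normalize`, the toy matrix, and the handling of a constant column (which the text does not address) are assumptions made for illustration.

```python
import numpy as np

def min_max_normalize(X):
    """Column-wise scaling: z_ij = (x_ij - min(x_j)) / (max(x_j) - min(x_j))."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Assumption (not in the source): a constant column has zero range,
    # so it is mapped to 0 instead of dividing by zero.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / safe_range

# Hypothetical data: 3 users (rows), 2 behavior indexes (columns).
X = np.array([[2.0, 10.0],
              [4.0, 30.0],
              [6.0, 20.0]])
Z = min_max_normalize(X)
print(Z)
```

Each column of `Z` then lies in the interval [0, 1], so indexes measured in different units become comparable.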
On the basis of the above technical solution, in step S4, the linear expression of the principal component is as follows:
$$Y_c = e_{c1} Z_1 + e_{c2} Z_2 + \cdots + e_{cn} Z_n,$$
wherein c is a positive integer, $1 \le c \le b$, $Y_c$ denotes the c-th principal component, $e_{c1}$ denotes the coefficient of the 1st index in the c-th principal component linear expression, i.e., the 1st element of the c-th eigenvector, $e_{cn}$ denotes the coefficient of the n-th index in the c-th principal component linear expression, i.e., the n-th element of the c-th eigenvector, $Z_1$ denotes the standardized value of the 1st index, and $Z_n$ denotes the standardized value of the n-th index.
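For readers who prefer matrix notation, the b linear expressions can be collected into one product. The symbols $E_b$ (the $b \times n$ matrix whose c-th row is the eigenvector $e_c$) and $Y$ (the matrix whose c-th column holds the values of the c-th principal component for all rows of Z) are introduced here only for illustration.

```latex
Y = Z\,E_b^{\mathsf{T}}, \qquad
E_b = \begin{pmatrix}
e_{11} & e_{12} & \cdots & e_{1n} \\
\vdots & \vdots &        & \vdots \\
e_{b1} & e_{b2} & \cdots & e_{bn}
\end{pmatrix},
\qquad
Y_{ic} = \sum_{j=1}^{n} e_{cj}\, z_{ij} .
```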
Based on the above technical solution, in step S5, the formula for calculating the comprehensive weight is as follows:
$$w_j = \frac{v_1 e_{1j}}{v_1 + v_2 + \cdots + v_b} + \frac{v_2 e_{2j}}{v_1 + v_2 + \cdots + v_b} + \cdots + \frac{v_b e_{bj}}{v_1 + v_2 + \cdots + v_b},$$
wherein $w_j$ denotes the comprehensive weight of the j-th main evaluation index, $v_1$ denotes the variance contribution rate of the 1st principal component, and $v_b$ denotes the variance contribution rate of the b-th principal component.
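A small numeric illustration of the comprehensive-weight formula may help. In the sketch below, the function name `composite_weights`, the variance contribution rates, and the three unit vectors are all made-up values chosen for easy arithmetic; they are not taken from the invention's data.

```python
import numpy as np

def composite_weights(v, E):
    """w_j = sum_c v_c * e_cj / sum_c v_c over the b selected components.

    v: shape (b,), variance contribution rates of the selected components.
    E: shape (b, n), row c holds the eigenvector e_c.
    """
    v = np.asarray(v, dtype=float)
    E = np.asarray(E, dtype=float)
    return (v @ E) / v.sum()

# Hypothetical inputs: b = 3 components, n = 6 indexes.
v = np.array([0.40, 0.25, 0.20])
E = np.array([[0.5,  0.5, 0.5,  0.5, 0.0, 0.0],
              [0.0,  0.0, 0.0,  0.0, 0.6, 0.8],
              [0.5, -0.5, 0.5, -0.5, 0.0, 0.0]])
w = composite_weights(v, E)
print(w)
```

Because every term shares the denominator $v_1 + \cdots + v_b$, the whole formula reduces to one vector-matrix product followed by a single division.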
Based on the above technical solution, in step S5, the formula for calculating the interestingness score is as follows:
$S = w'_1 Z_1 + w'_2 Z_2 + \cdots + w'_n Z_n$, wherein S denotes the interestingness score, $w'_1$ denotes the weight value of the 1st index, $Z_1$ denotes the standardized value of the 1st index, $w'_n$ denotes the weight value of the n-th index, and $Z_n$ denotes the standardized value of the n-th index.
The invention also provides a system for calculating the room interestingness of the user based on the principal component analysis, which comprises: a system building unit, a principal component analysis unit, a weight analysis unit, wherein,
the system building unit is configured to: extract a plurality of behavior indexes from the behavior information of users toward rooms and analyze the behavior indexes; treat behavior indexes whose correlation coefficients have absolute values greater than a correlation threshold as mutually replaceable, so that one index can stand in for the others; screen out n representative behavior indexes as main evaluation indexes, where n is a positive integer greater than or equal to 3; and construct an interestingness calculation index system;
the principal component analysis unit is configured to: acquire room behavior information generated by a plurality of users within a certain time period, construct an initialization matrix X, and standardize the initialization matrix X to obtain a standardized matrix Z; calculate the correlation coefficient matrix R of the standardized matrix Z, and calculate the eigenvalues $\lambda$ of the correlation coefficient matrix R; arrange the eigenvalues in descending order to obtain $\lambda_1, \lambda_2, \ldots, \lambda_n$, with corresponding eigenvectors $e_1, e_2, \ldots, e_n$, where each eigenvector e satisfies $\|e\| = 1$, each eigenvalue corresponds to one principal component, and each eigenvector contains n elements;
calculate the variance contribution rate v of each principal component, $v_a = \lambda_a / (\lambda_1 + \lambda_2 + \cdots + \lambda_n)$, where a is a positive integer, $1 \le a \le n$, $v_a$ denotes the variance contribution rate of the a-th principal component, and $\lambda_a$ denotes the eigenvalue corresponding to the a-th principal component; calculate the cumulative variance contribution rate of the principal components, where the cumulative variance contribution rate of the a-th principal component is the sum of the variance contribution rates of the 1st through the a-th principal components;
take the number of eigenvalues that satisfy the conditions as the number of principal components finally selected, according to the principle that the eigenvalue is greater than 1 and the cumulative variance contribution rate is greater than a specified threshold; if b eigenvalues satisfy the conditions, b principal components are selected, $\lambda_1, \lambda_2, \ldots, \lambda_b$ being the eigenvalues corresponding to the b principal components and $e_1, e_2, \ldots, e_b$ being the corresponding eigenvectors; and multiply the eigenvectors of the b principal components by the standardized data to obtain the linear expressions of the principal components;
the weight analysis unit is configured to: take the variance contribution rates of the principal components as weights, compute a weighted average of the coefficients of each main evaluation index across the principal component linear expressions, and thereby calculate the comprehensive weight of each main evaluation index; normalize the comprehensive weights of all n main evaluation indexes to obtain the weight value w′ of each main evaluation index, where $w'_j$ = (comprehensive weight of the j-th main evaluation index) / (sum of the comprehensive weights of all main evaluation indexes); and perform a weighted calculation with the obtained weight values to obtain the interestingness score of each user for different rooms.
On the basis of the above technical solution, the process of normalizing the initialization matrix X is as follows:
the elements of the initialization matrix X include $x_{ij}$ and $x_j$, where $x_{ij}$ denotes the data of the j-th behavior index of the i-th user and $x_j$ denotes the data of the j-th behavior index, i and j being positive integers with $1 \le j \le n$; the initialization matrix X is standardized using the maximum function max and the minimum function min according to the formula $z_{ij} = (x_{ij} - \min(x_j)) / (\max(x_j) - \min(x_j))$, where $z_{ij}$ denotes an element of the standardized matrix Z.
On the basis of the technical scheme, the linear expression of the principal component is as follows:
$$Y_c = e_{c1} Z_1 + e_{c2} Z_2 + \cdots + e_{cn} Z_n,$$
wherein c is a positive integer, $1 \le c \le b$, $Y_c$ denotes the c-th principal component, $e_{c1}$ denotes the coefficient of the 1st index in the c-th principal component linear expression, i.e., the 1st element of the c-th eigenvector, $e_{cn}$ denotes the coefficient of the n-th index in the c-th principal component linear expression, i.e., the n-th element of the c-th eigenvector, $Z_1$ denotes the standardized value of the 1st index, and $Z_n$ denotes the standardized value of the n-th index.
On the basis of the technical scheme, the formula for calculating the comprehensive weight is as follows:
$$w_j = \frac{v_1 e_{1j}}{v_1 + v_2 + \cdots + v_b} + \frac{v_2 e_{2j}}{v_1 + v_2 + \cdots + v_b} + \cdots + \frac{v_b e_{bj}}{v_1 + v_2 + \cdots + v_b},$$
wherein $w_j$ denotes the comprehensive weight of the j-th main evaluation index, $v_1$ denotes the variance contribution rate of the 1st principal component, and $v_b$ denotes the variance contribution rate of the b-th principal component.
On the basis of the technical scheme, the interestingness score is calculated according to the following formula:
$S = w'_1 Z_1 + w'_2 Z_2 + \cdots + w'_n Z_n$, wherein S denotes the interestingness score, $w'_1$ denotes the weight value of the 1st index, $Z_1$ denotes the standardized value of the 1st index, $w'_n$ denotes the weight value of the n-th index, and $Z_n$ denotes the standardized value of the n-th index.
Compared with the prior art, the invention has the following advantages:
the invention analyzes the different behaviors of users toward rooms, constructs an index evaluation system, determines the weights of the evaluation indexes, and quantitatively measures each user's degree of interest in a room, which helps to judge user preferences accurately; using the interestingness score, a ranking of the user's interest in the rooms the user has watched can be obtained, rooms that may interest the user can be recommended accurately, and the user experience is improved.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
Referring to fig. 1, an embodiment of the present invention provides a method for calculating a user room interestingness based on principal component analysis, including the following steps:
S1, extracting a plurality of behavior indexes from the behavior information of users toward rooms and analyzing the behavior indexes; treating behavior indexes whose correlation coefficients have absolute values greater than a correlation threshold as mutually replaceable, so that one index can stand in for the others; screening out n representative behavior indexes as main evaluation indexes; and constructing an interestingness calculation index system; in practical application, the correlation threshold is generally greater than 0.7, preferably 0.8, and n is generally a positive integer greater than or equal to 3;
the specific process of step S1 is as follows:
constructing first-level indexes according to the behavior information of users toward rooms, the first-level indexes comprising gift-giving behavior, watching behavior, paying behavior and bullet-screen behavior; subdividing the first-level indexes to refine the behavior information of users toward rooms into initial evaluation indexes, for example, the first-level index "watching behavior" includes two initial evaluation indexes, namely effective watching duration and effective watching days; performing correlation analysis on the initial evaluation indexes, treating highly correlated indexes whose correlation coefficients have absolute values greater than 0.8 as mutually replaceable, screening out n representative behavior indexes as main evaluation indexes, and determining the interestingness calculation index system;
in this embodiment, the above analysis yields n = 6, i.e., there are 6 main evaluation indexes: effective watching duration, effective watching days, number of bullet-screen comments sent, number of free gifts given, number of paid gifts given, and whether the user follows the room.
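The correlation screening in step S1 can be sketched as follows. The text does not say which index of a highly correlated pair is retained, so this sketch assumes a simple greedy rule (keep the first one encountered); the function name `screen_indicators`, the column names, and the toy data are hypothetical.

```python
import numpy as np

def screen_indicators(X, names, threshold=0.8):
    """Greedy screening: keep an index only if its |correlation| with
    every already-kept index is at most `threshold`."""
    R = np.corrcoef(X, rowvar=False)  # index-by-index correlation matrix
    kept = []
    for j in range(X.shape[1]):
        if all(abs(R[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Hypothetical toy data: 5 users (rows), 3 candidate indexes (columns).
# The second column is nearly twice the first, so the two are highly correlated.
X = np.array([
    [10.0, 20.5, 3.0],
    [20.0, 39.0, 1.0],
    [30.0, 61.0, 4.0],
    [40.0, 80.0, 1.0],
    [50.0, 99.5, 5.0],
])
names = ["watch_minutes", "watch_sessions", "bullet_screens"]
print(screen_indicators(X, names))
```

Here the second index is dropped because it is almost a linear function of the first, while the third is kept because its correlation with the first is well below 0.8.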
S2, acquiring room behavior information generated by a plurality of users within a period of one week, constructing an initialization matrix X, and standardizing the initialization matrix X to obtain a standardized matrix Z, the process of standardizing the initialization matrix X being as follows:
the elements of the initialization matrix X include $x_{ij}$ and $x_j$, where $x_{ij}$ denotes the data of the j-th behavior index of the i-th user and $x_j$ denotes the data of the j-th behavior index, i and j being positive integers with $1 \le j \le n$; the initialization matrix X is standardized using the maximum function max and the minimum function min according to the formula $z_{ij} = (x_{ij} - \min(x_j)) / (\max(x_j) - \min(x_j))$, where $z_{ij}$ denotes an element of the standardized matrix Z;
calculating the correlation coefficient matrix R of the standardized matrix Z, and calculating the eigenvalues $\lambda$ of the correlation coefficient matrix R; arranging the eigenvalues in descending order to obtain $\lambda_1, \lambda_2, \ldots, \lambda_6$, with corresponding eigenvectors $e_1, e_2, \ldots, e_6$, where each eigenvector e satisfies $\|e\| = 1$, each eigenvalue corresponds to one principal component, and each eigenvector contains 6 elements; for example, the vector $e_1$ contains $e_{11}, e_{12}, e_{13}, \ldots, e_{16}$;
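The correlation matrix and its sorted eigen-decomposition might be computed as in the sketch below. The function name `sorted_eigen` and the random stand-in data are assumptions; `np.linalg.eigh` is used because a correlation matrix is symmetric, and it returns eigenvalues in ascending order, so they are reversed here.

```python
import numpy as np

def sorted_eigen(Z):
    """Return the eigenvalues (descending) and unit eigenvectors of the
    correlation matrix of Z; row c of the returned matrix is e_(c+1)."""
    R = np.corrcoef(Z, rowvar=False)      # n x n correlation coefficient matrix
    values, vectors = np.linalg.eigh(R)   # ascending eigenvalues, column eigenvectors
    order = np.argsort(values)[::-1]      # indices for descending order
    return values[order], vectors[:, order].T

# Hypothetical standardized data: 50 rows, 6 main evaluation indexes.
rng = np.random.default_rng(0)
Z = rng.random((50, 6))
lam, E = sorted_eigen(Z)
print(lam)
```

Note that the sign of each eigenvector is arbitrary (both e and -e are valid unit eigenvectors), which is worth keeping in mind when the elements are later used as coefficients.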
S3, calculating the variance contribution rate v of each principal component, $v_a = \lambda_a / (\lambda_1 + \lambda_2 + \cdots + \lambda_6)$, where a is a positive integer, $1 \le a \le n$, $v_a$ denotes the variance contribution rate of the a-th principal component, and $\lambda_a$ denotes the eigenvalue corresponding to the a-th principal component; calculating the cumulative variance contribution rate of the principal components, where the cumulative variance contribution rate of the a-th principal component is the sum of the variance contribution rates of the 1st through the a-th principal components; for example, the cumulative variance contribution rate of the second principal component is the variance contribution rate of the first principal component plus the variance contribution rate of the second principal component;
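A short numeric sketch of step S3 follows. The six eigenvalues are invented for illustration (they sum to 6, as the eigenvalues of a 6 x 6 correlation matrix would), and the cumulative rate is read as the running sum from the 1st to the a-th component.

```python
import numpy as np

# Hypothetical eigenvalues in descending order, n = 6.
lam = np.array([2.4, 1.5, 1.2, 0.5, 0.3, 0.1])

v = lam / lam.sum()          # v_a = lambda_a / (lambda_1 + ... + lambda_6)
cumulative = np.cumsum(v)    # cumulative rate of the a-th principal component

print(v)
print(cumulative)
```

With these values, the first three components have rates 0.40, 0.25 and 0.20, so the cumulative rate of the third component is 0.85.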
S4, taking the number of eigenvalues that satisfy the conditions as the number of principal components finally selected, according to the principle that the eigenvalue is greater than 1 and the cumulative variance contribution rate is greater than a specified threshold; in practical application the threshold is specified as 3, and according to this rule the invention finally extracts 3 principal components, $\lambda_1, \lambda_2, \lambda_3$ being the eigenvalues corresponding to the 3 principal components and $e_1, e_2, e_3$ being the corresponding eigenvectors; the eigenvectors of the 3 principal components are multiplied by the standardized data to obtain the linear expressions of the principal components, for example:
$$Y_1 = e_{11} Z_1 + e_{12} Z_2 + \cdots + e_{16} Z_6;$$
$$Y_2 = e_{21} Z_1 + e_{22} Z_2 + \cdots + e_{26} Z_6;$$
$$Y_3 = e_{31} Z_1 + e_{32} Z_2 + \cdots + e_{36} Z_6;$$
wherein $Y_1$ denotes the 1st principal component, $e_{11}$ denotes the coefficient of the 1st index in the 1st principal component linear expression, i.e., the 1st element of the eigenvector $e_1$, $e_{16}$ denotes the coefficient of the 6th index in the 1st principal component linear expression, i.e., the 6th element of the eigenvector $e_1$, $Z_1$ denotes the standardized value of the 1st index, and so on.
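One possible reading of the selection rule in step S4 is sketched below: count the eigenvalues greater than 1, then check whether the cumulative variance contribution rate of those components exceeds the threshold. The text does not say what to do if the check fails, so the sketch only reports the result. The function names, the eigenvalues, the 0.8 threshold, and the identity matrix standing in for real eigenvectors are all illustrative assumptions.

```python
import numpy as np

def select_components(lam, cum_threshold):
    """Count eigenvalues greater than 1 and report whether the cumulative
    variance contribution rate of those components exceeds cum_threshold."""
    lam = np.asarray(lam, dtype=float)
    b = int(np.sum(lam > 1.0))
    cumulative = np.cumsum(lam / lam.sum())
    meets = b > 0 and cumulative[b - 1] > cum_threshold
    return b, bool(meets)

def principal_components(Z, E, b):
    """Column c of the result is Y_c = e_c1*Z_1 + ... + e_cn*Z_n,
    using the first b eigenvectors (rows of E)."""
    return Z @ E[:b].T

lam = np.array([2.4, 1.5, 1.2, 0.5, 0.3, 0.1])   # hypothetical eigenvalues
b, ok = select_components(lam, cum_threshold=0.8)

E = np.eye(6)                                     # placeholder unit vectors
Z = np.arange(12, dtype=float).reshape(2, 6) / 11.0
Y = principal_components(Z, E, b)
print(b, ok, Y.shape)
```

With these made-up eigenvalues, three components are selected, which matches the count of 3 in the embodiment, although the actual eigenvalues of the embodiment are not given.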
S5, defining the variance contribution rates as the weights of the different principal components, where a larger variance contribution rate indicates a more important principal component; taking the variance contribution rates of the principal components as weights, computing a weighted average of the coefficients of each main evaluation index across the principal component linear expressions, and thereby calculating the comprehensive weight of each main evaluation index; the calculation formulas are as follows:
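$$w_1 = \frac{v_1 e_{11}}{v_1 + v_2 + v_3} + \frac{v_2 e_{21}}{v_1 + v_2 + v_3} + \frac{v_3 e_{31}}{v_1 + v_2 + v_3};$$
$$w_2 = \frac{v_1 e_{12}}{v_1 + v_2 + v_3} + \frac{v_2 e_{22}}{v_1 + v_2 + v_3} + \frac{v_3 e_{32}}{v_1 + v_2 + v_3};$$
……;
$$w_6 = \frac{v_1 e_{16}}{v_1 + v_2 + v_3} + \frac{v_2 e_{26}}{v_1 + v_2 + v_3} + \frac{v_3 e_{36}}{v_1 + v_2 + v_3};$$
wherein $w_1$ denotes the comprehensive weight of the 1st main evaluation index; by analogy, the comprehensive weight corresponding to each main evaluation index can be obtained;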
normalizing the comprehensive weights of all 6 main evaluation indexes to obtain the weight value w′ of each main evaluation index, where $w'_j$ = (comprehensive weight of the j-th main evaluation index) / (sum of the comprehensive weights of all main evaluation indexes); performing a weighted calculation with the obtained weight values to obtain the interestingness score of each user for different rooms, the interestingness score being calculated according to the following formula:
$S = w'_1 Z_1 + w'_2 Z_2 + \cdots + w'_6 Z_6$, wherein S denotes the interestingness score.
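The final normalization and scoring can be sketched as below, assuming the normalization divides each comprehensive weight by the sum of all comprehensive weights. The function name `interest_scores`, the weights, and the three rows of standardized data are hypothetical.

```python
import numpy as np

def interest_scores(Z, w):
    """Normalize the comprehensive weights so they sum to 1, then compute
    S = w'_1*Z_1 + ... + w'_n*Z_n for every row of Z."""
    w = np.asarray(w, dtype=float)
    w_norm = w / w.sum()   # assumes the comprehensive weights do not sum to zero
    return np.asarray(Z, dtype=float) @ w_norm, w_norm

# Hypothetical comprehensive weights for 6 main evaluation indexes.
w = np.array([0.3, 0.1, 0.3, 0.1, 0.15, 0.2])
# Hypothetical standardized data, one row per (user, room) pair.
Z = np.array([[1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
S, w_norm = interest_scores(Z, w)
print(S)
```

When all weights are positive and Z is min-max standardized, the score falls between 0 and 1, so sorting the rooms of one user by S gives the interest ranking mentioned in the advantages.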
The embodiment of the invention also provides a system for calculating the room interest degree of a user based on principal component analysis, which comprises: a system building unit, a principal component analysis unit, a weight analysis unit, wherein,
the system building unit is configured to: extract a plurality of behavior indexes from the behavior information of users toward rooms and analyze the behavior indexes; treat behavior indexes whose correlation coefficients have absolute values greater than a correlation threshold as mutually replaceable, so that one index can stand in for the others; screen out n representative behavior indexes as main evaluation indexes; and construct an interestingness calculation index system; in practical application, the correlation threshold is generally greater than 0.7, preferably 0.8, and n is generally a positive integer greater than or equal to 3;
the specific process is as follows:
constructing first-level indexes according to the behavior information of users toward rooms, the first-level indexes comprising gift-giving behavior, watching behavior, paying behavior and bullet-screen behavior; subdividing the first-level indexes to refine the behavior information of users toward rooms into initial evaluation indexes, for example, the first-level index "watching behavior" includes two initial evaluation indexes, namely effective watching duration and effective watching days; performing correlation analysis on the initial evaluation indexes, treating highly correlated indexes whose correlation coefficients have absolute values greater than 0.8 as mutually replaceable, screening out n representative behavior indexes as main evaluation indexes, and determining the interestingness calculation index system; in this embodiment, the above analysis yields n = 6, i.e., there are 6 main evaluation indexes: effective watching duration, effective watching days, number of bullet-screen comments sent, number of free gifts given, number of paid gifts given, and whether the user follows the room.
the principal component analysis unit is configured to: acquire room behavior information generated by a plurality of users within a period of one week, construct an initialization matrix X, and standardize the initialization matrix X to obtain a standardized matrix Z, the process of standardizing the initialization matrix X being as follows:
the elements of the initialization matrix X include $x_{ij}$ and $x_j$, where $x_{ij}$ denotes the data of the j-th behavior index of the i-th user and $x_j$ denotes the data of the j-th behavior index, i and j being positive integers with $1 \le j \le n$; the initialization matrix X is standardized using the maximum function max and the minimum function min according to the formula $z_{ij} = (x_{ij} - \min(x_j)) / (\max(x_j) - \min(x_j))$, where $z_{ij}$ denotes an element of the standardized matrix Z;
calculating the correlation coefficient matrix R of the standardized matrix Z, and calculating the eigenvalues $\lambda$ of the correlation coefficient matrix R; arranging the eigenvalues in descending order to obtain $\lambda_1, \lambda_2, \ldots, \lambda_6$, with corresponding eigenvectors $e_1, e_2, \ldots, e_6$, where each eigenvector e satisfies $\|e\| = 1$, each eigenvalue corresponds to one principal component, and each eigenvector contains 6 elements; for example, the vector $e_1$ contains $e_{11}, e_{12}, e_{13}, \ldots, e_{16}$;
calculating the variance contribution rate v of each principal component, $v_a = \lambda_a / (\lambda_1 + \lambda_2 + \cdots + \lambda_6)$, where a is a positive integer, $1 \le a \le n$, $v_a$ denotes the variance contribution rate of the a-th principal component, and $\lambda_a$ denotes the eigenvalue corresponding to the a-th principal component; calculating the cumulative variance contribution rate of the principal components, where the cumulative variance contribution rate of the a-th principal component is the sum of the variance contribution rates of the 1st through the a-th principal components; for example, the cumulative variance contribution rate of the second principal component is the variance contribution rate of the first principal component plus the variance contribution rate of the second principal component;
taking the number of eigenvalues that satisfy the conditions as the number of principal components finally selected, according to the principle that the eigenvalue is greater than 1 and the cumulative variance contribution rate is greater than a specified threshold; in practical application the threshold is specified as 3, and according to this rule the invention finally extracts 3 principal components, $\lambda_1, \lambda_2, \lambda_3$ being the eigenvalues corresponding to the 3 principal components and $e_1, e_2, e_3$ being the corresponding eigenvectors; the eigenvectors of the 3 principal components are multiplied by the standardized data to obtain the linear expressions of the principal components, for example:
$$Y_1 = e_{11} Z_1 + e_{12} Z_2 + \cdots + e_{16} Z_6;$$
$$Y_2 = e_{21} Z_1 + e_{22} Z_2 + \cdots + e_{26} Z_6;$$
$$Y_3 = e_{31} Z_1 + e_{32} Z_2 + \cdots + e_{36} Z_6;$$
wherein $Y_1$ denotes the 1st principal component, $e_{11}$ denotes the coefficient of the 1st index in the 1st principal component linear expression, i.e., the 1st element of the eigenvector $e_1$, $e_{16}$ denotes the coefficient of the 6th index in the 1st principal component linear expression, i.e., the 6th element of the eigenvector $e_1$, $Z_1$ denotes the standardized value of the 1st index, and so on.
the weight analysis unit is configured to: take the variance contribution rates of the principal components as weights, compute a weighted average of the coefficients of each main evaluation index across the principal component linear expressions, and thereby calculate the comprehensive weight of each main evaluation index; the calculation formulas are as follows:
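$$w_1 = \frac{v_1 e_{11}}{v_1 + v_2 + v_3} + \frac{v_2 e_{21}}{v_1 + v_2 + v_3} + \frac{v_3 e_{31}}{v_1 + v_2 + v_3};$$
$$w_2 = \frac{v_1 e_{12}}{v_1 + v_2 + v_3} + \frac{v_2 e_{22}}{v_1 + v_2 + v_3} + \frac{v_3 e_{32}}{v_1 + v_2 + v_3};$$
……;
$$w_6 = \frac{v_1 e_{16}}{v_1 + v_2 + v_3} + \frac{v_2 e_{26}}{v_1 + v_2 + v_3} + \frac{v_3 e_{36}}{v_1 + v_2 + v_3};$$
wherein $w_1$ denotes the comprehensive weight of the 1st main evaluation index; by analogy, the comprehensive weight corresponding to each main evaluation index can be obtained;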
normalizing the comprehensive weights of all 6 main evaluation indexes to obtain the weight value w′ of each main evaluation index, where $w'_j$ = (comprehensive weight of the j-th main evaluation index) / (sum of the comprehensive weights of all main evaluation indexes); performing a weighted calculation with the obtained weight values to obtain the interestingness score of each user for different rooms, the interestingness score being calculated according to the following formula:
$S = w'_1 Z_1 + w'_2 Z_2 + \cdots + w'_6 Z_6$, wherein S denotes the interestingness score.
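To show how the three units might hand data to one another, the following sketch chains them as small Python classes. It is an illustration under several assumptions rather than the system itself: the class and method names are invented, the screening keeps the first index of any highly correlated pair, at least one component is always retained, the cumulative-rate check is omitted for brevity, and the input is random stand-in data.

```python
import numpy as np

class SystemBuildingUnit:
    """Selects the columns to be used as main evaluation indexes."""
    def __init__(self, corr_threshold=0.8):
        self.corr_threshold = corr_threshold

    def run(self, X):
        R = np.corrcoef(X, rowvar=False)
        kept = []
        for j in range(X.shape[1]):
            if all(abs(R[j, k]) <= self.corr_threshold for k in kept):
                kept.append(j)
        return kept

class PrincipalComponentAnalysisUnit:
    """Standardizes the data and returns Z plus the selected rates and eigenvectors."""
    def run(self, X):
        # Assumes no column is constant, so the range is never zero.
        Z = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
        lam, vec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
        order = np.argsort(lam)[::-1]
        lam, E = lam[order], vec[:, order].T
        b = max(int(np.sum(lam > 1.0)), 1)  # assumption: keep at least one component
        v = lam / lam.sum()
        return Z, v[:b], E[:b]

class WeightAnalysisUnit:
    """Turns component rates and eigenvectors into normalized weights and scores."""
    def run(self, Z, v, E):
        w = (v @ E) / v.sum()      # comprehensive weights
        w_norm = w / w.sum()       # normalized weights
        return Z @ w_norm          # one interestingness score per row

# Hypothetical input: 40 (user, room) rows and 8 candidate behavior indexes.
rng = np.random.default_rng(0)
X_raw = rng.random((40, 8))
cols = SystemBuildingUnit().run(X_raw)
Z, v, E = PrincipalComponentAnalysisUnit().run(X_raw[:, cols])
S = WeightAnalysisUnit().run(Z, v, E)
print(len(cols), S.shape)
```

Because eigenvector signs are arbitrary, a practical implementation would likely need a sign convention before the elements are used as weights; the text does not specify one.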
Various modifications and variations of the embodiments of the present invention may be made by those skilled in the art, and they are also within the scope of the present invention, provided they are within the scope of the claims of the present invention and their equivalents.
What is not described in detail in the specification is prior art that is well known to those skilled in the art.