Disclosure of Invention
To solve the problems in the prior art, the invention provides a user management system and method based on a full life cycle. The invention collects users' behavior information with respect to advertisements and marks users' key information. This lets it extract user feature information quickly and effectively from complex user information, and it creates a user feature database that outputs this feature information. Based on the user feature information, it calculates the activity value of users with given feature information for different types of advertisements. It then manages users at different life-cycle stages according to the user feature information and the user activity value.
The object of the invention is achieved by the following technical solution:
A user management method based on a full life cycle comprises the following steps:
S101: collecting user information, namely information related to a user after the user browses, clicks or downloads an advertisement;
S102: marking the user key information, extracting a training sample set of the user key information, and splitting this training sample set into sample subsets according to how the user key information is actually marked;
S103: preprocessing the user key information with the k-means algorithm;
S104: creating an original user feature database from the user key information, building an original data network and designing its weight matrices, and obtaining a final user feature database after processing;
S105: building a learning model from the user feature database with a binary (two-class) classification method to infer a set of possible user feature information;
S106: collecting user behavior information, which includes the user's browsing time, number of clicks and number of downloads for advertisements;
S107: marking the user behavior information and obtaining a user activity value from it.
Further, S102 is specifically implemented as follows:
For any user key information a_k ∈ A, where A is the extracted training sample set of the user key information, the training sample set is split, according to how the user key information is actually marked, into two mutually disjoint subsets of user key information samples, the positive class P_k and the negative class N_k, specifically:
$$P_k = \{\, x_i \mid (x_i, Y_i) \in T,\ a_k \in Y_i \,\}, \qquad N_k = \{\, x_i \mid (x_i, Y_i) \in T,\ a_k \notin Y_i \,\},$$
where x_i is a user key information sample to be learned, Y_i is the predicted user key information set, P_k is the subset formed by the samples marked with a_k, N_k is the subset formed by all samples not marked with a_k, and T is the sample set of the user key information.
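The split in S102 can be sketched in a few lines of Python. This is only an illustration: the `(x_i, Y_i)` tuple layout and the helper name `split_by_label` are assumptions, not part of the method as described.

```python
from typing import Hashable, List, Sequence, Set, Tuple

# One training example: (feature vector x_i, set of key-information labels Y_i).
Sample = Tuple[Sequence[float], Set[Hashable]]


def split_by_label(T: List[Sample], a_k: Hashable):
    """Split T into P_k (samples whose Y_i contains a_k) and N_k (all other samples)."""
    P_k = [x for x, Y in T if a_k in Y]
    N_k = [x for x, Y in T if a_k not in Y]
    return P_k, N_k


T = [
    ([0.1, 0.9], {"sports", "games"}),
    ([0.8, 0.2], {"finance"}),
    ([0.2, 0.7], {"sports"}),
]
P, N = split_by_label(T, "sports")
print(len(P), len(N))  # 2 1
```

Because every sample lands in exactly one of the two lists, P_k and N_k are disjoint and together cover T, as S102 requires.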
Further, S103 is specifically implemented as follows:
the k-means algorithm is applied separately to the subsets P_k and N_k to cluster the user key information as preprocessing, with the number of clusters
$$m_k = \left\lceil \gamma \cdot \min\left(|P_k|,\ |N_k|\right) \right\rceil,$$
where m_k is the number of clusters, ⌈·⌉ denotes rounding up to an integer, |·| returns the number of samples in a set, and γ ∈ [0, 1] is the parameter controlling the number of clusters.
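A minimal sketch of S103 in plain Python follows. It assumes the cluster-count rule m_k = ⌈γ·min(|P_k|, |N_k|)⌉ and ordinary Lloyd-style k-means with randomly chosen initial centers. The function names, the seed and the iteration cap are illustrative choices, not taken from the method.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def num_clusters(P_k: Sequence, N_k: Sequence, gamma: float) -> int:
    """Assumed rule: m_k = ceil(gamma * min(|P_k|, |N_k|)), with gamma in [0, 1]."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return math.ceil(gamma * min(len(P_k), len(N_k)))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points: List[Vector], m: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means on one subset (P_k or N_k); returns the m cluster centers."""
    if not 1 <= m <= len(points):
        raise ValueError("need 1 <= m <= number of points")
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, m)]  # random initial centers
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(m)]
        for p in points:  # assign each point to its nearest center
            nearest = min(range(m), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        new = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new == centers:  # converged
            break
        centers = new
    return centers


P = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
N = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
m = num_clusters(P, N, gamma=0.5)       # ceil(0.5 * 4) = 2
print(m)  # 2
pos_centers = sorted(kmeans(P, m))      # roughly [[0.05, 0.0], [5.05, 5.0]]
```

The same `kmeans` call would be made on N_k with the same m_k. This gives both classes an equal number of centers.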
Further, step S104 is specifically implemented as follows:
S201: creating an original user feature database from the user key information:
$$\varphi_k(x_i) = \left[\, d(x_i, p^k_1), \ldots, d(x_i, p^k_{m_k}),\ d(x_i, n^k_1), \ldots, d(x_i, n^k_{m_k}) \,\right],$$
where A_k is the training sample set, p^k denotes a positive sample and n^k a negative sample, d(·,·) denotes the distance (interval length) between samples, i is the sample index, x_i is the feature value of the i-th training sample, k is the cluster index, m is the number of samples in a cluster, m_k is the number of samples in the k-th cluster, φ_k(x_i) is the attribute feature of the k-th cluster corresponding to x_i, p^k_1 and p^k_{m_k} are the center features of the 1st and the m_k-th samples of the k-th cluster among the positive samples, n^k_1 and n^k_{m_k} are the center features of the 1st and the m_k-th samples of the k-th cluster among the negative samples, X denotes the samples in the sample set that contain the user key information, and Y denotes the samples that do not;
S202: centering the user key information training sample set A_k to obtain the basic user feature database C_k = Tr_k ∪ Ts_k;
S203: randomly selecting P features from Tr_k to create a P-dimensional random user feature subspace H_t;
S204: on the basis of the subspace H_t, building a user feature neighbor data network, a user feature inter-class neighbor data network and a user feature non-adjacent data network, and designing the corresponding weight matrices:
where the weight matrices belong, respectively, to the user feature neighbor data network, the user feature non-adjacent data network and the user feature inter-class neighbor data network. The weights are expressed through the Euclidean distance between two samples, the average distance from each of the two samples to all other samples, and the distance from a sample to the farthest sample of the same class. C is the negative constraint matrix of the neighbor relation;
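The weight formulas of S204 are not reproduced in this text. The sketch below therefore only computes the distance statistics that S204 names: pairwise Euclidean distances, each sample's average distance to all other samples, and each sample's distance to its farthest same-class sample. The final `self_tuning_weight` function is a hypothetical example of how such statistics are often combined into a Gaussian-style weight. It is not the patent's formula.

```python
import math
from typing import Hashable, List, Sequence

Matrix = List[List[float]]


def pairwise_distances(X: List[Sequence[float]]) -> Matrix:
    """Euclidean distance between every pair of samples."""
    return [[math.dist(a, b) for b in X] for a in X]


def mean_distance_to_others(D: Matrix) -> List[float]:
    """Average distance from each sample to all other samples (the diagonal is zero)."""
    n = len(D)
    if n < 2:
        raise ValueError("need at least two samples")
    return [sum(row) / (n - 1) for row in D]


def farthest_same_class(D: Matrix, labels: Sequence[Hashable]) -> List[float]:
    """Distance from each sample to the farthest other sample with the same label (0.0 if none)."""
    n = len(D)
    return [
        max((D[i][j] for j in range(n) if j != i and labels[j] == labels[i]), default=0.0)
        for i in range(n)
    ]


def self_tuning_weight(D: Matrix, sigma: List[float]) -> Matrix:
    """HYPOTHETICAL weight: W_ij = exp(-D_ij^2 / (sigma_i * sigma_j)), zero on the diagonal."""
    n = len(D)
    return [
        [0.0 if i == j else math.exp(-D[i][j] ** 2 / (sigma[i] * sigma[j])) for j in range(n)]
        for i in range(n)
    ]


X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = pairwise_distances(X)
sigma = mean_distance_to_others(D)
print(sigma)                              # [7.5, 5.0, 7.5]
print(farthest_same_class(D, [1, 1, 0]))  # [5.0, 5.0, 0.0]
W = self_tuning_weight(D, sigma)
```

Scaling by each sample's own average distance makes the weight adapt to local density. This is one common reading of an "adaptive" neighbor network, but it remains an assumption here.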
S205: returning to step S203 and repeating this loop T times;
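The random-subspace loop of S203 to S205 can be sketched as below. The sketch assumes that "randomly selecting P features" means sampling P distinct feature indices uniformly without replacement. The seed, the helper names and the placeholder comment for S204 are illustrative.

```python
import random
from typing import List, Sequence


def random_subspaces(num_features: int, P: int, T: int, seed: int = 0) -> List[List[int]]:
    """Draw T subspaces H_1..H_T, each a sorted list of P distinct feature indices."""
    if not 1 <= P <= num_features:
        raise ValueError("P must satisfy 1 <= P <= num_features")
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_features), P)) for _ in range(T)]


def project(x: Sequence[float], H: List[int]) -> List[float]:
    """Restrict a sample from Tr_k to the features of subspace H."""
    return [x[j] for j in H]


Tr_k = [[0.5, 1.0, 2.0, 0.0], [1.5, 0.0, 1.0, 3.0]]
for H in random_subspaces(num_features=4, P=2, T=3):
    projected = [project(x, H) for x in Tr_k]
    # S204 would build the three data networks on `projected` here.
```

Each pass yields one set of networks and weight matrices. S206 then fuses the T results.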
S206: creating fused-relation user feature data networks and calculating the weight matrix corresponding to each of them:
where the entries denote the specific weight values of the respective weight matrices;
S207: constructing the scatter matrices Q_m, Q_rsb, Q_rsn and Q_rsf respectively:
where S_m is a symmetric matrix; D_m is the diagonal matrix derived from S_m; M is the original positive constraint set; (x_i, x_j) and (x_j, x_i) are the sample pairs formed by samples x_i and x_j; w is the transformation matrix and w^T its transpose; X is the sample matrix of the samples in the sample set that contain the user key information and X^T its transpose; L_m is the label value of the user key information; and the remaining term is the symmetric matrix formed by samples x_i and x_j;
where rsb denotes inter-class fused scatter. The two conjugate matrices correspond, respectively, to the inter-class fused scatter weight matrix and to the symmetric matrix within it, and L_rsb is the Laplacian matrix of the inter-class fused scatter;
where rsn denotes neighbor fused scatter, S_rsn is the neighbor fused scatter weight matrix, D_rsn is its diagonal matrix, and L_rsn is the corresponding Laplacian matrix;
where rsf denotes non-adjacent fused scatter, S_rsf is the non-adjacent fused scatter weight matrix, D_rsf is its diagonal matrix, and L_rsf is the corresponding Laplacian matrix;
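The quantities of S207 can be computed as below under two assumptions. First, the usual graph-Laplacian construction applies: each diagonal matrix D holds the row sums of its weight matrix S, and L = D − S. Second, a scatter has the form Q = w^T X L X^T w, with the samples as the columns of X. The helper names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def laplacian(S: Matrix) -> Matrix:
    """L = D - S, with D diagonal and D_ii = sum_j S_ij (assumed standard graph Laplacian)."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)] for i in range(n)]


def scatter(w: Sequence[float], samples: List[Sequence[float]], L: Matrix) -> float:
    """Q = w^T X L X^T w, where the samples x_i are the columns of X."""
    y = [sum(wi * xi for wi, xi in zip(w, x)) for x in samples]  # y = X^T w
    n = len(y)
    return sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))


S = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]          # symmetric weights: a chain x_1 - x_2 - x_3
samples = [[0.0], [1.0], [3.0]]
print(scatter([1.0], samples, laplacian(S)))  # 5.0
```

For symmetric S, this Q equals ½ Σ_ij S_ij (y_i − y_j)² with y_i = w^T x_i. Here that is (0 − 1)² + (1 − 3)² = 5. A small Q therefore means that samples joined by large weights stay close after projection onto w.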
S208: given the values of the parameters β and α, creating the target conversion function:
where α and β are constant parameters, w is the target transformation vector, argmax_w denotes taking the w that maximizes the function, and L_m is the label value of the user key information;
S209: presetting the dimension d, converting the target conversion function into the problem of solving for the target transformation vector, and generating the target mapping matrix W_k from the dimension d and the target transformation vector:
$$X\,(L_{rsb} + \alpha L_{rsf})\,X^{T} w = \lambda\, X\,(L_m + \alpha L_{rsn})\,X^{T} w,$$
where λ is an adaptation parameter;
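Write A = X(L_rsb + αL_rsf)X^T and B = X(L_m + αL_rsn)X^T. Left-multiplying the S209 equation by w^T shows that every eigenvalue λ is a ratio of two quadratic forms. The text does not state how W_k is assembled. Assuming, as is usual for such generalized eigenproblems, that W_k collects the eigenvectors of the d largest eigenvalues, the computation reads:

```latex
A = X\,(L_{rsb} + \alpha L_{rsf})\,X^{T}, \qquad B = X\,(L_m + \alpha L_{rsn})\,X^{T}

A w = \lambda B w
\;\Longrightarrow\; w^{T} A w = \lambda\, w^{T} B w
\;\Longrightarrow\; \lambda = \frac{w^{T} A w}{w^{T} B w} \quad (w^{T} B w \neq 0)

W_k = [\, w_1, w_2, \ldots, w_d \,], \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d
```

A larger λ along a direction w thus means a larger inter-class and non-adjacent scatter relative to the L_m and neighbor scatter along that direction.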
S210: after processing, the final user feature database is obtained, where Tr_k is the basic feature space containing the user key information and My_k is the generic feature space. The mapping ρ_K(a_1) is obtained from the user feature database, where a_1 is the user information and ρ_K is the category corresponding to the user information.
Further, the centering preprocessing subtracts the mean vector from each vector:
$$\tilde{x}_i = x_i - \frac{1}{n}\sum_{j=1}^{n} x_j,$$
where n is the total number of vectors involved.
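The centering step is straightforward. A dependency-free Python sketch follows; the function name is hypothetical.

```python
from typing import List, Sequence


def center(vectors: List[Sequence[float]]) -> List[List[float]]:
    """Subtract the mean vector (1/n) * sum_j x_j from every vector x_i."""
    n = len(vectors)
    if n == 0:
        return []
    mean = [sum(col) / n for col in zip(*vectors)]
    return [[v - m for v, m in zip(x, mean)] for x in vectors]


print(center([[1.0, 2.0], [3.0, 6.0]]))  # [[-1.0, -2.0], [1.0, 2.0]]
```

After centering, every feature column sums to zero. This is what makes the quadratic forms w^T X L X^T w in S207 independent of a global offset in the data.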
Further, S105 is implemented as follows:
S301: using the mapping ρ_K(a_1), creating from the user feature database the corresponding binary classification training set Ts_k;
S302: building a binary classifier on Ts_k to obtain the learning model f_k: My_k → R corresponding to user information a_k, where My_k is the generic feature space, f_k is the model mapping, and R is the label set of the user feature information;
S303: inferring the set of possible user feature information with the learning model,
where Y' is the predicted set of user feature information, k is the index of the user key information, q is the total number of user key information samples, a_k is a user key information sample, t is the user feature information, Ts_k is the binary classification training set, and ρ_k is the category corresponding to the user feature information.
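The S303 formula is not reproduced here, so the following is only a hypothetical sketch of the inference step. It assumes, as in common label-specific-feature methods, that a_k is included in Y' when its binary model f_k gives a positive score on the mapped features ρ_k(t). The callables below are toy stand-ins for the trained mappings and models.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set

Features = List[float]


def predict_labels(
    t: Sequence[float],
    mappings: Dict[Hashable, Callable[[Sequence[float]], Features]],
    models: Dict[Hashable, Callable[[Features], float]],
) -> Set[Hashable]:
    """Return {a_k : f_k(rho_k(t)) > 0}; the zero threshold is an assumption."""
    return {a for a, rho in mappings.items() if models[a](rho(t)) > 0}


# Toy stand-ins: each rho_k keeps one coordinate, each f_k is a shifted identity.
mappings = {"sports": lambda x: [x[0]], "finance": lambda x: [x[1]]}
models = {"sports": lambda f: f[0] - 0.5, "finance": lambda f: f[0] - 0.5}
predicted = predict_labels([0.9, 0.1], mappings, models)
print(predicted)  # {'sports'}
```

Each key information item is decided independently by its own binary model. This is why S302 builds one model f_k per a_k.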
Further, steps S101 to S105 are repeated on the user key information to predict the user feature information.
Further, step S107 is specifically implemented as follows:
S401: in the user behavior information, the number of clicks is denoted C_p and the number of downloads D_p;
S402: the time at which the advertisement is displayed in the ad slot is denoted T_c, and the time at which the user finishes browsing the advertisement T_d;
S403: the user's activity value for the advertisement is:
$$E_n = (T_d - T_c)\,v_1 + C_p\,v_2 + D_p\,v_3,$$
where v_1, v_2 and v_3 are coefficient factors.
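The S403 formula can be transcribed directly. The default coefficients below are the example values v1 = 0.15, v2 = 0.44 and v3 = 0.27 given in the detailed description. Measuring the times in seconds is an assumption.

```python
def activity_value(t_c: float, t_d: float, clicks: int, downloads: int,
                   v1: float = 0.15, v2: float = 0.44, v3: float = 0.27) -> float:
    """E_n = (T_d - T_c) * v1 + C_p * v2 + D_p * v3."""
    return (t_d - t_c) * v1 + clicks * v2 + downloads * v3


# Ad shown at t = 10 s, browsing finished at t = 30 s, 2 clicks, 1 download.
e = activity_value(t_c=10.0, t_d=30.0, clicks=2, downloads=1)
print(round(e, 2))  # 4.15
```

The first term is the browsing duration T_d − T_c, so the coefficients trade off time spent against discrete click and download events.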
A user management system based on a full life cycle, characterized by comprising an associated information acquisition module, a user information processing module, a data preprocessing module, a user behavior processing module, a feature prediction module, a behavior information acquisition module and an information output module, wherein:
the associated information acquisition module is used to collect user information, namely information related to a user after the user browses, clicks or downloads an advertisement;
the user information processing module is used to mark the user key information, extract the training sample set of the user key information, and split this training sample set into sample subsets according to how the user key information is actually marked;
the data preprocessing module is used to preprocess the user key information with the k-means algorithm;
the user behavior processing module is used to create an original user feature database, build an original data network and design its weight matrices, and obtain a final user feature database after processing, specifically by:
creating an original user feature database from the user key information:
$$\varphi_k(x_i) = \left[\, d(x_i, p^k_1), \ldots, d(x_i, p^k_{m_k}),\ d(x_i, n^k_1), \ldots, d(x_i, n^k_{m_k}) \,\right],$$
where A_k is the training sample set, p^k denotes a positive sample and n^k a negative sample, d(·,·) denotes the distance (interval length) between samples, i is the sample index, x_i is the feature value of the i-th training sample, k is the cluster index, m is the number of samples in a cluster, m_k is the number of samples in the k-th cluster, φ_k(x_i) is the attribute feature of the k-th cluster corresponding to x_i, p^k_1 and p^k_{m_k} are the center features of the 1st and the m_k-th samples of the k-th cluster among the positive samples, n^k_1 and n^k_{m_k} are the center features of the 1st and the m_k-th samples of the k-th cluster among the negative samples, X denotes the samples in the sample set that contain the user key information, and Y denotes the samples that do not;
centering the user key information training sample set A_k to obtain the basic user feature database C_k = Tr_k ∪ Ts_k;
randomly selecting P features from Tr_k to create a P-dimensional random user feature subspace H_t;
on the basis of the subspace H_t, building a user feature neighbor data network, a user feature inter-class neighbor data network and a user feature non-adjacent data network, and designing the corresponding weight matrices:
where the weight matrices belong, respectively, to the user feature neighbor data network, the user feature non-adjacent data network and the user feature inter-class neighbor data network. The weights are expressed through the Euclidean distance between two samples, the average distance from each of the two samples to all other samples, and the distance from a sample to the farthest sample of the same class. C is the negative constraint matrix of the neighbor relation;
randomly selecting P features from Tr_k again to create a new P-dimensional random user feature subspace H_t, and repeating this loop T times;
creating fused-relation user feature data networks and calculating the weight matrix corresponding to each of them:
where the entries denote the specific weight values of the respective weight matrices;
constructing the scatter matrices Q_m, Q_rsb, Q_rsn and Q_rsf respectively:
where S_m is a symmetric matrix; D_m is the diagonal matrix derived from S_m; M is the original positive constraint set; (x_i, x_j) and (x_j, x_i) are the sample pairs formed by samples x_i and x_j; w is the transformation matrix and w^T its transpose; X is the sample matrix of the samples in the sample set that contain the user key information and X^T its transpose; L_m is the label value of the user key information; and the remaining term is the symmetric matrix formed by samples x_i and x_j;
where rsb denotes inter-class fused scatter. The two conjugate matrices correspond, respectively, to the inter-class fused scatter weight matrix and to the symmetric matrix within it, and L_rsb is the Laplacian matrix of the inter-class fused scatter;
where rsn denotes neighbor fused scatter, S_rsn is the neighbor fused scatter weight matrix, D_rsn is its diagonal matrix, and L_rsn is the corresponding Laplacian matrix;
where rsf denotes non-adjacent fused scatter, S_rsf is the non-adjacent fused scatter weight matrix, D_rsf is its diagonal matrix, and L_rsf is the corresponding Laplacian matrix;
given the values of the parameters β and α, creating the target conversion function:
where α and β are constant parameters, w is the target transformation vector, argmax_w denotes taking the w that maximizes the function, and L_m is the label value of the user key information;
presetting the dimension d, converting the target conversion function into the problem of solving for the target transformation vector, and generating the target mapping matrix W_k from the dimension d and the target transformation vector:
$$X\,(L_{rsb} + \alpha L_{rsf})\,X^{T} w = \lambda\, X\,(L_m + \alpha L_{rsn})\,X^{T} w,$$
where λ is an adaptation parameter;
after processing, obtaining the final user feature database, where Tr_k is the basic feature space containing the user key information and My_k is the generic feature space. The mapping ρ_K(a_1) is obtained from the user feature database, where a_1 is the user information and ρ_K is the category corresponding to the user information;
the feature prediction module is used to build a learning model from the user feature database with a binary (two-class) classification method and to predict the set of possible user feature information;
the behavior information acquisition module is used to collect user behavior information, which includes the user's browsing time, number of clicks and number of downloads for advertisements;
the information output module is used to mark the user behavior information and obtain the user activity value from it.
The beneficial effects of the invention are as follows:
(1) After the information acquisition module collects the user behavior information, a user multi-feature algorithm acquires, in a targeted manner, the feature information of the user group interested in a given advertisement. The algorithm uses an adaptive neighbor data network to capture the neighbor relations of the user feature information in the generated user feature database, and corrects the corresponding adjacency matrix using pairwise constraint relations;
(2) The user information processing module processes the user information to obtain the user feature information. Data networks are built, as sets, for the different categories of user feature information. Weight matrices are constructed for these networks and their specific weight values are calculated, yielding a three-dimensional user feature database. From this database, the activity values of mobile application users can be obtained quickly, providing strong data support for formulating advertising marketing strategies;
(3) The user behavior processing module processes the user behavior information to obtain the user activity value. The activity value algorithm takes all the possible behaviors of a user toward an advertisement into account, so the resulting activity value directly reflects the user's life-cycle stage, allowing advertising marketing strategies to be formulated in a targeted manner.
Detailed Description
To further describe the technical means adopted by the invention to achieve its intended purpose, and their effects, the specific implementation, structure, features and effects of the invention are described in detail below with reference to the accompanying drawings and a preferred embodiment.
To aid understanding of the present disclosure, the multi-feature data contained in the user information are defined uniformly as follows. X denotes the feature space corresponding to the user key information. Every item of user key information is represented by d-dimensional feature attributes, so that $X = \mathbb{R}^{d}$. A denotes the set of candidate user key information, consisting of k different items of feature information: $A = \{a_1, a_2, \ldots, a_k\}$ with $a \in \{0, 1\}$. In the multi-feature setting, a training key information set $T = \{(x_i, Y_i) \mid i = 1, \ldots, p\}$ containing p items of user key information is given. Here $(x_i, Y_i)$ is a specific item of user key information in T, $x_i \in X$ is the d-dimensional feature vector corresponding to that user key information, and $Y_i \subseteq A$ is the set of key information corresponding to the sample.
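One possible in-memory representation of these definitions is shown below, offered only as an illustration; the class and function names are hypothetical. Each training item pairs a d-dimensional vector x_i with its key information set Y_i, a subset of A. Y_i can equivalently be written as a 0/1 indicator vector over A.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class KeyInfoSample:
    x: Tuple[float, ...]   # feature vector x_i in R^d
    Y: FrozenSet[str]      # key information set Y_i, a subset of A


def build_training_set(raw: Iterable[Tuple[Sequence[float], Iterable[str]]],
                       A: Sequence[str], d: int) -> List[KeyInfoSample]:
    """Build T = {(x_i, Y_i)}, checking that dim(x_i) == d and that Y_i is a subset of A."""
    allowed = set(A)
    T = []
    for x, labels in raw:
        if len(x) != d:
            raise ValueError(f"expected a {d}-dimensional vector, got {len(x)}")
        Y = frozenset(labels)
        if not Y <= allowed:
            raise ValueError(f"unknown key information: {sorted(Y - allowed)}")
        T.append(KeyInfoSample(tuple(float(v) for v in x), Y))
    return T


def indicator(Y: FrozenSet[str], A: Sequence[str]) -> List[int]:
    """Encode Y_i as a 0/1 vector over A = [a_1, ..., a_k]."""
    return [1 if a in Y else 0 for a in A]


A = ["a1", "a2", "a3"]
T = build_training_set([([0.2, 0.8], ["a1", "a3"]), ([0.9, 0.1], ["a2"])], A, d=2)
print(indicator(T[0].Y, A))  # [1, 0, 1]
```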
Referring to FIG. 1, a user management system based on a full life cycle comprises an information acquisition module, a user information processing module, a user behavior processing module, a central processing unit, a data storage module and an information output module.
The information acquisition module comprises a user information acquisition unit and a user behavior acquisition unit. The user information acquisition unit collects user information, including but not limited to user-related information such as age, interests, work, daily life and hobbies. The user behavior acquisition unit collects the browsing, clicking and downloading behaviors of users toward the placed advertisements.
The user information processing module is used to mark the user key information in the user information collected by the information acquisition module, extract the training sample set of the user key information, and split this training sample set into sample subsets according to how the user key information is actually marked. The specific steps are as follows:
For any user key information a_k ∈ A, where A is the extracted training sample set of the user key information, the training sample set is split, according to how the user key information is actually marked, into two mutually disjoint subsets of user key information samples, the positive class P_k and the negative class N_k, specifically:
$$P_k = \{\, x_i \mid (x_i, Y_i) \in T,\ a_k \in Y_i \,\}, \qquad N_k = \{\, x_i \mid (x_i, Y_i) \in T,\ a_k \notin Y_i \,\},$$
where x_i is a user key information sample to be learned, Y_i is the predicted user key information set, P_k is the subset formed by the samples marked with a_k, N_k is the subset formed by all samples not marked with a_k, and T is the sample set of the user key information.
The user information processing module is also used to preprocess the user key information with the k-means algorithm. The specific steps are as follows:
the k-means algorithm is applied separately to the subsets P_k and N_k to cluster the user key information as preprocessing, with the number of clusters
$$m_k = \left\lceil \gamma \cdot \min\left(|P_k|,\ |N_k|\right) \right\rceil,$$
where m_k is the number of clusters, ⌈·⌉ denotes rounding up to an integer, |·| returns the number of samples in a set, and γ ∈ [0, 1] is the parameter controlling the number of clusters.
The user information processing module is also used to create an original user feature database, build an original data network and design its weight matrices, and obtain a final user feature database after processing. The specific steps are as follows:
S201: creating an original user feature database from the user key information:
$$\varphi_k(x_i) = \left[\, d(x_i, p^k_1), \ldots, d(x_i, p^k_{m_k}),\ d(x_i, n^k_1), \ldots, d(x_i, n^k_{m_k}) \,\right],$$
where A_k is the training sample set, p^k denotes a positive sample and n^k a negative sample, d(·,·) denotes the distance (interval length) between samples, i is the sample index, x_i is the feature value of the i-th training sample, k is the cluster index, m is the number of samples in a cluster, m_k is the number of samples in the k-th cluster, φ_k(x_i) is the attribute feature of the k-th cluster corresponding to x_i, p^k_1 and p^k_{m_k} are the center features of the 1st and the m_k-th samples of the k-th cluster among the positive samples, n^k_1 and n^k_{m_k} are the center features of the 1st and the m_k-th samples of the k-th cluster among the negative samples, X denotes the samples in the sample set that contain the user key information, and Y denotes the samples that do not;
S202: centering the user key information training sample set A_k to obtain the basic user feature database C_k = Tr_k ∪ Ts_k;
S203: randomly selecting P features from Tr_k to create a P-dimensional random user feature subspace H_t;
S204: on the basis of the subspace H_t, building a user feature neighbor data network, a user feature inter-class neighbor data network and a user feature non-adjacent data network, and designing the corresponding weight matrices:
where the weight matrices belong, respectively, to the user feature neighbor data network, the user feature non-adjacent data network and the user feature inter-class neighbor data network. The weights are expressed through the Euclidean distance between two samples, the average distance from each of the two samples to all other samples, and the distance from a sample to the farthest sample of the same class. C is the negative constraint matrix of the neighbor relation;
S205: returning to step S203 and repeating this loop T times;
S206: creating fused-relation user feature data networks and calculating the weight matrix corresponding to each of them:
where the entries denote the specific weight values of the respective weight matrices;
S207: constructing the scatter matrices Q_m, Q_rsb, Q_rsn and Q_rsf respectively:
where S_m is a symmetric matrix; D_m is the diagonal matrix derived from S_m; M is the original positive constraint set; (x_i, x_j) and (x_j, x_i) are the sample pairs formed by samples x_i and x_j; w is the transformation matrix and w^T its transpose; X is the sample matrix of the samples in the sample set that contain the user key information and X^T its transpose; L_m is the label value of the user key information; and the remaining term is the symmetric matrix formed by samples x_i and x_j;
where rsb denotes inter-class fused scatter. The two conjugate matrices correspond, respectively, to the inter-class fused scatter weight matrix and to the symmetric matrix within it, and L_rsb is the Laplacian matrix of the inter-class fused scatter;
where rsn denotes neighbor fused scatter, S_rsn is the neighbor fused scatter weight matrix, D_rsn is its diagonal matrix, and L_rsn is the corresponding Laplacian matrix;
where rsf denotes non-adjacent fused scatter, S_rsf is the non-adjacent fused scatter weight matrix, D_rsf is its diagonal matrix, and L_rsf is the corresponding Laplacian matrix;
S208: given the values of the parameters β and α, creating the target conversion function:
where α and β are constant parameters, w is the target transformation vector, argmax_w denotes taking the w that maximizes the function, and L_m is the label value of the user key information;
S209: presetting the dimension d, converting the target conversion function into the problem of solving for the target transformation vector, and generating the target mapping matrix W_k from the dimension d and the target transformation vector:
$$X\,(L_{rsb} + \alpha L_{rsf})\,X^{T} w = \lambda\, X\,(L_m + \alpha L_{rsn})\,X^{T} w,$$
where λ is an adaptation parameter;
S210: after processing, the final user feature database is obtained, where Tr_k is the basic feature space containing the user key information and My_k is the generic feature space. The mapping ρ_K(a_1) is obtained from the user feature database, where a_1 is the user information and ρ_K is the category corresponding to the user information.
The user information processing module is also used to build a learning model with a binary (two-class) classification method and to infer the set of possible user feature information. The specific steps are as follows:
S301: using the mapping ρ_K(a_1), creating from the user feature database the corresponding binary classification training set Ts_k;
S302: building a binary classifier on Ts_k to obtain the learning model f_k: My_k → R corresponding to user information a_k, where My_k is the generic feature space, f_k is the model mapping, and R is the label set of the user feature information;
S303: inferring the set of possible user feature information with the learning model,
where Y' is the predicted set of user feature information, k is the index of the user key information, q is the total number of user key information samples, a_k is a user key information sample, t is the user feature information, Ts_k is the binary classification training set, and ρ_k is the category corresponding to the user feature information.
The user behavior processing module is used to mark the user behavior information. The specific steps are as follows:
S401: in the user behavior information, the number of clicks is denoted C_p and the number of downloads D_p;
S402: the time at which the advertisement is displayed in the ad slot is denoted T_c, and the time at which the user finishes browsing the advertisement T_d;
S403: the user's activity value for the advertisement is:
$$E_n = (T_d - T_c)\,v_1 + C_p\,v_2 + D_p\,v_3,$$
where v_1, v_2 and v_3 are coefficient factors, for example v_1 = 0.15, v_2 = 0.44 and v_3 = 0.27. When E_n > E_{n-1}, the larger E_n is, the closer the user is to maturity. When E_n < E_{n-1}, the smaller E_n is, the more likely the user is to be lost.
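The comparison rule can be sketched as follows. The text only describes the cases E_n > E_{n-1} and E_n < E_{n-1}. Treating equal values as "unchanged" and the label strings themselves are assumptions.

```python
from typing import List, Sequence


def lifecycle_trend(e_prev: float, e_curr: float) -> str:
    """Compare consecutive activity values E_{n-1} and E_n."""
    if e_curr > e_prev:
        return "maturing"      # rising activity: user moving toward maturity
    if e_curr < e_prev:
        return "churn risk"    # falling activity: user more likely to be lost
    return "unchanged"         # equal values: not covered by the text (assumption)


def trends(history: Sequence[float]) -> List[str]:
    """Label each step of a sequence of activity values E_1, E_2, ..., E_n."""
    return [lifecycle_trend(a, b) for a, b in zip(history, history[1:])]


print(trends([3.2, 4.15, 4.15, 2.0]))  # ['maturing', 'unchanged', 'churn risk']
```

Working on the sequence of values, rather than on a single E_n, matches the text's use of the previous value E_{n-1} as the reference point.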
The working principle of the invention is as follows:
While the full-life-cycle user management system and method are in operation, the user behavior acquisition unit in the information acquisition module stays in sleep mode. When a user browses, clicks or downloads an advertisement, the unit switches to working mode and sends the user behavior information through the central processing unit to the user behavior processing module. That module processes the information to obtain the user activity value. At the same time, the user information acquisition unit starts collecting user information and sends it through the central processing unit to the user information processing module, which creates a user feature database in the data storage module. Creating this database applies the user multi-feature algorithm. The algorithm uses an adaptive neighbor data network to capture the neighbor relations of the user feature information in the generated database, and corrects the corresponding adjacency matrix using pairwise constraint relations. As more user key information is collected, the user feature data set output by the algorithm becomes more accurate, and the user feature information is output through the central processing unit. Meanwhile, the activity value of users with given feature information for each type of advertisement is calculated from the user feature information. Combining the activity values of users at different life-cycle stages achieves user management over the full life cycle.
The present invention is not limited to the above embodiments. Any modification, equivalent replacement or variation that those skilled in the art make to the above embodiments according to the technical substance of the present invention, without departing from the scope of that technical substance, falls within the scope of the present invention.