Disclosure of Invention
The invention provides a method and a device for evaluating the quality of advertisement media, with the aim of solving the problem that existing evaluations of advertisement media lack stability and accuracy.
According to one aspect of the present invention, an advertisement media quality evaluation method is provided, including: acquiring multiple groups of access information describing users' responses to multiple advertisement media respectively, wherein each group of access information comprises multiple kinds of access information; performing principal component analysis on the multiple kinds of access information to obtain multiple principal components, wherein the principal components are mutually independent and together cover most of the access information of the multiple advertisement media; and determining the quality of the multiple advertisement media according to the principal components and the weight of each principal component, the weight being the share of the information of all the principal components that the component covers.
Preferably, performing principal component analysis on the multiple kinds of access information includes: determining the correlation between each pair of the kinds of access information; and performing noise reduction and redundancy removal on the kinds of access information according to the correlation.
Preferably, determining the correlation between each pair of the kinds of access information comprises: standardizing the multiple groups of access information and generating an original sample matrix from the standardized groups; and calculating a covariance matrix of the original sample matrix with respect to the kinds of access information, wherein the elements of the covariance matrix indicate the correlation between each pair of kinds of access information.
Preferably, performing noise reduction and redundancy removal on the kinds of access information according to the correlation comprises: diagonalizing the covariance matrix to obtain a set of eigenvalues, wherein each eigenvalue represents the relative amount of the information of the original sample matrix covered by the corresponding dimension; determining the contribution rate of each eigenvalue in the set; determining another set of eigenvalues according to the contribution rates and a preset threshold condition, wherein the other set is a subset of the set of eigenvalues and the eigenvectors corresponding to it are the principal components; and calculating a new sample matrix from the principal components corresponding to the other set of eigenvalues and the original sample matrix, wherein each element of the new sample matrix represents the score of one of the principal components for the corresponding advertisement medium.
Preferably, determining the quality of the multiple advertisement media according to the principal components and the weight of each principal component comprises: determining the weight corresponding to each of the principal components, wherein the weight represents the proportion of the information of all the principal components that the corresponding principal component covers; and calculating a composite score for each advertisement medium from that medium's scores on the principal components and the weights, wherein the composite score represents the quality of the corresponding advertisement medium.
According to another aspect of the present invention, an advertisement media quality evaluation device is also provided, including an acquisition module, an analysis module and a determination module, wherein: the acquisition module is configured to acquire multiple groups of access information describing users' responses to multiple advertisement media respectively, each group comprising multiple kinds of access information; the analysis module is configured to perform principal component analysis on the multiple kinds of access information to obtain multiple principal components, which are mutually independent and together cover most of the access information of the multiple advertisement media; and the determination module is configured to determine the quality of the multiple advertisement media according to the principal components and the weight of each principal component, i.e., the share of the information of all the principal components that it covers.
Preferably, the analysis module comprises: a first determination unit configured to determine the correlation between each pair of the kinds of access information; and a processing unit configured to perform noise reduction and redundancy removal on the kinds of access information according to the correlation.
Preferably, the first determination unit includes: a processing subunit configured to standardize the multiple groups of access information and generate an original sample matrix from the standardized groups; and a first calculation subunit configured to calculate a covariance matrix of the original sample matrix with respect to the kinds of access information, wherein the elements of the covariance matrix indicate the correlation between each pair of kinds of access information.
Preferably, the processing unit includes: a diagonalization subunit configured to diagonalize the covariance matrix to obtain a set of eigenvalues, wherein each eigenvalue represents the relative amount of the information of the original sample matrix covered by the corresponding dimension; a first determining subunit configured to determine the contribution rate of each eigenvalue in the set; a second determining subunit configured to determine another set of eigenvalues according to the contribution rates and a predetermined threshold condition, wherein the other set is a subset of the set of eigenvalues and the eigenvectors corresponding to it are the principal components; and a second calculating subunit configured to calculate a new sample matrix from the principal components corresponding to the other set of eigenvalues and the original sample matrix, wherein each element of the new sample matrix represents the score of one of the principal components for the corresponding advertisement medium.
Preferably, the determination module comprises: a second determining unit configured to determine the weight corresponding to each of the principal components, wherein the weight indicates the proportion of the information of all the principal components that the corresponding principal component covers; and a calculating unit configured to calculate a composite score for each advertisement medium from that medium's scores on the principal components and the weights, wherein the composite score represents the quality of the corresponding advertisement medium.
According to the invention, multiple groups of access information describing users' responses to multiple advertisement media are acquired, each group comprising multiple kinds of access information; principal component analysis is performed on the kinds of access information to obtain multiple principal components that are mutually independent and together cover most of the access information of the advertisement media; and the quality of the advertisement media is determined according to the principal components and the weight of each principal component, i.e., the share of the information of all the principal components that it covers. This solves the problem that evaluations of advertisement media lack stability and accuracy, and improves both.
Detailed Description
It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with each other. The present invention is described in detail below through embodiments and with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an advertisement media quality evaluation method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
step S102, acquiring multiple groups of access information describing users' responses to multiple advertisement media respectively, wherein each group comprises multiple kinds of access information;
step S104, performing principal component analysis on the multiple kinds of access information to obtain multiple principal components, wherein the principal components are mutually independent and together cover most of the access information of the multiple advertisement media;
step S106, determining the quality of the multiple advertisement media according to the principal components and the weight of each principal component, i.e., the share of the information of all the principal components that it covers.
Through these steps, principal component analysis is performed on the multiple groups of access information of the multiple advertisement media, and the quality of the media is determined from principal components that are mutually independent and together cover most of the access information, each weighted by the share of the information of all the principal components that it covers. Because the principal components are mutually independent, multicollinearity among the kinds of access information cannot distort the evaluation; because they cover most of the access information, the evaluated data and the evaluation result remain objective. The method and device therefore solve the problem that evaluations of advertisement media lack stability and accuracy, and improve both.
When a multivariate problem is studied by statistical analysis, too many variables make the problem more complex, so it is natural to seek as much information as possible from as few variables as possible. In many cases the variables are correlated, and a correlation between two variables can be interpreted as an overlap in the information they reflect about the subject under study. Principal component analysis replaces all of the original variables with as few new variables as possible, such that the new variables are pairwise uncorrelated and retain as much of the original information about the subject as possible.
The related art offers various methods of principal component analysis, and these can be applied in the embodiments of the present invention to similar effect. Preferably, the principal component analysis in this embodiment includes: determining the correlation between each pair of the multiple kinds of access information; and performing noise reduction and redundancy removal on the kinds of access information according to the correlation.
Preferably, determining the correlation between each pair of the kinds of access information comprises: standardizing the multiple groups of access information and generating an original sample matrix from the standardized access information; and calculating a covariance matrix of the original sample matrix with respect to the kinds of access information, wherein the elements of the covariance matrix indicate the correlation between each pair of kinds of access information.
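As a minimal sketch of this step, assuming Python with NumPy, rows of the matrix being advertisement media and columns being kinds of access information, and z-score standardization with the sample standard deviation (the text does not fix the exact formula), the original sample matrix and its covariance matrix could be computed as follows. The function names and the data are hypothetical.

```python
import numpy as np

def standardize(X):
    """Z-score each column (kind of access information) of the n x p matrix X."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation of each column
    return (X - mean) / std

def covariance_matrix(S):
    """p x p covariance matrix of a column-centered n x p sample matrix S."""
    n = S.shape[0]
    return S.T @ S / (n - 1)

# Hypothetical data: rows = media, columns = (visits, clicks, dwell time in seconds)
X = np.array([[120.0, 8.0, 30.0],
              [300.0, 20.0, 45.0],
              [80.0, 5.0, 60.0],
              [200.0, 11.0, 20.0]])
S = standardize(X)        # original sample matrix after standardization
C = covariance_matrix(S)  # C[j, k] indicates the correlation of indices j and k
```

Because every column of S has unit sample variance, the diagonal of C is 1 and the off-diagonal elements lie between -1 and 1.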
Preferably, performing noise reduction and redundancy removal on the kinds of access information according to the correlation comprises: diagonalizing the covariance matrix to obtain a set of eigenvalues, wherein each eigenvalue represents the relative amount of the information of the original sample matrix covered by the corresponding dimension; determining the contribution rate of each eigenvalue in the set; determining another set of eigenvalues according to the contribution rates and a preset threshold condition, wherein the other set is a subset of the set of eigenvalues and the eigenvectors corresponding to it are the principal components; and calculating a new sample matrix from the principal components corresponding to the other set of eigenvalues and the original sample matrix, wherein each element of the new sample matrix represents the score of one of the principal components for the corresponding advertisement medium.
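The diagonalization, contribution rates and projection described above can be sketched as follows, again assuming NumPy. `numpy.linalg.eigh` is used because a covariance matrix is symmetric; it returns eigenvalues in ascending order, so they are re-sorted in descending order. The helper names and the example matrix are illustrative assumptions, not taken from the original.

```python
import numpy as np

def diagonalize(C):
    """Eigenvalues (descending) and eigenvectors (columns of H) of symmetric C,
    so that H.T @ C @ H is diagonal."""
    eigvals, H = np.linalg.eigh(C)      # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]   # re-sort in descending order
    return eigvals[order], H[:, order]

def contribution_rates(eigvals):
    """Contribution rate of each eigenvalue: its share of the sum of all eigenvalues."""
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals / eigvals.sum()

def new_sample_matrix(S, H, d):
    """Scores of each medium (row of S) on the first d principal components."""
    return S @ H[:, :d]

# Hypothetical covariance (correlation) matrix of three kinds of access information
C = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
gammas, H = diagonalize(C)
rates = contribution_rates(gammas)
```

Which eigenvalues form the "other set" depends on the threshold condition; choosing d is left to the caller here.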
Preferably, determining the quality of the multiple advertisement media according to the principal components and the weight of each principal component comprises: determining the weight corresponding to each of the principal components, wherein the weight indicates the proportion of the information of all the principal components that the corresponding principal component covers; and calculating a composite score for each advertisement medium from that medium's scores on the principal components and the weights, wherein the composite score represents the quality of the corresponding advertisement medium.
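A short sketch of the weighting and composite scoring, assuming the weights are the retained eigenvalues normalized by their sum (one reading of "the proportion of the information of all the principal components that the corresponding principal component covers"). The function names, scores and eigenvalues below are made up for illustration.

```python
import numpy as np

def component_weights(kept_eigvals):
    """Weight of each retained principal component: eigenvalue / sum of retained eigenvalues."""
    kept_eigvals = np.asarray(kept_eigvals, dtype=float)
    return kept_eigvals / kept_eigvals.sum()

def composite_scores(scores, weights):
    """Composite score of each medium: weighted sum of its principal-component scores.

    scores: n x d array, row i holds the d component scores of medium i.
    """
    return np.asarray(scores, dtype=float) @ weights

# Hypothetical example: 3 media, 2 retained components with eigenvalues 3.0 and 1.0
w = component_weights([3.0, 1.0])            # -> [0.75, 0.25]
F = composite_scores([[1.0, 2.0],
                      [0.0, -1.0],
                      [-1.0, -1.0]], w)      # -> [1.25, -0.25, -1.0]
ranking = np.argsort(F)[::-1]                # media indices, best first
```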
Fig. 2 is a schematic flow chart of a preferred advertisement media quality evaluation method according to an embodiment of the present invention. As shown in Fig. 2, this embodiment provides a preferred method including the following steps:
step S202, acquiring multiple groups of access information describing users' responses to multiple advertisement media respectively, wherein every group comprises the same kinds of access information;
step S204, determining the correlation between each pair of the kinds of access information;
step S206, performing noise reduction and redundancy removal on the kinds of access information according to the correlation;
step S208, determining the quality of the multiple advertisement media according to the groups of new principal component scores and the weights obtained after noise reduction and redundancy removal.
Through these steps, noise reduction and redundancy removal are applied according to the correlation between the kinds of access information common to the advertisement media, so that the quality of the media can be evaluated from the resulting new principal components and their weights. In the related art, the correlation between the kinds of access information is not considered, so the quality evaluation results lack stability and accuracy. In this embodiment, by contrast, the new groups of information obtained remove the influence that two or more correlated kinds of access information would otherwise have on the evaluation result, which solves that problem and improves the stability and accuracy of the evaluation.
In the above steps, each advertisement medium corresponds to one group of access information and to one new group of principal component scores.
Various ways to determine the correlation between two variables exist in the related art. In this embodiment, the groups of access information corresponding to the advertisement media are arranged into an original sample matrix, and the correlation between each pair of kinds of access information is determined by calculating the covariance matrix of that matrix. Each element of the covariance matrix indicates the correlation between the corresponding two kinds of access information: the larger its absolute value, the stronger the correlation, and a value of zero indicates that the two kinds are (linearly) uncorrelated. Because the access information is standardized first, the covariance matrix coincides with the correlation matrix, so these magnitudes can be compared across pairs. The correlation between any two kinds of access information can therefore be read from an element of the covariance matrix.
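The comparison of covariance elements across pairs relies on standardization. The following NumPy check, on hypothetical data, illustrates that the sample covariance of column-standardized data equals the Pearson correlation matrix returned by `numpy.corrcoef`.

```python
import numpy as np

# Hypothetical access data: rows = media, columns = kinds of access information
X = np.array([[10.0, 200.0, 3.0],
              [12.0, 260.0, 1.0],
              [8.0, 150.0, 4.0],
              [15.0, 330.0, 2.0],
              [9.0, 170.0, 5.0]])

# Standardize each column, then take the sample covariance of the result
S = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C = np.cov(S, rowvar=False)

# Pearson correlation matrix of the raw data, for comparison
R = np.corrcoef(X, rowvar=False)
```

Since C and R agree, every off-diagonal element of C lies in [-1, 1], which is what makes "larger absolute value means stronger correlation" meaningful regardless of the original units.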
In the above embodiments, in order to build the original sample matrix from different kinds of access information, the access information must first be standardized; this is also called centering or normalization.
Some kinds of access information may be strongly correlated with each other. For example, for an advertisement medium that pushes content of fixed duration, the number of views and the total viewing time may be strongly correlated; if the quality of the medium is evaluated on both indices, the indices suffer from multicollinearity and the evaluation result is inflated. This is only a simple example of what may occur when the embodiment is applied; in practice the situation is not limited to it and may be more complex.
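To illustrate the collinearity problem, the sketch below builds hypothetical data for media that push 30-second content, so that total viewing time is almost exactly 30 times the number of views. The two columns are almost perfectly correlated, and one eigenvalue of the correlation matrix is close to zero, meaning one dimension is redundant; an unweighted sum of all indices would count that information twice. All numbers are invented for illustration.

```python
import numpy as np

# Hypothetical media pushing fixed 30-second content:
# total viewing time is (almost) view count x 30, so the two columns are collinear.
views = np.array([100.0, 250.0, 80.0, 400.0, 150.0])
view_time = views * 30.0 + np.array([5.0, -3.0, 2.0, -4.0, 1.0])  # small deviations
conversions = np.array([4.0, 6.0, 5.0, 7.0, 3.0])
X = np.column_stack([views, view_time, conversions])

S = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.cov(S, rowvar=False)                      # correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending

r_views_time = R[0, 1]                           # close to 1: the indices overlap
smallest_share = eigvals[-1] / eigvals.sum()     # close to 0: one redundant dimension
```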
To remove the influence of correlated kinds of access information on the evaluation result, and to eliminate the inaccuracy and arbitrariness introduced by human judgment, this embodiment applies a predetermined denoising procedure to the elements of the covariance matrix to remove noise and redundancy. For example, diagonalizing the covariance matrix yields a set of eigenvalues, and the eigenvectors corresponding to these eigenvalues form a new matrix. The set contains as many eigenvalues as there are kinds of access information in one group, and each eigenvalue indicates how much influence the corresponding new principal component has: for example, an eigenvalue less than 1 means that the principal component explains less than a single original variable does on average. Therefore, when removing redundancy, a preset value can be set for the eigenvalues, and the eigenvalues smaller than this value are removed to obtain another, new set of eigenvalues. Finally, the new principal components are determined from the eigenvectors corresponding to this new set. These principal components explain most of each group of original access information, i.e., they cover most of the information in the original sample matrix, so the quality of the advertisement media can be evaluated accurately from the resulting groups of new principal component scores.
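The eigenvalue cutoff described above (keeping eigenvalues of at least 1, commonly known as the Kaiser criterion) can be sketched as follows. The preset value of 1.0 and the function name are assumptions, and the example eigenvalues are hypothetical; they sum to 4, as the eigenvalues of a 4 x 4 correlation matrix must.

```python
import numpy as np

def kaiser_select(eigvals, threshold=1.0):
    """Indices and values of the eigenvalues that are at least `threshold`, largest first.

    With standardized data every original variable has variance 1, so an eigenvalue
    below 1 means the component explains less than one original variable on average.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    order = np.argsort(eigvals)[::-1]                  # largest first
    kept = [int(i) for i in order if eigvals[i] >= threshold]
    return np.array(kept, dtype=int), eigvals[kept]

idx, kept_values = kaiser_select([2.1, 0.6, 1.2, 0.1])
```

Here the components with eigenvalues 2.1 and 1.2 are retained and the other two are discarded as redundant.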
Preferably, in the redundancy removal, the contribution rate of each eigenvalue may be determined as the ratio of its value to the sum of all the eigenvalues in the set. The largest eigenvalues whose cumulative contribution rate reaches more than 90% are then retained as the other set of eigenvalues; equivalently, the smallest eigenvalues, whose cumulative contribution rate is less than 10%, are removed from the set.
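A sketch of the 90% cumulative contribution rule, assuming the smallest number d of largest eigenvalues that reaches the target is chosen. The function name and the example eigenvalues are hypothetical.

```python
import numpy as np

def select_by_cumulative_contribution(eigvals, target=0.90):
    """Smallest d such that the d largest eigenvalues reach `target` of the eigenvalue sum."""
    gammas = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    cumulative = np.cumsum(gammas) / gammas.sum()             # cumulative contribution rates
    # first position where the cumulative rate is >= target, converted to a count
    d = int(np.searchsorted(cumulative, target)) + 1
    return min(d, len(gammas)), cumulative

d, cumulative = select_by_cumulative_contribution([2.5, 1.0, 0.3, 0.2])
# cumulative -> [0.625, 0.875, 0.95, 1.0], so d -> 3
```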
Preferably, in step S208, after a new sample matrix is generated from the new principal components obtained by noise reduction and redundancy removal, the quality of each of the advertisement media can be determined from the new sample matrix by a predetermined evaluation method.
Corresponding to the above method embodiment, this embodiment further provides an advertisement media quality evaluation device. The functions of the device have been described in detail in the method embodiment; where there is no conflict, the two may be read together, and the details are not repeated here.
Fig. 3 is a schematic structural diagram of an advertisement media quality evaluation device according to an embodiment of the present invention. As shown in Fig. 3, the device includes an acquisition module 32, an analysis module 34 and a determination module 36, wherein: the acquisition module 32 is configured to acquire multiple groups of access information describing users' responses to multiple advertisement media respectively, each group comprising multiple kinds of access information; the analysis module 34 is coupled to the acquisition module 32 and configured to perform principal component analysis on the kinds of access information to obtain multiple principal components, which are mutually independent and together cover most of the access information of the advertisement media; and the determination module 36 is coupled to the analysis module 34 and configured to determine the quality of the advertisement media according to the principal components and the weight of each principal component, i.e., the share of the information of all the principal components that it covers.
The modules and units involved in the embodiments of the present invention may be implemented in software or in hardware. They may also be provided in a processor; for example, a processor may be described as including an acquisition module 32, an analysis module 34 and a determination module 36. In some cases the names of the modules do not limit the modules themselves; for example, the acquisition module may also be described as a "module for acquiring multiple groups of access information describing users' responses to multiple advertisement media".
Preferably, the analysis module 34 comprises: a first determination unit configured to determine the correlation between each pair of the kinds of access information; and a processing unit coupled to the first determination unit and configured to perform noise reduction and redundancy removal on the kinds of access information according to the correlation.
Preferably, the first determination unit includes: a processing subunit configured to standardize the multiple groups of access information and generate an original sample matrix from the standardized groups; and a first calculation subunit coupled to the processing subunit and configured to calculate a covariance matrix of the original sample matrix with respect to the kinds of access information, wherein the elements of the covariance matrix indicate the correlation between each pair of kinds of access information.
Preferably, the processing unit comprises: a diagonalization subunit configured to diagonalize the covariance matrix to obtain a set of eigenvalues, wherein each eigenvalue represents the relative amount of the information of the original sample matrix covered by the corresponding dimension; a first determining subunit coupled to the diagonalization subunit and configured to determine the contribution rate of each eigenvalue in the set; a second determining subunit coupled to the first determining subunit and configured to determine another set of eigenvalues according to the contribution rates and a predetermined threshold condition, wherein the other set is a subset of the set of eigenvalues and the eigenvectors corresponding to it are the principal components; and a second calculating subunit coupled to the second determining subunit and configured to calculate a new sample matrix from the principal components corresponding to the other set of eigenvalues and the original sample matrix, wherein each element of the new sample matrix represents the score of one of the principal components for the corresponding advertisement medium.
Preferably, the determination module 36 comprises: a second determining unit configured to determine the weight corresponding to each of the principal components, wherein the weight indicates the proportion of the information of all the principal components that the corresponding principal component covers; and a calculating unit coupled to the second determining unit and configured to calculate a composite score for each of the advertisement media from that medium's scores on the principal components and the weights, wherein the composite score represents the quality of the corresponding advertisement medium.
The present invention will now be described with reference to preferred embodiments and examples.
FIG. 4 is a flowchart of sample-based evaluation of advertisement media quality according to a preferred embodiment of the present invention. As shown in FIG. 4, it includes the following steps:
step S402, sampling users' responses to the advertisement media to form a sample matrix;
step S404, standardizing each element of the sample matrix to center the samples;
step S406, building a standardized sample matrix from the standardized elements;
step S408, calculating the covariance matrix of the standardized sample matrix;
step S410, performing noise reduction and redundancy removal on the covariance matrix and building a new sample matrix from which the dimensions carrying little of the users' access information have been removed, wherein the new sample matrix is the product of the standardized original matrix and the new eigenvector matrix;
step S412, calculating the index weights from the new set of eigenvalues;
step S414, calculating the composite score of each advertisement medium from the index weights, wherein the composite score identifies the quality of the corresponding advertisement medium.
These steps avoid the multicollinearity interference caused by entering multiple indices into the model, eliminate information overlap between the indices, and also solve the problem of how to determine the weight coefficients when the dimensions are aggregated.
The above steps are described in detail below by way of a specific example.
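In this preferred embodiment, the algorithm evaluates the quality of the n media currently used by an advertiser to deliver advertisements. With p indices (kinds of access information) collected for each medium, the observation data of the online advertising media samples form the n × p matrix
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix},$$
where $x_{ij}$ is the value of the j-th index for the i-th medium.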
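In step S404, the raw index data are standardized (centered and scaled):
$$x^{*}_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad i = 1, \dots, n;\ j = 1, \dots, p,$$
where
$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}.$$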
In steps S406 to S408, the standardized sample matrix is denoted $S = (x^{*}_{ij})_{n \times p}$, and its covariance matrix is calculated as
$$C = \frac{1}{n-1} S^{\mathsf{T}} S \in \mathbb{R}^{p \times p}.$$
In step S410, the covariance matrix C is diagonalized by finding an orthogonal matrix H such that
$$H^{\mathsf{T}} C H = \Lambda, \qquad H, \Lambda \in \mathbb{R}^{p \times p},$$
which yields the p eigenvalues $\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_p$ on the diagonal of $\Lambda$.
The dimensions corresponding to the d (d < p) largest eigenvalues are retained: these d eigenvalues form a new diagonal matrix $\Lambda_1 \in \mathbb{R}^{d \times d}$, and the corresponding d eigenvectors form a new eigenvector matrix $H_1 \in \mathbb{R}^{p \times d}$.
In step S412, the contribution rate of each of the d new indices is calculated from its eigenvalue:
$$\alpha_k = \frac{\gamma_k}{\sum_{j=1}^{p} \gamma_j}, \qquad k = 1, \dots, d.$$
Assuming that the cumulative contribution rate of the d eigenvalues must exceed 90%, d is taken as the smallest integer satisfying $\sum_{k=1}^{d} \alpha_k \ge 90\%$.
The new sample matrix is then obtained as
$$S_1 = S H_1, \qquad S_1 \in \mathbb{R}^{n \times d}.$$
In step S414, after the new sample matrix is obtained, each sample has dimension d, and the new indices corresponding to the new samples are $F_1, F_2, \dots, F_d$ (the columns of $S_1$).
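The composite score of each medium may be calculated as
$$F = \sum_{k=1}^{d} w_k F_k, \qquad w_k = \frac{\gamma_k}{\sum_{j=1}^{d} \gamma_j},$$
where $F_k$ is the medium's score on the k-th principal component and $w_k$ is the corresponding index weight.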
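Putting steps S402 to S414 together, a compact sketch of the whole procedure might look like the following, assuming NumPy, the formulas above and a 90% cumulative contribution threshold. Eigenvectors are only determined up to sign, so the sketch flips each one so that its loadings sum to a non-negative value; this orientation convention is an assumption not stated in the text, but without some convention the ranking could come out reversed. The function name and the data are hypothetical.

```python
import numpy as np

def evaluate_media(X, target=0.90):
    """Sketch of steps S402-S414: PCA-based composite score for each advertisement medium.

    X: n x p matrix; row i holds the p indices (kinds of access information) of medium i.
    Returns the composite scores F (length n) and the retained eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # S404-S406: standardize each index to zero mean and unit sample variance
    S = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # S408: covariance matrix of the standardized sample matrix (p x p)
    C = S.T @ S / (n - 1)
    # S410: diagonalize C and sort the eigenpairs by descending eigenvalue
    gammas, H = np.linalg.eigh(C)
    order = np.argsort(gammas)[::-1]
    gammas, H = gammas[order], H[:, order]
    # Assumed sign convention: orient each eigenvector so its loadings sum to >= 0
    H = H * np.where(H.sum(axis=0) < 0, -1.0, 1.0)
    # Keep the smallest d whose cumulative contribution rate reaches the target
    cumulative = np.cumsum(gammas) / gammas.sum()
    d = min(int(np.searchsorted(cumulative, target)) + 1, len(gammas))
    S1 = S @ H[:, :d]                      # new n x d sample matrix
    # S412: index weights from the retained eigenvalues
    w = gammas[:d] / gammas[:d].sum()
    # S414: composite score of each medium
    return S1 @ w, gammas[:d]

# Hypothetical data: 5 media x 3 indices (visits, clicks, total dwell time)
X = [[100, 5, 20],
     [300, 15, 60],
     [200, 10, 40],
     [50, 2, 10],
     [250, 14, 50]]
F, kept = evaluate_media(X)
ranking = np.argsort(F)[::-1]   # media indices, best first
```

In this made-up data the three indices are highly correlated, so a single component already exceeds the 90% threshold, and the medium that is highest on every index ranks first.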
Because the indices (i.e., the access information) used in the related art to evaluate advertisement media quality suffer from multicollinearity, this preferred embodiment applies noise reduction and redundancy removal to the indices. Sample centering in step S404 and construction of the standardized sample matrix in step S406 make the indices dimensionless, so that indices with different units or orders of magnitude can be compared and weighted. Calculating the covariance matrix in step S408 and the new sample matrix in step S410 achieves the noise reduction and redundancy removal. "Noise reduction" aims to make the correlation between the retained dimensions as small as possible, i.e., to make the off-diagonal elements of the p × p covariance matrix essentially zero, which means diagonalizing it; "redundancy removal" means discarding the dimensions whose new variances on the diagonal of the diagonalized covariance matrix are small, so that the retained dimensions carry as much "energy" (i.e., variance) as possible. The preferred embodiment keeps only the dimensions with large energy (eigenvalues).
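The effect of noise reduction can be checked numerically: projecting the standardized data onto all the eigenvectors gives new variables whose covariance matrix is diagonal, with the eigenvalues ("energy") on the diagonal in descending order. The sketch assumes NumPy and uses hypothetical data.

```python
import numpy as np

# Hypothetical data: 6 media x 3 indices
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 3.9, 0.1],
              [3.0, 6.2, 0.9],
              [4.0, 8.1, 0.3],
              [5.0, 9.8, 0.7],
              [6.0, 12.1, 0.2]])
S = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C = np.cov(S, rowvar=False)

gammas, H = np.linalg.eigh(C)
order = np.argsort(gammas)[::-1]
gammas, H = gammas[order], H[:, order]

S_all = S @ H                              # scores on all p components
C_new = np.cov(S_all, rowvar=False)        # covariance of the rotated data

# "Noise reduction": the off-diagonal covariances vanish (up to rounding)
off_diag = C_new - np.diag(np.diag(C_new))
# "Redundancy removal": the diagonal holds the eigenvalues, largest first,
# so dropping the last columns of S_all discards the least variance
```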
The preferred embodiment can be applied to advertisement media (for example, for automotive and fast-moving consumer goods advertising), ranking the media on which an advertiser places advertisements by the indices obtained from the algorithm. For example, a medium with a large volume of visits also tends to bring relatively many advertisement clicks and conversions, so evaluating all the raw indices directly without processing produces large errors due to multicollinearity. The method of the preferred embodiment weights each index so that indices with a large effect receive large weights and indices with a small effect receive smaller ones, which allows the quality of the advertisement media to be evaluated accurately.
It should be noted that the steps illustrated in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.