CN113011748A

CN113011748A - Recommendation effect evaluation method and device, electronic equipment and readable storage medium

Info

Publication number: CN113011748A
Application number: CN202110302485.XA
Authority: CN
Inventors: Chen Guanyu (陈冠宇)
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2021-03-22
Filing date: 2021-03-22
Publication date: 2021-06-22

Abstract

The application relates to the technical field of artificial intelligence, and in particular provides a recommendation effect evaluation method and device, an electronic device and a readable storage medium. The method comprises the following steps: acquiring a user's preference data for objects to be recommended; determining an evaluation index of a recommendation algorithm to be evaluated based on the preference data and in a cross validation manner; and evaluating the recommendation effect of the recommendation algorithm to be evaluated based on the evaluation index. With this scheme, the recommendation effect of a recommendation algorithm can be evaluated effectively, recommendation algorithms can be effectively distinguished from one another, and the effectiveness of the recommendation algorithm in actual application is ensured.

Description

Recommendation effect evaluation method and device, electronic equipment and readable storage medium

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a recommendation effect evaluation method and device, an electronic device and a readable storage medium.

Background

With the rapid development of the internet and communication technology, the amount of data that users receive through various terminal devices grows exponentially over time, making it extremely difficult for users to acquire and mine valuable information from massive data. A recommendation system integrates knowledge, experience and ideas from multiple fields such as data mining, machine learning, human behavior and human-computer interaction, and can provide every user with accurate, timely and pleasantly surprising personalized information services.

At present, numerous recommendation algorithms have emerged, but their recommendation effects differ. If the recommendation effect of a recommendation algorithm can be evaluated effectively, recommendation algorithms can be effectively distinguished, and the effectiveness of the chosen algorithm in actual application can be ensured.

Disclosure of Invention

The present application aims to solve at least one of the above technical drawbacks. The technical scheme adopted by the application is as follows:

in a first aspect, an embodiment of the present application provides a method for evaluating a recommendation effect, where the method includes:

acquiring a user's preference data for an object to be recommended;

determining an evaluation index of a recommendation algorithm to be evaluated based on the preference data and a cross validation mode;

and evaluating the recommendation effect of the recommendation algorithm to be evaluated based on the evaluation index.

Optionally, determining an evaluation index of a recommendation algorithm to be evaluated based on the preference data and based on a cross validation manner, includes:

dividing preference data into a training set and a test set based on a cross validation mode;

training the recommendation algorithm to be evaluated on the training set, and using the trained algorithm to predict preference prediction data of the users in the test set for the objects to be recommended in the test set;

and determining the evaluation index of the recommendation algorithm to be evaluated based on the preference prediction data and the actual preference data of the users in the test set for the objects to be recommended in the test set.

Optionally, the dividing of the preference data into a training set and a test set based on a cross validation method includes:

dividing the preference data into K groups in a K-fold cross validation manner, designating one of the K groups as the test set, and designating the preference data other than the test set as the training set.

Optionally, dividing the preference data into K groups based on a K-fold cross validation method includes:

based on a K-fold cross validation mode, dividing the preference data into K groups according to the sequence of the timestamps, wherein the corresponding time lengths of all the groups in the K groups are equal.

Optionally, calculating a value of an evaluation index of a recommendation algorithm to be evaluated based on the preference prediction data and based on preference data of the user in the test set on the object to be recommended in the test set, where the calculating includes:

repeatedly determining, in the K-fold cross validation manner, an initial value of the evaluation index of the recommendation algorithm to be evaluated based on the preference prediction data and the preference data of the users in the test set for the objects to be recommended in the test set;

and determining the average of the initial values as the value of the evaluation index of the recommendation algorithm to be evaluated.

Optionally, the evaluation index comprises scoring accuracy and/or usage accuracy;

wherein the scoring accuracy comprises at least one of:

root mean square error;

mean absolute error;

wherein the usage accuracy comprises at least one of:

precision;

true positive rate;

false positive rate;

and true negative rate.

Optionally, if the evaluation index includes the scoring accuracy and the usage accuracy, evaluating the recommendation effect of the recommendation algorithm to be evaluated based on the evaluation index includes:

determining a recommendation hit weight for each user based on the usage accuracy;

and determining the recommendation effect of the recommendation algorithm to be evaluated based on the recommendation hit weight and the scoring accuracy.

Optionally, the obtaining of the preference data of the user for the object to be recommended includes:

acquiring an operation behavior of a user on an object to be recommended;

and determining the preference data of the user for the object to be recommended based on the operation behavior.

Optionally, determining, based on the operation behavior, preference data of the user for the object to be recommended includes:

determining the preference data for the object to be recommended based on a preset quantitative evaluation rule and the operation data of each operation behavior.

Optionally, acquiring the preference data of the user for the object to be recommended includes:

initiating a survey DEMO of the object to be recommended;

and determining the preference data of the user for the object to be recommended based on the results of the survey DEMO.

Optionally, before determining the evaluation index of the recommendation algorithm to be evaluated based on the preference data and based on a cross validation manner, the method further includes:

preprocessing the preference data, wherein the preprocessing is used to reduce the sparsity of the preference data;

the preprocessing comprises at least one of:

filtering out preference data corresponding to operation behaviors whose operation data do not meet a preset threshold;

performing dimensionality reduction on the preference data;

and performing noise reduction on the preference data.

In a second aspect, an embodiment of the present application provides an apparatus for evaluating a recommendation effect, where the apparatus includes:

the preference data acquisition module is used to acquire a user's preference data for an object to be recommended;

the evaluation index determining module is used for determining the evaluation index of the recommendation algorithm to be evaluated based on the preference data and based on a cross validation mode;

and the recommendation effect evaluation module is used for evaluating the recommendation effect of the recommendation algorithm to be evaluated based on the evaluation index.

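Optionally, when dividing the preference data into a training set and a test set in a cross validation manner, the evaluation index determining module is specifically configured to:

dividing the preference data into K groups in a K-fold cross validation manner, designating one of the K groups as the test set, and designating the preference data other than the test set as the training set.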

Optionally, when dividing the preference data into K groups in a K-fold cross validation manner, the evaluation index determining module is specifically configured to:

dividing the preference data into K groups in order of their timestamps, wherein the K groups correspond to equal time lengths.

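Optionally, when calculating the value of the evaluation index of the recommendation algorithm to be evaluated based on the preference prediction data and the preference data of the users in the test set for the objects to be recommended in the test set, the evaluation index determining module is specifically configured to:

repeatedly determining, in the K-fold cross validation manner, an initial value of the evaluation index of the recommendation algorithm to be evaluated;

and determining the average of the initial values as the value of the evaluation index of the recommendation algorithm to be evaluated.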

Optionally, the evaluation index comprises scoring accuracy and/or usage accuracy;

wherein the scoring accuracy comprises at least one of:

root mean square error;

mean absolute error;

wherein the usage accuracy comprises at least one of:

precision;

true positive rate;

false positive rate;

and true negative rate.

Optionally, if the evaluation index includes the scoring accuracy and the usage accuracy, the recommendation effect evaluation module is specifically configured to:

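determining a recommendation hit weight for each user based on the usage accuracy;

and determining the recommendation effect of the recommendation algorithm to be evaluated based on the recommendation hit weight and the scoring accuracy.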

Optionally, the preference data obtaining module is specifically configured to:

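acquiring an operation behavior of the user on an object to be recommended;

and determining the preference data of the user for the object to be recommended based on the operation behavior.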

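Optionally, when determining the preference data of the user for the object to be recommended based on the operation behavior, the preference data obtaining module is specifically configured to:

determining the preference data for the object to be recommended based on a preset quantitative evaluation rule and the operation data of each operation behavior.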

Optionally, the preference data obtaining module is specifically configured to:

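initiating a survey DEMO of an object to be recommended;

and determining the preference data of the user for the object to be recommended based on the results of the survey DEMO.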

Optionally, the apparatus further comprises:

the preprocessing module is used to preprocess the preference data before the evaluation index of the recommendation algorithm to be evaluated is determined based on the preference data in a cross validation manner, the preprocessing being used to reduce the sparsity of the preference data; the preprocessing comprises at least one of:

filtering out preference data corresponding to operation behaviors whose operation data do not meet a preset threshold;

performing dimensionality reduction on the preference data;

and performing noise reduction processing on the preference data.

Optionally, the evaluation index determining module is specifically configured to:

dividing the preference data into a training set and a test set in a cross validation manner;

training a recommendation algorithm to be evaluated based on the training set, and predicting preference prediction data of the user in the test set on the object to be recommended in the test set based on the trained recommendation algorithm to be evaluated;

and calculating the value of the evaluation index of the recommendation algorithm to be evaluated based on the preference prediction data and the preference data of the user in the test set to the object to be recommended in the test set.

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory;

a memory for storing operating instructions;

a processor, configured to execute, by calling the operation instructions, the recommendation effect evaluation method shown in any implementation of the first aspect of the present application.

In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for evaluating a recommendation effect shown in any one of the embodiments of the first aspect of the present application.

The technical scheme provided by the embodiment of the application has the following beneficial effects:

according to the scheme provided by the embodiments of the application, the evaluation index of the recommendation algorithm to be evaluated is determined based on the user's preference data for the objects to be recommended and in a cross validation manner, and the recommendation effect of the algorithm is then evaluated according to that index. With this scheme, the recommendation effect of a recommendation algorithm can be evaluated effectively, recommendation algorithms can be effectively distinguished, and the effectiveness of the recommendation algorithm in actual application is ensured.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.

Fig. 1 is a schematic flowchart of a method for evaluating a recommendation effect according to an embodiment of the present application;

fig. 2 is a schematic structural diagram of an apparatus for evaluating a recommendation effect according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.

As used herein, the singular forms "a", "an", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

Fig. 1 shows a schematic flow chart of a method for evaluating a recommendation effect according to an embodiment of the present application, and as shown in fig. 1, the method mainly includes:

step S110: acquiring a user's preference data for an object to be recommended;

step S120: determining an evaluation index of a recommendation algorithm to be evaluated based on the preference data and a cross validation mode;

step S130: and evaluating the recommendation effect of the recommendation algorithm to be evaluated based on the evaluation index.

In the embodiments of the application, the object to be recommended may be media content, a financial product or the like; it can be recommended to users so as to obtain their real preference data for it.

In the embodiments of the application, the evaluation index of the recommendation algorithm to be evaluated can be determined in a cross validation manner. Specifically, the acquired preference data are used as a data set, which is divided into several parts; some of the parts serve as a training set for training the recommendation algorithm to be evaluated, and the remaining part serves as a test set for testing the quality of the trained algorithm.

According to the method provided by the embodiments of the application, the evaluation index of the recommendation algorithm to be evaluated is determined based on the user's preference data for the objects to be recommended and in a cross validation manner, and the recommendation effect is then evaluated according to that index. With this scheme, the recommendation effect of a recommendation algorithm can be evaluated effectively, recommendation algorithms can be effectively distinguished, and the effectiveness of the recommendation algorithm in actual application is ensured.

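In an optional manner of the embodiment of the application, acquiring a user's preference data for an object to be recommended includes:

acquiring an operation behavior of the user on the object to be recommended; and determining the preference data of the user for the object to be recommended based on the operation behavior.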

In the traditional way of acquiring preference data, a user actively scores a product and the score is recorded. This is cumbersome for the user, and the quality of the collected data depends on how willing users are to give scores.

In the embodiments of the application, the user's operation behaviors on the objects to be recommended while using the product can be monitored, and the user's preference data for those objects determined from the operation behaviors, which avoids the defects of the traditional acquisition mode.

Specifically, the operation behaviors may include implicit behaviors, i.e., browsing and operation behaviors that a user performs on their own initiative for some purpose without bearing any additional burden. Examples include the user's dwell time on a product page, finger-touch or mouse clicks, copying page text, marking a product as a favorite (collection), sharing a product with others, and purchasing or signing up for a product.

Based on implicit behaviors, the embodiments of the application can rapidly acquire and quantify a user's preference values for products and obtain user data that is closest to the real application scenario.

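In an optional mode of the embodiment of the application, determining the user's preference data for the object to be recommended based on the operation behavior includes:

determining the preference data for the object to be recommended based on a preset quantitative evaluation rule and the operation data of each operation behavior.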

In the embodiments of the application, determining the preference data requires quantifying the user's operation data. Specifically, a quantitative evaluation rule may be set, a quantitative score calculated from the operation data of each operation behavior, and the quantitative score used as the user's preference data for the product.

The quantitative evaluation rule may be a series of operation rules for the operation data, and the operation data may be calculated according to the quantitative evaluation rule to obtain a quantitative score. The specific operation mode in the quantitative evaluation rule can be set according to actual needs.

As one example, the operation behaviors may include the user's click behavior and browsing behavior on a product page, and the user's collection (favoriting) behavior and sharing behavior on a product.

The operation data corresponding to the clicking action of the user on the product page can be the number of clicks.

The operation data corresponding to the browsing behavior of the user on the product page may be browsing time.

The operation data corresponding to the user's collection behavior on a product may be the number of collections.

The operation data corresponding to the user's sharing behavior on a product may be the number of shares.

The quantitative evaluation rule in this example is specifically as follows:

Taking user u and product i as an example, the quantitative score is $r_{u,i}$.

If user u clicks product i once, $r_{u,i}$ increases by 1.

If user u browses the introduction page of product i for $t_{u,i}$ seconds, then $r_{u,i} = \log_2(\gamma t_{u,i})$, where $\gamma t_{u,i}$ takes values in [1, 128], and γ is a quantitative-score acceleration control coefficient that controls how fast page browsing time increases the preference degree. For example, the larger γ is, the higher the preference reflected by a given (even short) browsing time on the product page. The value of γ should be chosen reasonably according to the actual situation and the weight that page browsing time carries in the scenario.

If user u collects (favorites) bank product i once, $r_{u,i}$ increases by 2.

If user u shares bank product i with others once, $r_{u,i}$ increases by 3.
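The quantitative evaluation rule above can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: the function names are hypothetical, $\gamma t_{u,i}$ is clamped into the stated range [1, 128] (an assumption about how out-of-range values are handled), and the browsing term is assumed to be added to the click, collection and sharing increments so that all behaviors contribute to a single score $r_{u,i}$.

```python
import math

# Per-behavior increments from the quantitative evaluation rule (Table 1).
CLICK_SCORE = 1
COLLECT_SCORE = 2
SHARE_SCORE = 3


def browse_score(seconds: float, gamma: float) -> float:
    """Return log2(gamma * t), clamping gamma * t into [1, 128] (assumption)."""
    x = min(max(gamma * seconds, 1.0), 128.0)
    return math.log2(x)


def quantitative_score(clicks: int, browse_seconds: float, collects: int,
                       shares: int, gamma: float = 1.0) -> float:
    """Combine user u's operation data on product i into one score r_{u,i}.

    Assumption: the browsing term is added to the other increments rather
    than replacing them.
    """
    score = clicks * CLICK_SCORE + collects * COLLECT_SCORE + shares * SHARE_SCORE
    score += browse_score(browse_seconds, gamma)
    return float(score)
```

For example, two clicks, 16 seconds of browsing with γ = 2 and one collection give 2 + log₂(32) + 2 = 9. Because of the clamp, very long browsing contributes at most log₂(128) = 7.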

The data table storage structure of the preference data in this example is shown in Table 1.

TABLE 1

Operation behavior ID	Operation behavior description	Quantitative score
A₁	Click	1
A₂	Browse	$\log_2(\gamma t_{u,i})$
A₄	Collection	2
A₅	Share	3
…	…	…

The operation behavior ID is an operation behavior identifier, and the operation behavior description is description information of a user operation behavior.

In the embodiments of the application, users' operation behaviors can be monitored through event tracking ("buried points"): tracking code embedded in the client records the user's operation log, so that the specified operation behaviors on the objects to be recommended are captured and recorded while the user browses products.

A user behavior monitoring module is installed on the client pages of the recommendation system to be evaluated. Behaviors generated while a user browses products are usually recorded by combining event tracking with logging, and the behavior information can be recorded in the user-product evaluation log table shown in Table 2.

TABLE 2

As shown in Table 2, U₁ and U₂ are user identifiers, I₁ and I₂ are product identifiers, the operation behavior ID is the operation behavior identifier, and the time is the time at which the operation behavior occurred.
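Before the quantitative evaluation rule can be applied, the raw log rows of Table 2 have to be aggregated per user and product. The sketch below assumes each row carries the four fields described above (user ID, product ID, operation behavior ID, time) and simply counts behaviors per (user, product) pair; the `LogRecord` type and `count_behaviors` function are hypothetical names, and ASCII IDs such as "U1" stand in for U₁.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Tuple


class LogRecord(NamedTuple):
    """One row of the user-product evaluation log (Table 2)."""
    user_id: str      # user identifier, e.g. "U1"
    product_id: str   # product identifier, e.g. "I1"
    behavior_id: str  # operation behavior identifier, e.g. "A1" (click)
    time: datetime    # when the operation behavior occurred


def count_behaviors(log: Iterable[LogRecord]) -> Dict[Tuple[str, str], Counter]:
    """Count occurrences of each behavior ID for every (user, product) pair."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for rec in log:
        counts[(rec.user_id, rec.product_id)][rec.behavior_id] += 1
    return dict(counts)


log = [
    LogRecord("U1", "I1", "A1", datetime(2021, 3, 1, 9, 0)),
    LogRecord("U1", "I1", "A1", datetime(2021, 3, 1, 9, 5)),
    LogRecord("U2", "I2", "A5", datetime(2021, 3, 2, 14, 30)),
]
counts = count_behaviors(log)
```

Browsing duration is not a count, so it would need an extra field or paired page-enter/page-leave events; the log described for Table 2 does not show how that is stored.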

If the user operation end of the recommendation system is a web page, operation behaviors can be captured and collected with Ajax and JavaScript; if it is a bank APP on Android or iOS, a behavior-collection SDK (Software Development Kit) for Android or iOS can be used.

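In an optional manner of the embodiment of the application, acquiring the preference data of the user for the object to be recommended includes: initiating a survey DEMO of the object to be recommended; and determining the preference data of the user for the object to be recommended based on the results of the survey DEMO.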

In the embodiments of the application, a survey DEMO of the objects to be recommended can be initiated and a survey conducted among users, so as to obtain their preference scores.

Existing recommendation algorithms are evaluated mainly in three ways: offline evaluation, user-study evaluation and online evaluation. Offline evaluation typically trains the recommendation algorithm on a publicly downloadable third-party data set and evaluates accuracy through prediction error. User-study evaluation requires a group of test subjects who interact with the system as instructed; their behavior is observed and recorded while they complete the interaction tasks, and qualitative questions are posed at each stage of the test to collect data that cannot be acquired directly. Finally, the performance of the system is understood by analyzing the collected data. Online evaluation lets users actually experience the system online and judges its quality from their feedback.

All three methods have shortcomings. The third-party data set used in offline evaluation may differ from the scenario in which the recommendation algorithm is actually applied, and offline evaluation can only measure prediction error, not whether users adopt the recommended content. User studies are expensive: commission and time costs are high and the experiment design requirements are strict. Online evaluation has a long cycle, requires designing a controlled experiment, and reaches conclusions only after repeated adjustment and testing. The method provided by the application, which combines an online experiment with offline evaluation, aims to overcome these shortcomings while still ensuring that the recommendation effect is evaluated effectively.

In the embodiments of the application, to collect preference data, the survey DEMO link need only be posted through a social platform to group chats or online communities of social applications. Volunteers in the experiment can freely browse and click the products they like on the website, and a data collection engine obtains the user-activity click data set, i.e., the users' preference data. This greatly reduces the difficulty and workload of collecting data in a real scenario.

In the embodiments of the application, the real preference data obtained are directly partitioned and computed following the offline-evaluation approach: it is only necessary to set parameter values and have a program, through function calls, loops and cross-calculation, output the system's evaluation indexes under the current parameters. The cost of the evaluation is therefore only the time complexity and computing resources consumed by running the evaluation program, while leave-one-out and K-fold cross validation ensure the theoretical soundness of the evaluation experiment, making the conclusions more convincing.

In an optional manner of the embodiment of the application, before determining the evaluation index of the recommendation algorithm to be evaluated based on the preference data and based on a cross validation manner, the method further includes:

preprocessing the preference data, wherein the preprocessing is used to reduce the sparsity of the preference data.

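In actual use, the preprocessing may include at least one of:

filtering out preference data corresponding to operation behaviors whose operation data do not meet a preset threshold;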

performing dimensionality reduction on the preference data;

and performing noise reduction processing on the preference data.

In the embodiment of the application, in order to reduce the sparsity of the preference data of the user, the preference data can be preprocessed.

Specifically, the pre-processing of the preference data may be, but is not limited to, the following:

(1) Filtering. Inactive users have behavior records for only a small number of products and contribute little to building the recommendation model of the recommendation algorithm, so a preset threshold can be set and user behaviors whose operation data do not meet it filtered out. For example, preference data corresponding to browsing behaviors whose browsing time is shorter than a preset duration are filtered out.

(2) Dimensionality reduction. A dimensionality reduction method can be used to address sparsity, for example reducing the dimensionality of the preference data with PCA (Principal Component Analysis) or SVD (Singular Value Decomposition).

(3) Noise reduction. Noise data that can easily distort algorithm modeling, such as missing data, abnormal data, malicious data or naturally deviating data, are reasonably removed.
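Filtering and dimensionality reduction can be sketched on a dense user-product preference matrix (rows are users, columns are products, 0 means no recorded preference). This is an illustrative NumPy sketch, not the application's procedure: the `min_items` activity threshold is a hypothetical parameter, and a truncated SVD computed with `numpy.linalg.svd` stands in for the PCA/SVD step mentioned above.

```python
import numpy as np


def filter_inactive_users(matrix: np.ndarray, min_items: int) -> np.ndarray:
    """Keep only users (rows) with non-zero preference for at least min_items products."""
    active = (matrix > 0).sum(axis=1) >= min_items
    return matrix[active]


def svd_reduce(matrix: np.ndarray, k: int) -> np.ndarray:
    """Return the rank-k approximation of the matrix from a truncated SVD.

    The dense low-rank result assigns estimated values to previously empty
    cells, which is one common way of easing sparsity.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


R = np.array([[5.0, 0.0, 3.0],
              [0.0, 0.0, 1.0],
              [4.0, 2.0, 0.0]])
active = filter_inactive_users(R, min_items=2)  # drops the second (inactive) user
approx = svd_reduce(active, k=1)
```

Keeping all singular values reproduces the input matrix, so k controls the trade-off between compression and fidelity.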

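In an optional mode of the embodiment of the application, determining an evaluation index of a recommendation algorithm to be evaluated based on the preference data and in a cross validation manner includes: dividing the preference data into a training set and a test set in a cross validation manner; training the recommendation algorithm to be evaluated on the training set and predicting preference prediction data of the users in the test set for the objects to be recommended in the test set; and calculating the value of the evaluation index based on the preference prediction data and the preference data of the users in the test set.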

In the embodiments of the application, the preference data can be divided into a training set and a test set in a cross validation manner, with no overlap between the two sets.

To obtain the preference prediction data, the recommendation algorithm to be evaluated is trained on the training set to obtain a trained recommendation model, which then predicts the users' preferences for the objects to be recommended. The preference data for the objects to be recommended in the test set are taken as the real data, and the values of the evaluation indexes of the recommendation algorithm to be evaluated, such as usage accuracy and scoring accuracy, are calculated from the preference prediction data and the users' preference data in the test set.
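The train, predict and compare steps for one training/test split can be sketched as below. The application does not fix which recommendation algorithm is evaluated, so `ItemMeanRecommender` is a hypothetical placeholder (it predicts each product's mean training score, falling back to the global mean for unseen products); any algorithm exposing the same `fit`/`predict` methods could be substituted. Preference records are assumed to be `(user, product, score)` tuples, and RMSE is used as the example index.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Tuple

Record = Tuple[str, str, float]  # (user, product, preference score)


class ItemMeanRecommender:
    """Hypothetical stand-in for the recommendation algorithm to be evaluated."""

    def fit(self, train: List[Record]) -> "ItemMeanRecommender":
        by_item = defaultdict(list)
        for _, item, score in train:
            by_item[item].append(score)
        self.item_mean = {item: mean(s) for item, s in by_item.items()}
        self.global_mean = mean(score for _, _, score in train)
        return self

    def predict(self, user: str, item: str) -> float:
        # Products absent from the training set fall back to the global mean.
        return self.item_mean.get(item, self.global_mean)


def evaluate_split(algorithm, train: List[Record], test: List[Record]) -> float:
    """Train on the training set, predict every test record, and return the RMSE
    against the users' real preference data in the test set."""
    algorithm.fit(train)
    sq_errors = [(algorithm.predict(u, i) - score) ** 2 for u, i, score in test]
    return (sum(sq_errors) / len(sq_errors)) ** 0.5
```

Because the test records are never passed to `fit`, the training and test sets stay disjoint as required above.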

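In an optional mode of the embodiment of the present application, dividing the preference data into a training set and a test set in a cross validation manner includes: dividing the preference data into K groups in a K-fold cross validation manner, designating one of the K groups as the test set, and designating the preference data other than the test set as the training set.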

In the embodiments of the application, the preference data can be divided equally into K groups in a K-fold cross validation manner; one of the K groups is designated as the test set, the remaining K-1 groups are designated as the training set, and the evaluation index is determined using the designated test set and training set.
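Dividing the preference data into K equal groups and rotating the test set can be sketched as follows. The helper names are hypothetical; the groups are cut from the data in its current order (shuffle beforehand if random folds are wanted), and when the record count is not divisible by K the first groups receive one extra record each.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_groups(records: Sequence[T], k: int) -> List[List[T]]:
    """Split records into k contiguous groups whose sizes differ by at most one."""
    n = len(records)
    sizes = [n // k + (1 if g < n % k else 0) for g in range(k)]
    groups, start = [], 0
    for size in sizes:
        groups.append(list(records[start:start + size]))
        start += size
    return groups


def train_test_splits(records: Sequence[T], k: int) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (training set, test set) pairs: each group is the test set once,
    and the remaining k-1 groups together form the training set."""
    groups = k_fold_groups(records, k)
    for t in range(k):
        train = [r for g, group in enumerate(groups) if g != t for r in group]
        yield train, groups[t]
```

For instance, seven records with K = 3 are split into groups of sizes 3, 2 and 2.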

In an optional mode of the embodiment of the present application, the preference data is divided into K groups based on a K-fold cross validation mode, including:

based on a K-fold cross validation mode, dividing the preference data into K groups according to the sequence of the timestamps, wherein the corresponding time lengths of all the parts in the K groups are equal.

In the embodiments of the application, the timestamp may be the time at which the user browsed the object to be recommended; that is, the records in the user-product evaluation log table can be divided, in order of the values of their timestamp attribute, into K groups of equal duration. The duration can be chosen according to the size of the data set, the distribution of the data and the specific test requirements, and may be measured in years, months, days, hours and so on. Because the preference data within each of the K groups occurred in a continuous period, each group represents the user's continuous behavior over that period and reflects the user's preferences during it better than random grouping would.
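A time-ordered split into K groups of equal duration could look like the sketch below. It assumes each record is a `(timestamp, payload)` pair, takes the span between the earliest and latest timestamp as the collection period, and assigns a record falling on the final instant to the last group; `group_by_time` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, List, Sequence, Tuple

Event = Tuple[datetime, Any]  # (timestamp, payload such as a log row)


def group_by_time(records: Sequence[Event], k: int) -> List[List[Event]]:
    """Split records into k consecutive windows of equal duration, earliest first."""
    start = min(t for t, _ in records)
    end = max(t for t, _ in records)
    width = (end - start) / k  # timedelta covered by each window
    groups: List[List[Event]] = [[] for _ in range(k)]
    for t, payload in sorted(records, key=lambda r: r[0]):
        # timedelta / timedelta yields a float; clamp the end instant into the last window.
        idx = min(int((t - start) / width), k - 1) if width else 0
        groups[idx].append((t, payload))
    return groups


events = [(datetime(2021, 3, d), f"e{d}") for d in (1, 2, 4, 6, 7, 10)]
groups = group_by_time(events, 3)  # three 3-day windows
```

Unlike the count-based split, equal-duration windows can hold different numbers of records when activity is uneven over time.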

In practice, it is recommended that roughly the same amount of preference data be collected in each period, so that groups of equal duration contain similar amounts of data.

In this scheme, the data are grouped by combining time with the order of users' selections. First, the time spanned by the whole data collection process is divided equally into K groups in order of the timestamps; one group is taken as the dedicated test period and the others serve as training data for the recommendation algorithm model. Grouping by timestamp brings the evaluation closer to, and lets it simulate, online user evaluation: for the dedicated test period, it is assumed that the recommendation system was built at that time, the product operation behaviors that users actually generated during the period are hidden, and the recommendation system is asked to make recommendations. The hidden actual behaviors are then treated as the users' feedback on the recommended products, which simulates the volunteers' feedback on the recommendation effect during the test period, and the recommendation quality is evaluated.

In an optional manner of the embodiment of the application, calculating the value of the evaluation index of the recommendation algorithm to be evaluated, based on the preference prediction data and on the preference data of the users in the test set for the objects to be recommended in the test set, includes:

In the embodiment of the application, each of the K groups of preference data may in turn be designated as the test set, K times in total, with the remaining K-1 groups designated as the training set, so that a different test set is selected each time. As an example, the K groups of preference data may be numbered in time order; taking K = 10, the groups are X₁, X₂, X₃, X₄, X₅, X₆, X₇, X₈, X₉ and X₁₀. The first time, group X₁₀ may be designated as the test set and groups X₁ to X₉ as the training set, and a first initial evaluation index is calculated. The second time, group X₉ may be designated as the test set and groups X₁ to X₈ together with X₁₀ as the training set, and a second initial evaluation index is calculated. Continuing until every group has served as the test set, the ten initial evaluation indexes are averaged to obtain the evaluation index. The evaluation index is calculated from the initial evaluation indexes as in formula 1:

$$A=\frac{1}{K}\sum_{i=1}^{K}A_{X_i}\qquad(1)$$

where $A_{X_i}$ is the initial evaluation index calculated when the preference data of group $X_i$ are designated as the test set, and $A$ is the resulting evaluation index.
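A minimal Python sketch of this rotation and of formula 1, assuming the K groups are already available as lists and that `evaluate(train, test)` is a hypothetical callback returning the initial evaluation index for one split. The sketch rotates from the first group rather than from X₁₀; the order does not affect the average.

```python
from statistics import mean


def cross_validate(groups, evaluate):
    """Use each of the K groups once as the test set, train on the other K-1
    groups, and return the mean of the K initial evaluation indexes (formula 1)
    together with the per-group values A_{X_i}."""
    initial = []
    for i, test in enumerate(groups):
        # concatenate every group except the i-th into the training set
        train = [rec for j, g in enumerate(groups) if j != i for rec in g]
        initial.append(evaluate(train, test))
    return mean(initial), initial


# Usage with a toy evaluate() that just reports the training-set size.
a, per_group = cross_validate([[1], [2, 3], [4]], lambda train, test: len(train))
print(per_group)  # [3, 2, 3]
```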

In an optional manner of the embodiment of the present application, the evaluation index includes scoring accuracy and/or usage accuracy;

wherein the scoring accuracy comprises at least one of:

root mean square error;

mean absolute error;

wherein the usage accuracy comprises at least one of:

precision;

true positive rate;

false positive rate;

and true negative rate.

In the embodiment of the application, the evaluation indexes of a recommendation system may include scoring accuracy indexes and usage accuracy indexes, and either or both may be used as the evaluation index in this application.

Scoring accuracy mainly measures the error between the rating values the recommendation system predicts for items and the actual rating values: naturally, the smaller the difference, the higher the accuracy, and the larger the difference, the lower the accuracy. It mainly comprises the Root-Mean-Square Error (RMSE) and the Mean Absolute Error (MAE).

Suppose that in the actual user rating data set of the recommendation system, user $u$ gives object $i$ the rating $r_{ui}$, while on the test set $R_e$ the predicted rating of user $u$ for object $i$ is $r'_{u,i}$, and $|R_e|$ is the number of ratings predicted on the test set $R_e$.

The root mean square error between the predicted and actual values is then as shown in equation 2:

$$\mathrm{RMSE}=\sqrt{\frac{1}{|R_e|}\sum_{(u,i)\in R_e}\left(r_{ui}-r'_{u,i}\right)^2}\qquad(2)$$

The mean absolute error is similar and is a long-established choice for error evaluation, as shown in equation 3:

$$\mathrm{MAE}=\frac{1}{|R_e|}\sum_{(u,i)\in R_e}\left|r_{ui}-r'_{u,i}\right|\qquad(3)$$
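Equations 2 and 3 can be computed as below. This sketch assumes actual and predicted ratings are held in dictionaries keyed by hypothetical `(user, item)` pairs, and takes $|R_e|$ to be the number of test-set pairs that have both an actual and a predicted rating.

```python
import math


def _paired_errors(actual, predicted):
    """Differences r_ui - r'_u,i over the (user, item) pairs present in both dicts."""
    keys = actual.keys() & predicted.keys()
    if not keys:
        raise ValueError("no overlapping (user, item) pairs")
    return [actual[k] - predicted[k] for k in keys]


def rmse(actual, predicted):
    errors = _paired_errors(actual, predicted)
    return math.sqrt(sum(e * e for e in errors) / len(errors))  # equation 2


def mae(actual, predicted):
    errors = _paired_errors(actual, predicted)
    return sum(abs(e) for e in errors) / len(errors)  # equation 3


actual = {("u1", "A"): 4, ("u1", "B"): 3}
predicted = {("u1", "A"): 3.5, ("u1", "B"): 4}
print(rmse(actual, predicted), mae(actual, predicted))  # about 0.7906 and 0.75
```

RMSE squares each error before averaging, so it penalizes a few large errors more heavily than MAE does.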

Usage accuracy evaluation mainly measures whether the objects in the recommendation set are of interest to the user, rather than whether the error of the predicted score is minimal.

As shown in Table 3, each object to be recommended falls into one of the following four cases with respect to the user:

TABLE 3

	Recommended	Not recommended
Of interest	True positives (tp)	False negatives (fn)
Not of interest	False positives (fp)	True negatives (tn)

Whether an object to be recommended is of interest to the user can be determined from the quantitative score value in the preference data: if the quantitative score value is higher than a set value, the object is determined to be of interest to the user; correspondingly, if it is not higher than that value, the object is determined not to be of interest. The recommended objects are those objects to be recommended that are actually recommended to the user, and the unrecommended objects are all the other objects to be recommended.

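With this classification, the number of objects falling into each cell can be counted to evaluate the effectiveness of the recommendation. The following indexes and formulas can therefore be used to measure the accuracy of the algorithm:

$$\text{Precision}=\frac{tp}{tp+fp}\qquad(4)$$

$$\text{TPR}=\frac{tp}{tp+fn}\qquad(5)$$

$$\text{FPR}=\frac{fp}{fp+tn}\qquad(6)$$

$$\text{TNR}=\frac{tn}{fp+tn}\qquad(7)$$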

Formula 4 is the precision; formula 5 is the true positive rate, also called the recall; formula 6 is the false positive rate; and formula 7 is the true negative rate. Generally, to measure the recommendation accuracy of the system, either a precision-recall curve composed of formulas 4 and 5, or a curve of the true positive rate against the false positive rate composed of formulas 5 and 6, may be used. The former, the precision-recall curve, mainly measures the proportion of truly liked items within the recommendation set and the proportion of all liked items that are recommended. The latter is the ROC (receiver operating characteristic) curve, which weighs the proportion of liked items that are recommended against the proportion of disliked items that are recommended. The curve to be used can therefore be selected according to the actual situation.
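To illustrate how the two curves are traced, the sketch below treats the top-N items of a ranked list as the recommendation set for N = 1, 2, …, and records a precision-recall point and an ROC point at each cutoff. The ranking and the interest set are hypothetical inputs; a real system would take them from the algorithm under evaluation and the test-period data.

```python
def curve_points(ranking, interested):
    """For each cutoff N, treat ranking[:N] as recommended and return the
    precision-recall points (recall, precision) and ROC points (FPR, TPR).
    Assumes every interested item appears in `ranking`."""
    positives = len(interested)
    negatives = len(ranking) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both interesting and uninteresting items")
    pr, roc = [], []
    tp = fp = 0
    for item in ranking:
        if item in interested:
            tp += 1
        else:
            fp += 1
        pr.append((tp / positives, tp / (tp + fp)))  # (recall, precision)
        roc.append((fp / negatives, tp / positives))  # (FPR, TPR)
    return pr, roc


pr, roc = curve_points(["C", "A", "D", "B", "E", "F"], {"C", "D", "E"})
print(roc[-1])  # (1.0, 1.0): every item recommended
```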

As an example of calculating the true positive rate, suppose the objects to be recommended are A, B, C, D, E and F, of which A, B, C and D are recommended and E and F are not, and the user is interested in C, D and E but not in A, B and F. Then the number of true positives tp is 2 (C and D) and the number of false negatives fn is 1 (E), so the true positive rate is 2/(2+1), that is, two thirds.
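The counts in Table 3 and formulas 4 to 7 can be checked against this example with a short Python sketch. Objects are represented as sets of labels; `interested_items` applies the threshold rule described above to hypothetical quantitative scores, and exact fractions are used so the two-thirds result is shown exactly.

```python
from fractions import Fraction


def interested_items(scores, threshold):
    """Objects whose quantitative score is higher than the threshold are of interest."""
    return {obj for obj, s in scores.items() if s > threshold}


def confusion_counts(candidates, recommended, interested):
    """Return (tp, fp, fn, tn) as defined in Table 3."""
    not_recommended = candidates - recommended
    tp = len(recommended & interested)
    fp = len(recommended - interested)
    fn = len(not_recommended & interested)
    tn = len(not_recommended - interested)
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Formulas 4 to 7; a rate with an empty denominator is reported as None."""
    def ratio(a, b):
        return Fraction(a, b) if b else None
    return {
        "precision": ratio(tp, tp + fp),
        "true_positive_rate": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "true_negative_rate": ratio(tn, fp + tn),
    }


candidates = set("ABCDEF")
recommended = set("ABCD")
# hypothetical scores chosen so that C, D and E exceed a threshold of 3
interested = interested_items({"A": 2, "B": 1, "C": 5, "D": 4, "E": 4, "F": 2}, 3)
counts = confusion_counts(candidates, recommended, interested)
print(counts)                                # (2, 2, 1, 1)
print(rates(*counts)["true_positive_rate"])  # 2/3
```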

Currently, the typical evaluation approach for most recommendation systems is simply to measure the error of the recommendation algorithm on a data set published by a third party on the network. This approach has two limitations. First, such a data set is not real-scenario user data from the platform on which the recommendation system is to be applied. Second, the recommendation mechanism of a system essentially generates a recommendation set through prediction and recall, so important evaluation indexes such as precision and recall, which reflect how users actually use the recommendation set, cannot be measured by such offline evaluation. The approach is therefore not adopted here.

With the scheme provided by the embodiment of the application, the user's operation behaviors on recommendations on the platform can be simulated from the collected user-product behavior data. Evaluation indexes such as the precision and recall of the user's use of the recommendation set can therefore be calculated, which allows the recommendation effect to be measured better.

In an optional manner of the embodiment of the application, if the evaluation index includes both scoring accuracy and usage accuracy, evaluating the recommendation effect of the recommendation algorithm to be evaluated based on the evaluation index includes:

determining a recommendation hit weight for each user based on the usage accuracy;

In the embodiment of the present application, the recommendation hit weight of each user may be determined based on the usage accuracy; specifically, the recall or the precision may be used as each user's recommendation hit weight. The predicted quantitative score value may then be determined based on the recommendation hit weights together with the root mean square error or the mean absolute error.
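A per-user hit weight along these lines might be computed as follows. This sketch assumes $R_u$ (objects recommended to user $u$) and $I_u$ (objects the user actually used or scored) are given as sets keyed by user, so that recall is $|R_u\cap I_u|/|I_u|$ and precision is $|R_u\cap I_u|/|R_u|$. The function name and the zero-weight fallback for empty sets are illustrative choices, not taken from the application.

```python
def hit_weights(recommended, used, metric="recall"):
    """Return W_u for every user in `recommended`, using either recall
    |R_u & I_u| / |I_u| or precision |R_u & I_u| / |R_u|."""
    if metric not in ("recall", "precision"):
        raise ValueError("metric must be 'recall' or 'precision'")
    weights = {}
    for user, r_u in recommended.items():
        i_u = used.get(user, set())
        hits = len(r_u & i_u)
        denominator = len(i_u) if metric == "recall" else len(r_u)
        weights[user] = hits / denominator if denominator else 0.0  # no data -> weight 0
    return weights


recommended = {"u1": {"A", "B", "C", "D"}}
used = {"u1": {"C", "D", "E"}}
print(hit_weights(recommended, used, "precision"))  # {'u1': 0.5}
```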

In the embodiment of the application, the recommendation effect of the recommendation algorithm to be evaluated can be represented by the calculated quantitative score value: a high value may be considered to indicate a good recommendation effect, and a low value a poor recommendation effect.

As an example, when the predicted quantitative score value is calculated based on the recommendation hit weight and the root mean square error, the specific calculation is as shown in equation 8:

where $\mathrm{RMSE}_{X_1}$ is the predicted quantitative score value of group $X_1$, $R_u$ is the set of objects recommended to user $u$, $I_u$ is the set of objects actually used or scored by user $u$, $|R_u\cap I_u|$ is the number of products in the intersection of the set of products recommended to the user and the set of products actually used or scored by the user, $W_u$ is the recommendation hit weight, $r_{ui}$ is the rating of user $u$ for object $i$, and $r'_{u,i}$ is the predicted rating of user $u$ for object $i$ on the test set.

As an example, recommendation effects of multiple recommendation systems may be compared based on the scheme provided in the embodiment of the present application.

Specifically, the behavior of users experiencing the recommendation system may first be simulated.

To evaluate the effect of the recommendation system offline, pre-collected user-product behavior data are needed to simulate the users' operation behaviors of receiving and using recommendations on the platform. The offline data set may be modeled as the two parts of an online test: the user performing tasks, and the collection of recommendation feedback.

The scheme groups the data by combining two aspects: time and the order of user selections. First, the time spanned by the whole data collection process is divided equally into K parts in timestamp order. One part is used as the dedicated test period: a simulated recommendation system is established for it, the product operation behaviors the user actually produced in that period are hidden, the system then makes recommendations for that period, the actual behaviors are treated as the user's feedback on the recommended products, and the recommendation quality is evaluated.

The other groups are used as training data for the recommendation algorithm model. From these data, the recommendation system generates a product recommendation set for the user for the test period or predicts the user's quantitative score values for products, and evaluation indexes are then calculated to evaluate the recommendation effect or the prediction accuracy from multiple aspects.

As an example, the specific process of quantifying the recommendation effect of the recommendation algorithm to be evaluated by period-based K-fold cross-validation is as follows:

(1) Determine the hypotheses and control variables.

Hypotheses and control variables must be fixed before the experiments are performed. If the hypothesis is that algorithm A performs better than algorithm B, the performance of the two algorithms must be evaluated on the same data set; if the hypothesis is that an algorithm performs better under parameter configuration a than under parameter configuration b, the same data set and the same recommendation algorithm must be used.

The control variables also include the values of the recommendation algorithm's parameters, for example: the preset threshold for the operation data; the number of nearest neighbors in a neighbor-based collaborative filtering recommendation algorithm; the number of most similar items in an item-based collaborative filtering algorithm; and the value of N when TOP-N is used to generate the optimal recommended product set.

(2) Group the data set so as to simulate the online scenario: divide the data set into K period groups of equal duration, number the groups of preference data (taking K = 10) as X₁, X₂, …, X₁₀, select one group, e.g. X₁, as the test period set, and use the rest (X₂, X₃, …, X₁₀) as the training period set.

(3) Calculate quantitative scores from the user-product behavior records of the test period set X₁ and store them in the system's user-activity quantitative scoring matrix.

Similarly, the training period set R_e is stored as a quantitative scoring matrix of the same form, which facilitates calculation by the recommendation algorithm.
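Storing a period set as a scoring matrix can be sketched as building a user-by-object matrix from records that already carry quantitative scores. The `(user, item, score)` record layout and the use of `None` for unscored cells are assumptions of this sketch; the mapping from raw behavior records to scores is not reproduced here.

```python
def build_score_matrix(records):
    """Build a dense user x item matrix from (user, item, score) records.
    Cells with no record are None; a later record for the same pair
    overwrites an earlier one."""
    users = sorted({u for u, _, _ in records})
    items = sorted({i for _, i, _ in records})
    row = {u: k for k, u in enumerate(users)}
    col = {i: k for k, i in enumerate(items)}
    matrix = [[None] * len(items) for _ in users]
    for u, i, score in records:
        matrix[row[u]][col[i]] = score
    return users, items, matrix


users, items, matrix = build_score_matrix([("u1", "A", 4), ("u2", "B", 3), ("u1", "B", 5)])
print(users, items, matrix)  # ['u1', 'u2'] ['A', 'B'] [[4, 5], [None, 3]]
```

For large, sparse data a dictionary or sparse-matrix representation would use far less memory than this dense list of lists.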

(4) Select suitable indexes for evaluating the accuracy of the recommendation algorithm; this example uses the recall or precision together with the root mean square error.

On the test period set R₀, run the recommendation algorithm under the preset parameters to generate a recommendation list for user u₁, and calculate the precision or the recall according to the corresponding formula as the recommendation hit weight $W_u$ of user $u$.

where $U=\{u_1,u_2,\ldots,u_m\}$ is the set of all users, tp is the number of true positives, fp is the number of false positives, and fn is the number of false negatives.

(5) Use the recommendation model to calculate and record the recommended product set $R_u$ of user $u$, and run the prediction algorithm of the recommendation model to calculate, for each item $i$ that user $u$ has quantitatively scored, the predicted quantitative score value $r'_{u,i}$ of user $u$ for product $i$.

(6) Calculate the quantitative score value of the test period set X₁ using equation 8.

(7) Cross-validation: among the K period-based groups, designate another group as the test period set and the rest as the training period set, and execute steps (3) to (6) again, performing K-fold cross-validation. After the RMSE of every period group has been calculated, their average is taken as the final quantitative value of the recommendation effect, $\mathrm{RMSE}_X$, for this recommendation algorithm and these parameters, as shown in formula 9:

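$$\mathrm{RMSE}_X=\frac{1}{K}\sum_{i=1}^{K}\mathrm{RMSE}_{X_i}\qquad(9)$$

where $\mathrm{RMSE}_X$ is the final quantitative value of the recommendation effect, $K$ is the number of groups into which the data set is divided, and $\mathrm{RMSE}_{X_i}$ is the RMSE calculated with group $X_i$ as the test period set.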
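Steps (3) to (7) can be sketched end to end as below. To keep the example self-contained it makes several assumptions: records are `(user, item, score)` tuples already grouped by period, the recommendation algorithm is replaced by a hypothetical user-mean baseline predictor, and each group's value is the plain RMSE of equation 2 rather than the hit-weighted value of equation 8. It illustrates only the period-based rotation and the averaging of formula 9.

```python
import math
from statistics import mean


def user_mean_predictor(train):
    """Hypothetical baseline: predict a user's mean training score, falling
    back to the global mean for users absent from the training set."""
    global_mean = mean(s for _, _, s in train)
    by_user = {}
    for u, _, s in train:
        by_user.setdefault(u, []).append(s)
    user_means = {u: mean(v) for u, v in by_user.items()}
    return lambda user, item: user_means.get(user, global_mean)


def group_rmse(train, test, fit):
    """Train on `train`, then compute the RMSE of predictions on `test`."""
    if not test:
        raise ValueError("empty test period group")
    predict = fit(train)
    squared = [(s - predict(u, i)) ** 2 for u, i, s in test]
    return math.sqrt(sum(squared) / len(squared))


def period_kfold_rmse(groups, fit=user_mean_predictor):
    """Formula 9: average the RMSE obtained with each period group as the test set."""
    per_group = []
    for k, test in enumerate(groups):
        train = [r for j, g in enumerate(groups) if j != k for r in g]
        per_group.append(group_rmse(train, test, fit))
    return mean(per_group), per_group


groups = [[("u1", "A", 4)], [("u1", "B", 2)]]
print(period_kfold_rmse(groups))  # (2.0, [2.0, 2.0])
```

Because `fit` is a parameter, two algorithms or two parameter configurations can be compared on the same period groups, which keeps the data set fixed as required in step (1).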

Based on the same principle as the method shown in fig. 1, fig. 2 shows a schematic structural diagram of an evaluation apparatus for recommendation effect provided by an embodiment of the present application, and as shown in fig. 2, the evaluation apparatus 20 for recommendation effect may include:

a preference data obtaining module 210, configured to obtain a user's preference data for an object to be recommended;

the evaluation index determining module 220 is configured to determine an evaluation index of a recommendation algorithm to be evaluated based on the preference data and based on a cross validation manner;

and the recommendation effect evaluation module 230 is configured to evaluate the recommendation effect of the recommendation algorithm to be evaluated based on the evaluation index.

The device provided by the embodiment of the application determines the evaluation index of the recommendation algorithm to be evaluated from the user's preference data for the objects to be recommended by means of cross-validation, and evaluates the recommendation effect of the recommendation algorithm to be evaluated according to that evaluation index. With this scheme, the recommendation effect of a recommendation algorithm can be evaluated effectively, recommendation algorithms can be effectively distinguished from one another, and the effectiveness of the recommendation algorithm in actual application is ensured.

wherein the scoring accuracy comprises at least one of:

root mean square error;

mean absolute error;

wherein the usage accuracy comprises at least one of:

precision;

true positive rate;

false positive rate;

and true negative rate.

determining a recommendation hit weight for each user based on the usage accuracy;

Optionally, the preference data obtaining module is specifically configured to:

acquire the user's operation behavior on the object to be recommended;

Optionally, the preference data obtaining module is specifically configured to:

initiate a research DEMO for the object to be recommended;

Optionally, the apparatus further comprises:

the preprocessing module is used for preprocessing the preference data before determining the evaluation index of the recommendation algorithm to be evaluated based on the preference data and based on a cross validation mode, and the preprocessing is used for reducing the sparsity of the preference data;

the preprocessing comprises at least one of:

performing dimensionality reduction on the preference data;

and performing noise reduction processing on the preference data.

It is understood that the above modules of the evaluation apparatus of a recommendation effect in the present embodiment have functions of implementing the corresponding steps of the evaluation method of a recommendation effect in the embodiment shown in fig. 1. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above. The modules can be software and/or hardware, and each module can be implemented independently or by integrating a plurality of modules. For the functional description of each module of the apparatus for evaluating a recommendation effect, reference may be specifically made to the corresponding description of the method for evaluating a recommendation effect in the embodiment shown in fig. 1, and details are not repeated here.

The embodiment of the application provides an electronic device, which comprises a processor and a memory;

a memory for storing operating instructions;

and the processor is used for executing the evaluation method of the recommendation effect provided by any embodiment of the application by calling the operation instruction.

As an example, fig. 3 shows a schematic structural diagram of an electronic device to which an embodiment of the present application is applicable. As shown in fig. 3, the electronic device 2000 includes a processor 2001 and a memory 2003, where the processor 2001 is coupled to the memory 2003, for example via a bus 2002. Optionally, the electronic device 2000 may also include a transceiver 2004. It should be noted that in practical applications the number of transceivers 2004 is not limited to one, and the structure of the electronic device 2000 does not constitute a limitation on the embodiments of the present application.

The processor 2001 is applied to the embodiment of the present application to implement the method shown in the above method embodiment. The transceiver 2004 may include a receiver and a transmitter, and the transceiver 2004 is applied to the embodiments of the present application to implement the functions of the electronic device of the embodiments of the present application to communicate with other devices when executed.

The processor 2001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules and circuits described in connection with this disclosure. The processor 2001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

Bus 2002 may include a path that conveys information between the aforementioned components. The bus 2002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 2002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.

The memory 2003 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.

Optionally, the memory 2003 is used for storing application program code for performing the disclosed aspects, and is controlled in execution by the processor 2001. The processor 2001 is configured to execute the application program code stored in the memory 2003 to implement the method for evaluating the recommendation effect provided in any of the embodiments of the present application.

The electronic device provided by the embodiment of the application is applicable to any embodiment of the method, and is not described herein again.

Compared with the prior art, the electronic device provided by the embodiment of the application determines the evaluation index of the recommendation algorithm to be evaluated from the user's preference data for the objects to be recommended by means of cross-validation, and evaluates the recommendation effect of the recommendation algorithm to be evaluated according to that evaluation index. With this scheme, the recommendation effect of a recommendation algorithm can be evaluated effectively, recommendation algorithms can be effectively distinguished from one another, and the effectiveness of the recommendation algorithm in actual application is ensured.

The embodiment of the application provides a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the program is executed by a processor, the program realizes the recommendation effect evaluation method shown in the above method embodiment.

The computer-readable storage medium provided in the embodiments of the present application is applicable to any of the embodiments of the foregoing method, and is not described herein again.

Compared with the prior art, the computer-readable storage medium provided by the embodiment of the application determines the evaluation index of the recommendation algorithm to be evaluated from the user's preference data for the objects to be recommended by means of cross-validation, and evaluates the recommendation effect of the recommendation algorithm to be evaluated according to that evaluation index. With this scheme, the recommendation effect of a recommendation algorithm can be evaluated effectively, recommendation algorithms can be effectively distinguished from one another, and the effectiveness of the recommendation algorithm in actual application is ensured.

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The foregoing is only a partial embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for evaluating a recommendation effect, comprising:

acquiring preference data of a user for an object to be recommended;

2. The method according to claim 1, wherein the determining an evaluation index of a recommendation algorithm to be evaluated based on the preference data and based on a cross-validation manner comprises:

dividing the preference data into a training set and a test set based on a cross validation mode;

3. The method of claim 2, wherein the cross-validation based approach to partition the preference data into a training set and a test set comprises:

dividing the preference data into K groups based on a K-fold cross validation mode, designating one group in the K groups as a test set, and designating the preference data except the test set as a training set.

4. The method of claim 3, wherein the dividing the preference data into K groups based on a K-fold cross-validation approach comprises:

5. The method according to claim 3 or 4, wherein the calculating of the value of the evaluation index of the recommendation algorithm to be evaluated based on the preference prediction data and based on the preference data of the users in the test set on the objects to be recommended in the test set comprises:

6. The method according to any one of claims 1 to 4, wherein the evaluation index includes scoring accuracy and/or usage accuracy;

wherein the scoring accuracy comprises at least one of:

root mean square error;

mean absolute error;

wherein the usage accuracy comprises at least one of:

precision;

true positive rate;

false positive rate;

and true negative rate.

7. The method of claim 6, wherein if the evaluation indicator includes scoring accuracy and usage accuracy, the evaluating the recommendation effect of the recommendation algorithm to be evaluated based on the evaluation indicator includes:

determining a recommendation hit weight for each of the users based on the usage accuracy;

8. The method according to any one of claims 1 to 4, wherein the obtaining of the preference data of the user for the object to be recommended includes:

acquiring an operation behavior of a user on an object to be recommended;

and determining preference data of the user on the object to be recommended based on the operation behaviors.

9. The method according to claim 8, wherein determining the preference data of the user for the object to be recommended based on the operation behavior comprises:

10. The method according to any one of claims 1 to 4, wherein the obtaining of the preference data of the user for the object to be recommended includes:

initiating a research DEMO for the object to be recommended;

and determining preference data of the user for the object to be recommended based on the results of the research DEMO.

11. The method according to any one of claims 1-4, wherein before determining the evaluation index of the recommendation algorithm to be evaluated based on the preference data and based on a cross-validation manner, the method further comprises:

preprocessing the preference data, wherein the preprocessing is used for reducing sparsity of the preference data;

the preprocessing comprises at least one of:

performing dimension reduction processing on the preference data;

and performing noise reduction processing on the preference data.

12. An evaluation apparatus of a recommendation effect, comprising:

the evaluation index determining module is used for determining the evaluation index of the recommendation algorithm to be evaluated based on the preference data and a cross validation mode;

13. An electronic device comprising a processor and a memory;

the memory is used for storing operation instructions;

the processor is used for executing the method of any one of claims 1-11 by calling the operation instruction.

14. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method of any one of claims 1-11.