CN116932907A - Verification method and device of recommendation model

Info

Publication number: CN116932907A
Application number: CN202310914336.8A
Authority: CN
Inventors: 石磊; 胡彬; 赵登; 何建杉
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2023-07-24
Filing date: 2023-07-24
Publication date: 2023-10-24

Abstract

The embodiment of the specification provides a verification method and device for a recommendation model. The method comprises the following steps: acquiring a recommendation model trained by using a training set and a pre-established knowledge graph, wherein each sample in the training set comprises a user, a service object and a behavior label of the user for the service object, and a plurality of entity nodes in the knowledge graph comprise nodes corresponding to the service object; and obtaining a preset index with a threshold parameter, and a test threshold to be set for the threshold parameter based on the test set, wherein the index value of the preset index depends on the hit condition of the recall object with the recommended ranking within the number indicated by the threshold parameter. A plurality of verification thresholds are determined based on the data amounts of the verification set and the test set, and the test thresholds. Determining a plurality of verification index values of the recommendation model aiming at preset indexes under a plurality of verification thresholds based on the verification set; and carrying out weighted summation processing on the verification index values to obtain a comprehensive verification index value.

Description

Verification method and device of recommendation model

Technical Field

One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for verifying a recommendation model.

Background

With the development of society and the advancement of science and technology, more and more service platforms provide a wide variety of products and services to meet users' needs in life and work. To help users accurately locate information of interest within massive amounts of information, and to give each user an individually tailored experience, many service platforms build machine learning models that make personalized recommendations to users.

To obtain a recommendation model that can be put into use, the model must first be trained, validated, and tested. The embodiments of this specification therefore provide a verification method for a recommendation model that verifies model performance stably and accurately, helping to obtain the recommendation model that will perform best in the test stage.

Disclosure of Invention

The embodiments of this specification describe a verification method and device for a recommendation model, which can verify model performance stably and accurately.

According to a first aspect, a method of validating a recommendation model is provided. The method comprises the following steps:

acquiring a recommendation model trained using a training set and a pre-established knowledge graph, wherein each sample in the training set comprises a user, a business object, and a behavior label of the user for the business object, and a plurality of entity nodes in the knowledge graph comprise nodes corresponding to the business objects; acquiring a preset index with a threshold parameter, and a test threshold to be set for the threshold parameter when testing is performed on a test set, wherein the index value of the preset index depends on the hit condition of the recall objects whose recommendation ranking is within the number indicated by the threshold parameter; determining a plurality of verification thresholds based on the data volumes of a verification set and the test set, and on the test threshold; determining, based on the verification set, a plurality of verification index values of the recommendation model for the preset index under the plurality of verification thresholds; and performing weighted summation on the plurality of verification index values to obtain a comprehensive verification index value.

In one embodiment, the preset index is a hit@n index, and the threshold parameter is n in the hit@n index.

In one embodiment, determining the plurality of verification thresholds comprises: calculating a ratio between the data volume of the verification set and the data volume of the test set; and determining the product of the test threshold and the ratio as a central verification threshold, which is included in the plurality of verification thresholds.

In a specific embodiment, determining the plurality of verification thresholds further comprises: determining a number of reduced verification thresholds that are less than the central verification threshold, and including them in the plurality of verification thresholds; and/or determining a number of enlarged verification thresholds that are greater than the central verification threshold, and including them in the plurality of verification thresholds.

In a specific embodiment, performing weighted summation on the plurality of verification index values to obtain a comprehensive verification index value comprises: acquiring a plurality of weights preset for the plurality of verification thresholds, wherein each weight is inversely related to the distance between the corresponding verification threshold and the central verification threshold; and performing weighted summation on the plurality of verification index values using the plurality of weights to obtain the comprehensive verification index value.

In one embodiment, after performing weighted summation on the plurality of verification index values to obtain a comprehensive verification index value, the method further comprises: when the comprehensive verification index value does not reach a preset standard, adjusting training hyperparameters related to the recommendation model; and continuing to train the recommendation model based on the adjusted training hyperparameters.

In one embodiment, after performing weighted summation on the plurality of verification index values to obtain a comprehensive verification index value, the method further comprises: ending training when the comprehensive verification index value reaches the preset standard; and determining, using the test set, a test index value of the recommendation model for the preset index under the test threshold.
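The alternation between training and verification described in the two embodiments above can be sketched as follows. This is only an illustrative outline: the function name `train_with_verification`, the callables `train_one_round` and `comprehensive_index`, and the `max_rounds` cap are hypothetical stand-ins that do not appear in the specification, and the only hyperparameter adjusted here is the number of training rounds.

```python
from typing import Callable, Tuple


def train_with_verification(
    train_one_round: Callable[[], None],
    comprehensive_index: Callable[[], float],
    standard: float,
    max_rounds: int = 50,
) -> Tuple[int, float]:
    """Alternate training and verification until the preset standard is met.

    Returns the number of rounds trained and the last comprehensive value.
    """
    value = float("-inf")
    for rounds in range(1, max_rounds + 1):
        train_one_round()              # continue training the model
        value = comprehensive_index()  # verify on the verification set
        if value >= standard:          # preset standard reached: stop training
            return rounds, value
    return max_rounds, value


# Toy stand-ins: the "model" is a counter whose score grows with each round.
state = {"rounds": 0}


def fake_train() -> None:
    state["rounds"] += 1


def fake_index() -> float:
    return min(1.0, 0.2 * state["rounds"])


rounds_used, final_value = train_with_verification(fake_train, fake_index, 0.6)
print(rounds_used, final_value)
```

Once the loop returns because the standard was reached, the test index value would be computed once on the test set under the test threshold.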

According to a second aspect, a verification device of a recommendation model is provided. The device comprises:

a training module configured to acquire a recommendation model trained using a training set and a pre-established knowledge graph, wherein each sample in the training set comprises a user, a business object, and a behavior label of the user for the business object, and a plurality of entity nodes in the knowledge graph comprise nodes corresponding to the business objects; an acquisition module configured to acquire a preset index with a threshold parameter, and a test threshold to be set for the threshold parameter when testing is performed on a test set, wherein the index value of the preset index depends on the hit condition of the recall objects whose recommendation ranking is within the number indicated by the threshold parameter; a determination module configured to determine a plurality of verification thresholds based on the data volumes of a verification set and the test set, and on the test threshold; a verification module configured to determine, based on the verification set, a plurality of verification index values of the recommendation model for the preset index under the plurality of verification thresholds; and a weighting module configured to perform weighted summation on the plurality of verification index values to obtain a comprehensive verification index value.

According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.

According to a fourth aspect, there is provided a computing device comprising a memory having executable code stored therein and a processor which when executing the executable code implements the method of the first aspect.

By adopting the method and the device provided by the embodiment of the specification, the model which is best or nearly best in performance on the test set can be stably and accurately selected through the verification set on the premise of not observing the test set.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments below are briefly introduced, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram showing movie recommendation using a knowledge graph;

FIG. 2 shows a workflow diagram for training, validating and testing a recommendation model;

FIG. 3 is a flow chart of a method for verifying a recommendation model disclosed in an embodiment of the present disclosure;

FIG. 4 is a schematic diagram showing the structure of a verification device of the recommendation model disclosed in the embodiment of the present specification.

Detailed Description

The following describes the scheme provided in the present specification with reference to the drawings.

In the internet era of information explosion, the recommendation model can understand personalized requirements and preferences of users and help the users to screen goods and services of interest. By introducing a Knowledge Graph (KG) as auxiliary information into the recommendation model, the prediction performance of the recommendation model can be effectively improved.

To facilitate understanding, the following briefly describes the knowledge graph in a recommendation scenario, the principle of making recommendations with a knowledge graph, and ways of constructing a recommendation model based on a knowledge graph.

Generally, a KG is a directed heterogeneous graph representing complex relationships between entities, where nodes represent entities and edges represent relationships between entities. A KG typically includes a plurality of triples of the form (head, relation, tail), each indicating that the given relation holds between the head entity and the tail entity.

For a knowledge graph constructed in a recommendation scenario, the entities at least comprise the business objects (items) to be recommended, such as goods or services. By way of example, the goods may be clothing, cosmetics, electronic books, cloud disk storage, movie tickets, etc., and the services may be travel services, official account subscription services, services for following trending figures on a network platform, etc. For brevity, a business object entity in the knowledge graph is hereinafter simply referred to as an object entity.

It will be appreciated that other entities, or non-object entities, may be included in the knowledge-graph. Whether the relationship exists among different entities and the specific content of the relationship can be flexibly set by staff according to collected data, actual requirements, experience and the like. Illustratively, the knowledge-graph includes triples: (some movie name, director, little white), which indicates that the director of some movie name is little white, where some movie name is an object entity, little white is a non-object entity, and director is a relationship.

By introducing the knowledge graph, the interests of the user can be reasonably inferred. Taking movie recommendation as an example, FIG. 1 shows a schematic diagram of movie recommendation using a knowledge graph. The movies watched by a user (e.g., movie A, movie B, and movie C) may be connected to other movies through entities in the KG (e.g., science fiction, character A, character B, and character C), and by reasonable inference the user may also like the movies (e.g., movie D, movie E, and movie F) that are closely connected to the watched movies. Thus, based on the attributes and features of the movies, a KG can help reasonably infer the interests of the user.
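The inference illustrated by FIG. 1 can be sketched with a few (head, relation, tail) triples. The triples, relation names, and the helper `related_movies` below are hypothetical and chosen only for illustration; a real recommendation model would learn such connections through embeddings rather than through a direct shared-entity lookup.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples; names are illustrative only.
triples = [
    ("Movie A", "genre", "science fiction"),
    ("Movie D", "genre", "science fiction"),
    ("Movie B", "starring", "Character A"),
    ("Movie E", "starring", "Character A"),
    ("Movie F", "starring", "Character C"),
]


def related_movies(watched, triples):
    """Return movies that share at least one tail entity with a watched movie."""
    movies_by_entity = defaultdict(set)
    for head, _relation, tail in triples:
        movies_by_entity[tail].add(head)
    candidates = set()
    for movies in movies_by_entity.values():
        if movies & watched:  # some watched movie is linked to this entity
            candidates |= movies
    return sorted(candidates - watched)


watched = {"Movie A", "Movie B"}
print(related_movies(watched, triples))  # ['Movie D', 'Movie E']
```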

There are various ways to construct the recommendation model based on the knowledge graph. It should be understood that, in the construction process, a training sample set (or training set) is also needed, where each training sample includes characteristics of the user and the service object, and a behavior label of the user for the service object. The user characteristics are obtained by legal acquisition after the associated user confirms the authorization. All business objects involved in the training set are contained in the knowledge-graph.

In one embodiment, the user characteristics include static attribute characteristics such as date of birth, constellation, occupation, shipping address, and the like. In another embodiment, the user characteristics include dynamic network behavior characteristics such as network activity, consumption category, transaction amount, and the like. In one embodiment, the business object is a commodity, in which case the commodity characteristics may include a commodity ID, a commodity category, vendor information, commodity cost, and the like. The commodity is illustratively a movie ticket, and in this case, the commodity features may include information about the photographer of the movie, the movie title, the showing time of the movie, the selling price of the movie ticket, and the like. In another embodiment, the business object is a service, at which time the service characteristics may include a service category, service provider information, service cost, and the like.

In one embodiment, the behavior label indicates the user's rating of a business object. In another embodiment, the behavior label indicates whether the user performs a predetermined behavior with respect to the business object. In one example, the business object is an advertisement, in which case the predetermined behavior may be a click. In another example, the business object is a commodity, in which case the predetermined behavior may be a purchase. In yet another example, the business object is an application (APP), in which case the predetermined behavior may be a download.

Based on the above, in one construction or training mode of the recommendation model, separate embedding processing can be performed based on the knowledge graph and the training set, respectively, and then fusion and prediction are performed based on the result of the embedding processing, so that the recommendation model is trained according to the prediction result and the behavior label.

Further, in one embodiment, the recommendation model is designed to include a knowledge graph embedding layer, a user embedding layer, an object embedding layer, a fusion layer, and a prediction layer. For any first training sample in the training set, the user embedding layer processes the first user features in the sample to obtain a first user embedding vector, and the object embedding layer processes the first object features in the sample to obtain a first object embedding vector; the knowledge graph embedding layer embeds the knowledge graph to obtain a plurality of node embedding vectors corresponding to the plurality of entity nodes, including a first node embedding vector corresponding to the first business object in the first training sample. The fusion layer then fuses the first user embedding vector, the first object embedding vector, and the first node embedding vector to obtain a fusion vector. The prediction layer processes the fusion vector to obtain a first prediction result, and the recommendation model is trained using the first prediction result and the first behavior label in the first training sample.

It should be noted that, the terms "first" in the "first training sample", "first user feature", and the like, and the terms "second" and the like in other places herein are all for distinguishing similar things, and do not have other limitation effects such as ordering; each layer in the recommendation model may be implemented using a neural network layer. For example, the knowledge graph embedding layer may be implemented by using a graph neural network (Graph Neural Networks, GNN for short).

In another training mode of the recommendation model, the knowledge graph and the training set can first be integrated into a single graph structure, and the recommendation model is then trained based on the integrated graph structure.

Regarding the integration of the knowledge graph and the training set, a relationship network graph may, for example, be established from the training set. The relationship network graph includes a plurality of user nodes corresponding to the users involved in the training set and a plurality of object nodes corresponding to the business objects involved in the training set. Connecting edges are formed between nodes that have an association relationship: for example, an edge is established between user nodes that have a social relationship, and an edge is established between a user node and a business object node that have interacted. For example, assuming the business objects involved in the training set are commodities and the behavior label indicates whether the corresponding user purchased the corresponding commodity, a user-commodity bipartite graph may be established, with an undirected connecting edge between each user and commodity that have a purchase relationship. Further, since the knowledge graph and the relationship network graph constructed from the training set contain object nodes corresponding to the same business objects, the two graphs can be merged on those shared nodes, so that the two graph structures are integrated into one.
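A minimal sketch of this integration is given below, assuming that object nodes are matched across the two graphs by name. The sample data, the relation names `purchase` and `brand`, and the function `build_integrated_graph` are hypothetical; they are not taken from the specification.

```python
from collections import defaultdict

# Hypothetical training samples: (user, item, label); label 1 means "purchased".
samples = [("u1", "item1", 1), ("u1", "item2", 0), ("u2", "item2", 1)]
# Hypothetical knowledge-graph triples that contain the same item nodes.
kg_triples = [("item1", "brand", "brandX"), ("item2", "brand", "brandX")]


def build_integrated_graph(samples, kg_triples):
    """Merge the user-item bipartite graph and the KG into one edge map."""
    graph = defaultdict(set)  # node -> set of (relation, neighbor)
    for user, item, label in samples:
        if label == 1:
            # Undirected connecting edge between a user and a purchased item.
            graph[user].add(("purchase", item))
            graph[item].add(("purchase", user))
    for head, relation, tail in kg_triples:
        # KG edges are directed head -> tail; item nodes are shared by name.
        graph[head].add((relation, tail))
    return graph


graph = build_integrated_graph(samples, kg_triples)
print(sorted(graph["item1"]))
```

Because the item nodes carry the same identifiers in both sources, the purchase edges and the knowledge-graph edges end up attached to the same node, which is the merging step the paragraph describes.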

In one embodiment, the recommendation model is designed to include a graph embedding layer and a prediction layer. The graph embedding layer performs graph embedding on the integrated graph structure to obtain, for any second training sample, a second user embedding vector of the second user and a second object embedding vector of the second business object. The prediction layer then makes a prediction based on the second user embedding vector and the second object embedding vector to obtain a second prediction result, and the recommendation model is trained based on the second prediction result and the second behavior label in the second training sample. Illustratively, the graph embedding layer is implemented based on a GNN, and the prediction layer uses the similarity between the user embedding vector and the object embedding vector as the prediction result.

In the above, an exemplary description is given of a manner of training a recommendation model based on a knowledge-graph.

In practice, to obtain a recommendation model that can be put into use, the model must be verified and tested in addition to being trained. The training and verification stages may be performed sequentially or alternately, and the test stage is performed last.

FIG. 2 shows a workflow diagram for training, validating and testing a recommendation model. As shown in FIG. 2, the acquired data set is first divided into a training set, a verification set and a test set. The training set is used to train the model; the verification set is used to verify the performance of the model before final testing, so as to adjust the network structure or the hyperparameters that control the complexity and degree of fitting of the model; and the test set is used only to evaluate the performance of the final model.
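The division of the data set shown in FIG. 2 might look like the sketch below. The split fractions, the random seed, and the function name `split_dataset` are assumptions made for illustration; the specification does not prescribe how the division is performed.

```python
import random


def split_dataset(samples, val_fraction=0.1, test_fraction=0.2, seed=0):
    """Shuffle the samples and divide them into train / verification / test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


train_set, val_set, test_set = split_dataset(range(1000))
print(len(train_set), len(val_set), len(test_set))  # 700 100 200
```

With these assumed fractions the verification set is half the size of the test set, which is the kind of size mismatch the method is designed to handle.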

The hyperparameter most commonly adjusted in practice is the number of training rounds (epochs); that is, from the models with different degrees of fitting, the one with the best generalization performance is selected for final testing. However, which verification index to use when selecting the model is a question that deserves careful consideration.

In a recommendation scenario, the test evaluation index for the test set may be designed to measure how well the objects of interest to a user are hit among a plurality of recall objects. In this case, one verification strategy is to use a common machine learning index as the model verification index, for example, mean square error (MSE), accuracy, or area under the curve (AUC). However, such machine learning indexes hardly characterize the test evaluation index, and the recommendation model may perform well on the verification index yet poorly on the test evaluation index.

Another verification strategy is to verify the recommendation model with exactly the same index as the test evaluation index. However, this requires the verification set and the test set to be consistent in scale and distribution, a premise that is generally not satisfied in practice: to reduce computational overhead, the verification set may be smaller, which in turn makes its distribution more random and more likely to differ from that of the test set. In this case, if the recommendation model is verified using the test evaluation index on the verification set, the model may still perform well in verification but poorly in testing.

Based on the above observations and analysis, the embodiments of this specification disclose a scheme that designs a plurality of verification indexes based on the test evaluation index and fuses the resulting verification index values for comprehensive verification, so that the recommendation model that will perform best or nearly best on the test set is stably selected through the verification set.

The following describes the steps of the implementation of the above scheme in conjunction with fig. 3. Fig. 3 is a schematic flow chart of a verification method of a recommendation model disclosed in an embodiment of the present disclosure, where an execution subject of the method may be any apparatus, platform, server or device cluster with computing and processing capabilities, for example, an e-commerce platform. As shown in fig. 3, the method comprises the steps of:

Step S310, acquiring a recommendation model trained using a training set and a pre-established knowledge graph;

Step S320, acquiring a preset index with a threshold parameter, and a test threshold to be set for the threshold parameter when testing is performed on a test set, wherein the index value of the preset index depends on the hit condition of the recall objects whose recommendation ranking is within the number indicated by the threshold parameter;

Step S330, determining a plurality of verification thresholds based on the data volumes of the verification set and the test set, and on the test threshold;

Step S340, determining, based on the verification set, a plurality of verification index values of the recommendation model for the preset index under the plurality of verification thresholds;

Step S350, performing weighted summation on the plurality of verification index values to obtain a comprehensive verification index value.

The above steps are elaborated as follows:

In step S310, a recommendation model trained using a training set and a pre-established knowledge graph is acquired. Each sample in the training set comprises a user, a business object, and a behavior label of the user for the business object, and the plurality of entity nodes in the knowledge graph comprise nodes corresponding to the business objects.

It should be noted that, the description of step S310 may be referred to the related description in the foregoing embodiments, and will not be repeated.

In step S320, a preset index having a threshold parameter is acquired, together with a test threshold to be set for the threshold parameter when testing is performed on the test set. Step S320 may be performed before, after, or simultaneously with step S310, which is not limited herein.

The index value of the preset index depends on the hit condition of the recall objects whose recommendation ranking is within the number indicated by the threshold parameter. For any user, the recommendation model can compute m recommendation degrees for m candidate recommended objects with respect to that user (here, a higher recommendation degree is taken to indicate greater user interest, by way of example). Ranking the m recommendation degrees from high to low yields the recommendation ranking of the m candidate objects. An index measuring model performance, namely the preset index, can then be designed based on how well the candidate objects ranked in the top n hit the objects the user is actually interested in, where n is the threshold parameter. The objects the user is actually interested in can be obtained from the behavior labels: for example, if a user purchases a certain commodity, the user is interested in that commodity; likewise, if a user clicks a certain advertisement, the user is interested in that advertisement.
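The ranking step described above can be sketched in a few lines. The scores and the function name `top_n_objects` are hypothetical; in practice the recommendation degrees would come from the trained recommendation model.

```python
def top_n_objects(scores, n):
    """Return the n candidate objects with the highest recommendation degree."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # high to low
    return ranked[:n]


# Hypothetical recommendation degrees of m = 4 candidate objects for one user.
scores_for_user = {"ad1": 0.2, "ad2": 0.9, "ad3": 0.5, "ad4": 0.7}
print(top_n_objects(scores_for_user, 2))  # ['ad2', 'ad4']
```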

There are various specific calculation formulas for the preset index; two are given here as examples, and the list is not exhaustive. In one example, the preset index is calculated as:

$$f@n = \frac{1}{|S|} \sum_{i=1}^{|S|} \mathrm{count}(rank_i \le n) \tag{1}$$

In formula (1), f@n represents the preset index, S represents a set formed by a plurality of users, |S| represents the total number of users in the user set, and $\mathrm{count}(rank_i \le n)$ represents the number of objects, among the first n objects recommended to the i-th user by the recommendation model, that hit the objects of interest to that user.

In another example, the preset index is calculated as:

$$f@n = \frac{1}{|S|} \sum_{i=1}^{|S|} \mathbb{I}(rank_i \le n) \tag{2}$$

In formula (2), f@n represents the preset index, S represents the user set, and |S| represents the total number of users in the user set; $\mathbb{I}(rank_i \le n)$ is an indicator function indicating whether an object of interest to the i-th user is hit among the first n objects recommended to that user by the recommendation model: if so, the function value is 1; otherwise, it is 0. The index in formula (2) can also be denoted as hit@n.
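The two example indexes can be sketched as follows. The sketch assumes that formula (1) averages the per-user hit count over the user set, which is inferred from the symbol definitions rather than stated explicitly, and that formula (2) is the usual hit@n. The function names and the toy data are hypothetical.

```python
def count_at_n(ranked, interests, n):
    """Formula (1) style: average number of hit objects among each user's top n."""
    total = sum(len(set(ranked[u][:n]) & interests[u]) for u in ranked)
    return total / len(ranked)


def hit_at_n(ranked, interests, n):
    """Formula (2) style (hit@n): fraction of users with at least one hit in the top n."""
    hits = sum(1 for u in ranked if set(ranked[u][:n]) & interests[u])
    return hits / len(ranked)


# Hypothetical ranked recommendations and true objects of interest per user.
ranked = {"u1": ["a", "b", "c"], "u2": ["c", "a", "b"]}
interests = {"u1": {"a", "b"}, "u2": {"b"}}

print(count_at_n(ranked, interests, 2))  # 1.0
print(hit_at_n(ranked, interests, 2))    # 0.5
```

For n = 2, user u1 has two hits and user u2 has none, so the count-based value is 2 / 2 = 1.0, while hit@2 is 1 / 2 = 0.5.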

As for the test threshold $n_{test}$: because the test set is used to evaluate the quality of the final model, and should reflect performance after the model is put into actual use, the test threshold can be set according to the number of recommended objects displayed to the user in each recommendation in the actual service scenario. For example, if an application interface has 5 advertisement slots, the test threshold can correspondingly be set to $n_{test} = 5$.

From the above, the preset index f@n and the test threshold $n_{test}$ set for its threshold parameter n can be obtained.

Step S330, determining a plurality of verification thresholds based on the data amounts of the verification set and the test set, and the test thresholds.

It should be noted that directly using the test evaluation index as the verification index requires the verification set and the test set to be consistent in size and distribution, which is usually not satisfied in practice. In that case, f@n on the verification set can hardly characterize f@n on the test set.

First, when the verification set is smaller than the test set, it is easier for an object of interest to rank in the top n on the verification set, because fewer negative examples compete with it. It is therefore proposed that the verification index that best characterizes f@n on the test set should be f@(n·r), where r is the ratio of the sample size of the verification set to that of the test set.

Accordingly, in one embodiment, this step may include: calculating the ratio r between the data volume $N_{val}$ of the verification set and the data volume $N_{test}$ of the test set, i.e., $r = N_{val} / N_{test}$; and then determining the product of the test threshold $n_{test}$ and the ratio r as the central verification threshold $n_{val_c}$, i.e., $n_{val_c} = n_{test} \cdot r$, which is included in the plurality of verification thresholds. As for measuring the data volume of the verification set and the test set, in one example, the number of samples therein may be taken as the data volume; in another example, the number of nodes or edges in the knowledge graph involved therein may be taken as the data volume.
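The scaling of the test threshold by the data-volume ratio can be sketched as below. Rounding the product to an integer and flooring it at 1 are assumptions added here so that the result is usable as a top-n cut-off; the specification only states that the product is taken. The function name is hypothetical.

```python
def central_verification_threshold(n_test, num_val, num_test):
    """Scale the test threshold by the verification/test data-volume ratio."""
    r = num_val / num_test  # ratio of verification-set to test-set data volume
    # Assumption: round to the nearest integer and keep at least 1.
    return max(1, round(n_test * r))


# Hypothetical volumes: 2000 verification samples, 5000 test samples.
print(central_verification_threshold(20, 2000, 5000))  # 8
```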

Scaling the test threshold $n_{test}$ by the data volume addresses the inconsistency in scale between the verification set and the test set, but the inconsistency in distribution remains. In practice, so that test performance reflects model performance in real applications as far as possible, the distribution of the test set is assumed to be as unpredictable and uncontrollable as the real distribution in the future. It is therefore further proposed that the deviation caused by inconsistent distributions can be reduced by weakening randomness.

Accordingly, in one embodiment, to reduce the contingency of model verification, comprehensive verification is performed by fusing a plurality of verification thresholds, so that the model is selected with both precision and robustness taken into account. Specifically, a number of verification thresholds are set in the neighborhood of the central verification threshold $n_{val_c}$.

In a specific embodiment, a number of reduced verification thresholds smaller than the central verification threshold $n_{val_c}$ are determined and included in the plurality of verification thresholds. In another specific embodiment, a number of enlarged verification thresholds greater than the central verification threshold $n_{val_c}$ are determined and included in the plurality of verification thresholds.

Illustratively, a plurality of verification thresholds are set at equal or unequal intervals in the neighborhood of the central verification threshold $n_{val_c}$ and included in the plurality of verification thresholds. In a specific example, assuming the central verification threshold $n_{val_c} = 8$, two verification thresholds smaller than 8 and two larger than 8 may be set at equal intervals (e.g., an interval of 2), such as 4, 6, 10, and 12.
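The equal-interval example above can be reproduced with a short sketch. The function name `verification_thresholds`, its default arguments, and the rule of discarding thresholds below 1 are assumptions made for illustration.

```python
def verification_thresholds(center, step=2, num_each_side=2):
    """Place equally spaced thresholds on both sides of the central threshold."""
    smaller = [center - step * i for i in range(num_each_side, 0, -1)]
    larger = [center + step * i for i in range(1, num_each_side + 1)]
    # Assumption: a top-n cut-off below 1 is meaningless, so drop such values.
    return [n for n in smaller if n >= 1] + [center] + larger


print(verification_thresholds(8))  # [4, 6, 8, 10, 12]
```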

From the above, a plurality of verification thresholds $\{n_{val_i}\}_{[k]}$ can be determined, where k represents the total number of verification thresholds.

In step S340, based on the verification set, a plurality of verification index values of the recommendation model for the preset index under a plurality of verification thresholds are determined. It will be appreciated that based on a plurality of verification thresholds { n } _{val_i} } _[k] And the definition of preset indexes, a plurality of verification indexes { f@n) _{val_i} } _[k] . Accordingly, based on the calculation formula of the preset index, a plurality of verification index values can be calculated.

The following takes as an example determining, based on the verification set, the index value of the preset index for the recommendation model at an arbitrary verification threshold n_{val_i}. Assume that the calculation formula of the preset index is formula (2) above. All users involved in the verification set form a user set, and all business objects involved in the verification set form a candidate object set. Based on the recommendation model, the recommendation scores (also called recommendation degrees, interest degrees, etc.) of any user in the user set for all m candidate objects in the candidate object set are obtained, and the m recommendation scores are ranked from high to low to obtain the top n_{val_i} candidate objects. Based on the verification samples in the verification set, it can then be determined whether the top n_{val_i} candidate objects hit an object that the user is truly interested in, so that the verification index value f@n_{val_i} can be calculated based on formula (2). For example, assuming that the user set contains 1000 users and 650 of them are hit, f@n_{val_i} = 0.65 can be calculated.
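The ranking-and-hit procedure described above can be sketched as follows. The sketch assumes that the preset index is a hit-rate style metric, namely the fraction of users whose top-n candidates contain at least one truly interesting object, which matches the 650 out of 1000 example. The function name `hit_at_n` and the dictionary layouts are hypothetical.

```python
def hit_at_n(scores, true_items, n):
    """Fraction of users whose top-n ranked candidates contain a true object of interest.

    scores:     {user: {candidate: recommendation score}}  (hypothetical layout)
    true_items: {user: set of objects the user actually interacted with}
    """
    if not scores:
        return 0.0
    hits = 0
    for user, item_scores in scores.items():
        # Rank the m candidates from highest to lowest score and keep the top n.
        top_n = sorted(item_scores, key=item_scores.get, reverse=True)[:n]
        if any(item in true_items.get(user, set()) for item in top_n):
            hits += 1
    return hits / len(scores)


scores = {
    "u1": {"a": 0.9, "b": 0.5, "c": 0.1},
    "u2": {"a": 0.2, "b": 0.8, "c": 0.7},
}
true_items = {"u1": {"c"}, "u2": {"b"}}
print(hit_at_n(scores, true_items, 1))  # 0.5
print(hit_at_n(scores, true_items, 3))  # 1.0
```

Calling the function once per verification threshold yields the plurality of verification index values.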

From this, a plurality of verification index values can be obtained, which are also denoted herein as {f@n_{val_i}}_k.

In step S350, weighted summation processing is performed on the plurality of verification index values to obtain a comprehensive verification index value.

In one embodiment, a plurality of weights preset for a plurality of verification thresholds may be acquired first, wherein each weight is inversely related to a distance between a corresponding verification threshold and a center verification threshold; and then carrying out weighted summation processing on the verification index values by using the weights to obtain the comprehensive verification index value. By way of example, the comprehensive verification index may be calculated using the following formula:

In another embodiment, the plurality of verification index values may be directly averaged to obtain the comprehensive verification index value.
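Both fusion variants can be sketched in a few lines. The specification only requires that each weight be inversely related to the distance between its verification threshold and the center verification threshold; the particular weight function 1 / (1 + distance) and the normalization of the weights to sum to 1 are assumptions made here for illustration, and the function name `composite_index` is hypothetical.

```python
def composite_index(index_values, thresholds, center, weights=None):
    """Weighted sum of verification index values.

    If `weights` is None, an assumed weight 1 / (1 + |n - center|) is used, so that
    thresholds closer to the center threshold contribute more. Weights are normalized
    to sum to 1 (also an assumption), which keeps the result on the metric's scale.
    """
    if weights is None:
        raw = [1.0 / (1.0 + abs(n - center)) for n in thresholds]
    else:
        raw = list(weights)
    total = sum(raw)
    return sum((w / total) * v for w, v in zip(raw, index_values))


thresholds = [4, 6, 8, 10, 12]
values = [0.50, 0.60, 0.65, 0.70, 0.75]  # illustrative index values, not measured results
weighted = composite_index(values, thresholds, center=8)
averaged = composite_index(values, thresholds, center=8, weights=[1] * len(values))
print(weighted, averaged)
```

Passing equal weights reduces the weighted sum to the plain average of the other embodiment.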

In this way, the comprehensive verification index value is obtained, enabling accurate and stable verification of the performance of the recommendation model.

According to another aspect of the embodiment, after step S350, the method disclosed in the embodiments of this specification may further include: determining whether the comprehensive verification index value reaches a preset standard. For example, the comprehensive verification index value itself may be checked against the preset standard; or, if the difference between two consecutive comprehensive verification index values is smaller than an error threshold, the preset standard is considered to be met, and otherwise it is considered not to be met.

Further, when the comprehensive verification index value does not reach the preset standard, the training hyperparameters related to the recommendation model, such as the number of training rounds and the number of neurons in the recommendation model, are first adjusted; the recommendation model then continues to be trained based on the adjusted training hyperparameters.

When the comprehensive verification index value reaches the preset standard, training ends, the recommendation model is put into use as the final model, and the test set is used to determine a test index value of the final model for the preset index under the test threshold.
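The train, verify and adjust cycle described above might be organized as in the following sketch. All names (`select_model`, `train`, `evaluate`, `adjust`, `target`, `tol`, `max_rounds`) are hypothetical. Treating "reaching the preset standard" as the composite value being at least `target` is an assumption; the second check mirrors the criterion that two consecutive composite values differ by less than an error threshold.

```python
def select_model(train, evaluate, adjust, hyperparams, target, tol=0.0, max_rounds=10):
    """Repeat training until the composite verification index meets the preset standard.

    train(model, hyperparams) -> model   continues training (model is None at first)
    evaluate(model) -> float             composite verification index on the verification set
    adjust(hyperparams) -> hyperparams   returns adjusted training hyperparameters
    """
    model, history = None, []
    for _ in range(max_rounds):
        model = train(model, hyperparams)
        history.append(evaluate(model))
        reached = history[-1] >= target  # assumed form of the preset standard
        converged = len(history) >= 2 and abs(history[-1] - history[-2]) < tol
        if reached or converged:
            break
        hyperparams = adjust(hyperparams)
    return model, history


# Toy stand-ins so the sketch runs; a real system would train and score an actual model.
model, history = select_model(
    train=lambda m, hp: (m or 0) + hp["epochs"],
    evaluate=lambda m: min(1.0, m / 10),
    adjust=lambda hp: {"epochs": hp["epochs"] + 1},
    hyperparams={"epochs": 2},
    target=0.7,
)
print(history)  # [0.2, 0.5, 0.9]
```

Only after the loop ends would the test set be consulted, once, to compute the test index value under the test threshold.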

In summary, with the verification method of the recommendation model disclosed in the embodiments of this specification, a single core verification index that accurately represents the test evaluation index is calculated by scaling according to the data volume ratio between the test set and the verification set, and a comprehensive verification index that stably and accurately represents the test evaluation index is calculated by weighted fusion of a plurality of verification indexes. As a result, a model that performs best or nearly best on the test set can be selected stably and accurately through the verification set, without observing the test set.

Corresponding to the above verification method, the embodiment of the present specification also discloses a verification device. Fig. 4 is a schematic diagram showing the structure of a verification device of the recommendation model disclosed in the embodiment of the present specification. As shown in fig. 4, the apparatus 400 includes:

a training module 410 configured to obtain a recommendation model trained using a training set and a pre-established knowledge graph; each sample in the training set comprises a user, a business object, and a behavior label of the user for the business object; the plurality of entity nodes in the knowledge graph comprise nodes corresponding to the business objects;

an obtaining module 420 configured to obtain a preset index having a threshold parameter, and a test threshold to be set for the threshold parameter when testing based on a test set; the index value of the preset index depends on the hit condition of recall objects whose recommended ranking is within the number indicated by the threshold parameter;

a determination module 430 configured to determine a plurality of verification thresholds based on the data volumes of the verification set and the test set, and the test threshold;

a verification module 440 configured to determine, based on the verification set, a plurality of verification index values of the recommendation model for the preset index under the plurality of verification thresholds; and

a weighting module 450 configured to perform weighted summation processing on the plurality of verification index values to obtain a comprehensive verification index value.

In one embodiment, the determination module 430 is specifically configured to: calculate a ratio between the data volume of the verification set and the data volume of the test set; and determine the product of the test threshold and the ratio as a center verification threshold, which is included in the plurality of verification thresholds.
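The scaling performed by the determination module can be illustrated with a short sketch. The function name `center_threshold` and the data volumes used below are hypothetical; rounding the product to an integer of at least 1 is an added assumption, made because the threshold counts ranked positions.

```python
def center_threshold(test_threshold, n_val, n_test):
    """Scale the test threshold by the verification-to-test data volume ratio."""
    ratio = n_val / n_test
    # Assumption: the threshold counts ranked positions, so round and keep it >= 1.
    return max(1, round(test_threshold * ratio))


# Hypothetical volumes: 4000 verification samples, 10000 test samples, test threshold 20.
print(center_threshold(20, n_val=4000, n_test=10000))  # 8
```

With these illustrative volumes the ratio is 0.4, so a test threshold of 20 maps to a center verification threshold of 8.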

In a specific embodiment, the determination module 430 is further configured to: determine a number of reduced verification thresholds smaller than the center verification threshold, which are included in the plurality of verification thresholds; and/or determine a number of enlarged verification thresholds greater than the center verification threshold, which are included in the plurality of verification thresholds.

In a specific embodiment, the weighting module 450 is specifically configured to: acquire a plurality of weights preset for the plurality of verification thresholds, wherein each weight is inversely related to the distance between the corresponding verification threshold and the center verification threshold; and perform weighted summation processing on the plurality of verification index values using the plurality of weights to obtain the comprehensive verification index value.

In one embodiment, the apparatus 400 further comprises a retraining module configured to: adjust the training hyperparameters related to the recommendation model when the comprehensive verification index value does not reach a preset standard; and continue training the recommendation model based on the adjusted training hyperparameters.

In another embodiment, the apparatus 400 further comprises a test module configured to: end training when the comprehensive verification index value reaches a preset standard; and determine, using the test set, a test index value of the recommendation model for the preset index under the test threshold.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 3.

According to an embodiment of yet another aspect, there is also provided a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, implements the method described in connection with fig. 3. Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The foregoing embodiments describe the objects, technical solutions and general principles of the present invention in further detail. They are merely specific embodiments and are not to be construed as limiting the scope of the invention; any modifications, equivalents, improvements, etc. made on the basis of the teachings of the invention are intended to fall within its scope.

Claims

1. A method of validating a recommendation model, comprising:

acquiring a recommendation model trained by using a training set and a pre-established knowledge graph; each sample in the training set comprises a user and a business object, and a behavior label of the user for the business object; the plurality of entity nodes in the knowledge graph comprise nodes corresponding to the business objects;

acquiring a preset index with a threshold parameter, and a test threshold to be set for the threshold parameter when testing based on a test set; the index value of the preset index depends on the hit condition of recall objects whose recommended ranking is within the number indicated by the threshold parameter;

determining a plurality of verification thresholds based on the data volumes of the verification set and the test set, and the test threshold;

determining a plurality of verification index values of the recommendation model for the preset index under the plurality of verification thresholds based on the verification set;

and carrying out weighted summation processing on the verification index values to obtain a comprehensive verification index value.

2. The method of claim 1, wherein the preset indicator is a hit@n indicator, and the threshold parameter is n in the hit@n indicator.

3. The method of claim 1 or 2, wherein determining a plurality of verification thresholds comprises:

calculating a ratio between the data volume of the verification set and the data volume of the test set;

and determining the product of the test threshold and the ratio as a center verification threshold, which is included in the plurality of verification thresholds.

4. The method of claim 3, wherein determining a plurality of verification thresholds further comprises:

determining a number of reduced verification thresholds smaller than the center verification threshold, which are included in the plurality of verification thresholds; and/or,

determining a number of enlarged verification thresholds greater than the center verification threshold, which are included in the plurality of verification thresholds.

5. The method of claim 3, wherein performing a weighted summation process on the plurality of verification index values to obtain a comprehensive verification index value comprises:

acquiring a plurality of weights preset for the plurality of verification thresholds, wherein each weight is inversely related to the distance between the corresponding verification threshold and the center verification threshold;

and carrying out weighted summation processing on the verification index values by using the weights to obtain the comprehensive verification index value.

6. The method of claim 1, wherein after performing weighted summation processing on the plurality of verification index values to obtain a comprehensive verification index value, the method further comprises:

adjusting the training hyperparameters related to the recommendation model when the comprehensive verification index value does not reach a preset standard;

and continuing to train the recommendation model based on the adjusted training hyperparameters.

7. The method of claim 1, wherein after performing weighted summation processing on the plurality of verification index values to obtain a comprehensive verification index value, the method further comprises:

ending training when the comprehensive verification index value reaches a preset standard;

and determining, using the test set, a test index value of the recommendation model for the preset index under the test threshold.

8. A verification apparatus of a recommendation model, comprising:

a training module configured to acquire a recommendation model trained using a training set and a pre-established knowledge graph; each sample in the training set comprises a user, a business object, and a behavior label of the user for the business object; the plurality of entity nodes in the knowledge graph comprise nodes corresponding to the business objects;

an acquisition module configured to acquire a preset index with a threshold parameter, and a test threshold to be set for the threshold parameter when testing based on a test set; the index value of the preset index depends on the hit condition of recall objects whose recommended ranking is within the number indicated by the threshold parameter;

a determining module configured to determine a plurality of verification thresholds based on the data volumes of the verification set and the test set, and the test threshold;

a verification module configured to determine a plurality of verification index values of the recommendation model for the preset index under the plurality of verification thresholds based on the verification set;

and a weighting module configured to perform weighted summation processing on the plurality of verification index values to obtain a comprehensive verification index value.

9. A computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the method of any of claims 1-7.

10. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1-7.