CN111444438B

CN111444438B - Method, device, equipment and storage medium for determining quasi-recall rate of recall strategy

Info

Publication number: CN111444438B
Application number: CN202010212112.9A
Authority: CN
Inventors: 魏龙; 王娜; 武桓州
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-03-24
Filing date: 2020-03-24
Publication date: 2023-09-01
Anticipated expiration: 2040-03-24
Also published as: CN111444438A

Abstract

The disclosure provides a method, a device, equipment and a storage medium for determining the quasi-recall rate (precision and recall) of a recall strategy, relating to intelligent recommendation technology. The method comprises: generating, according to the recall strategy, vector data for online recall; simulating the online recall process according to the vector data and existing real user click data to determine a recall result; and determining the quasi-recall rate corresponding to the recall strategy according to the recall result. With the method, device, equipment and readable storage medium provided by the disclosure, the recall result is determined by offline simulation and the quasi-recall rate is then computed from it. The system therefore does not need to be deployed online to collect results, which improves the efficiency of determining the quasi-recall rate of a recall strategy.

Description

Method, device, equipment and storage medium for determining quasi-recall rate of recall strategy

Technical Field

The present disclosure relates to computer technology, and more particularly, to intelligent recommendation technology.

Background

The Internet provides users with massive amounts of information, and an intelligent recommendation system can help users quickly find information of interest.

A recommendation system works by establishing associations between people and items. With data, algorithms and the system itself at its core, it uses recommendation algorithms to apply massive amounts of data through corresponding recall strategies and ranking strategies, thereby providing users with personalized recommendations.

The recall strategy uses algorithms and rules to generate, from the raw data, a candidate set of recommendations that matches the user. The ranking strategy then orders the candidate set generated by the recall strategy according to different algorithm models to obtain the final list of recommendation candidates.

Many recall strategies currently exist. In the prior art, recommendation information is generated with a recall strategy and a ranking strategy and served online, and the quasi-recall rate of the recall strategy is determined from the online results; the quasi-recall rate is then used to judge how effective the recall strategy is.

However, online testing takes a long time, so recall strategies are evaluated inefficiently. How to improve the efficiency of evaluating recall strategies is therefore a technical problem that those skilled in the art need to solve.

Disclosure of Invention

The disclosure provides a method, a device, equipment and a storage medium for determining the quasi-recall rate of a recall strategy, so as to improve the efficiency of evaluating recall strategies.

A first aspect of the present disclosure provides a method for determining the quasi-recall rate of a recall strategy, including:

generating, according to the recall strategy, vector data for online recall;

simulating an online recall process according to the vector data and existing real user click data, and determining a recall result; and

determining, according to the recall result, the quasi-recall rate corresponding to the recall strategy.

In an alternative embodiment, the vector data includes user vectors;

if the recall strategy includes user-based collaborative filtering, simulating the online recall process according to the vector data and the existing real user click data and determining the recall result includes:

determining, according to the user vectors, similar users of a first preset user;

acquiring, according to the real user click data, first click content information of the first preset user and second click content information of the similar users within a preset time period; and

determining the recall result according to the first click content information and the second click content information.

In this alternative embodiment, the determined user vectors can be used to simulate how content is recommended to the first preset user under user-based collaborative filtering, and thus to obtain the recall result. Because an online recommendation system recommends content to users according to user vectors, this embodiment reproduces the actual recommendation process.

In an optional embodiment, determining the similar users of the first preset user according to the user vectors includes:

determining a preset vector of the first preset user, and determining similar vectors among the user vectors according to the preset vector; and

determining the users corresponding to the similar vectors as the similar users of the first preset user.

In this alternative embodiment, the similar users of the first preset user are determined from the generated user vectors; that is, the vector data corresponding to the recall strategy is used to simulate which similar users the system would match when recommending content to the first preset user.
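The text does not fix a similarity measure. A minimal sketch, assuming cosine similarity over the user vectors and a hypothetical top-`k` cut-off (the function names are illustrative, not from the patent), of how similar users of the first preset user might be found:

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_users(target: str, user_vectors: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k users whose vectors are closest to the target's preset vector."""
    preset = user_vectors[target]  # preset vector of the first preset user
    others = [uid for uid in user_vectors if uid != target]
    # Highest similarity first; ties broken by user id for deterministic output.
    others.sort(key=lambda uid: (-cosine(preset, user_vectors[uid]), uid))
    return others[:k]


user_vectors = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0]}
print(similar_users("u1", user_vectors, 1))  # ['u2']
```

An approximate nearest-neighbour index would replace the full sort at production scale; the linear scan keeps the sketch self-contained.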

In an alternative embodiment, determining the recall result according to the first click content information and the second click content information includes:

comparing the click times of the first click content and the second click content with a first time threshold; and

determining, according to the comparison result, retrieval information and content-related information corresponding to the recall strategy.

In this alternative embodiment, existing real click data is used to determine the content clicked by the first preset user and by the similar users, and these contents are then used to judge whether the content retrieved by the system is relevant to the first preset user.

In an optional implementation, determining the retrieval information and the content-related information corresponding to the recall strategy according to the comparison result includes:

determining the retrieval information according to the first click content and the second click content whose click times are later than the first time threshold; and

determining the content-related information according to the first click content and the second click content whose click times are earlier than the first time threshold.

Determining the retrieval information according to the first click content and the second click content whose click times are later than the first time threshold includes:

selecting, from the first click content with click times later than the first time threshold, a first preset number of items with the latest click times as first screened content, and selecting, from the second click content with click times later than the first time threshold, the first preset number of items with the latest click times as second screened content; and

determining the retrieval information according to the first screened content and the second screened content.

Determining the content-related information according to the first click content and the second click content whose click times are earlier than the first time threshold includes:

selecting, from the first click content with click times earlier than the first time threshold, a second preset number of items with the earliest click times as third screened content, and selecting, from the second click content with click times earlier than the first time threshold, the second preset number of items with the earliest click times as fourth screened content; and

determining the content-related information according to the third screened content and the fourth screened content.

In this alternative embodiment, the first click content and the second click content are each divided by the first time threshold. This makes it possible to simulate the content-related information and the retrieval information in the recall result that the recall strategy would produce when recommending content to the first preset user, that is, the relevant content retrieved by the system, the irrelevant content retrieved, and the relevant content not retrieved.
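The screening rules above can be sketched as follows. This is one illustrative reading, not the patent's implementation: clicks are assumed to be `(content_id, timestamp)` pairs, `n1`/`n2` stand for the first and second preset numbers, the helper names are hypothetical, and clicks falling exactly on the threshold are dropped because the text only covers strictly later and strictly earlier times.

```python
from typing import List, Tuple

Click = Tuple[str, float]  # (content_id, click_timestamp): an assumed representation


def screen(clicks: List[Click], threshold: float,
           n_latest: int, n_earliest: int) -> Tuple[List[str], List[str]]:
    """Split one click list at the threshold and keep the screened contents.

    Returns (latest_after, earliest_before):
      latest_after    - up to n_latest contents clicked after the threshold, latest first
      earliest_before - up to n_earliest contents clicked before the threshold, earliest first
    """
    after = sorted((c for c in clicks if c[1] > threshold), key=lambda c: c[1], reverse=True)
    before = sorted((c for c in clicks if c[1] < threshold), key=lambda c: c[1])
    return [cid for cid, _ in after[:n_latest]], [cid for cid, _ in before[:n_earliest]]


def retrieval_and_related(first_clicks: List[Click], second_clicks: List[Click],
                          threshold: float, n1: int, n2: int) -> Tuple[List[str], List[str]]:
    """Build retrieval info (first + second screened) and related info (third + fourth)."""
    first_screened, third_screened = screen(first_clicks, threshold, n1, n2)
    second_screened, fourth_screened = screen(second_clicks, threshold, n1, n2)
    retrieval_info = first_screened + second_screened
    content_related_info = third_screened + fourth_screened
    return retrieval_info, content_related_info


first = [("a", 1.0), ("b", 5.0), ("c", 9.0)]   # first preset user
second = [("d", 2.0), ("e", 7.0), ("f", 8.0)]  # pooled clicks of the similar users
print(retrieval_and_related(first, second, 4.0, 1, 1))  # (['c', 'f'], ['a', 'd'])
```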

In an alternative embodiment, the vector data comprises content vectors;

if the recall strategy includes content-based collaborative filtering, simulating the online recall process according to the vector data and the existing real user click data and determining the recall result includes:

acquiring third click content of a second preset user relative to a second time threshold;

determining a related content candidate set and a retrieval content candidate set according to the third click content; and

determining the recall result according to the content vectors included in the vector data, the related content candidate set and the retrieval content candidate set.

In this alternative embodiment, the determined content vectors can be used to simulate how content is recommended to the second preset user under content-based collaborative filtering, and thus to obtain the recall result. Because an online recommendation system recommends content to users according to content vectors, this embodiment reproduces the actual recommendation process.

In an alternative embodiment, determining the recall result according to the content vectors included in the vector data, the related content candidate set and the retrieval content candidate set includes:

determining the content-related information according to the content vectors and the related content candidate set; and

determining the retrieval information according to the retrieval content candidate set.

In this alternative embodiment, the content vectors are combined with the related content candidate set to simulate additional relevant content, and the retrieval information is determined from the retrieval content candidate set of the second preset user; together these give the recall result of the simulation.
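A hedged sketch of the content-based simulation. It assumes cosine similarity between content vectors, a hypothetical neighbour count `k`, and one possible reading of how the two candidate sets map onto the recall result: the retrieval content candidate set is treated as the retrieved content, and the related content candidate set, expanded with its nearest neighbours by content vector, is treated as the relevant content. All function names are illustrative.

```python
import math
from typing import Dict, Iterable, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_related(candidates: Iterable[str],
                   content_vectors: Dict[str, Sequence[float]], k: int) -> Set[str]:
    """Related candidates plus, for each one, its k most similar other contents."""
    related = set(candidates)
    for cid in list(related):  # iterate over a copy while the set grows
        base = content_vectors[cid]
        neighbours = sorted(
            (other for other in content_vectors if other != cid),
            key=lambda other: (-cosine(base, content_vectors[other]), other),
        )
        related.update(neighbours[:k])
    return related


def simulate_content_cf(retrieval_candidates: Iterable[str], related_candidates: Iterable[str],
                        content_vectors: Dict[str, Sequence[float]], k: int) -> Dict[str, Set[str]]:
    retrieved = set(retrieval_candidates)
    relevant = expand_related(related_candidates, content_vectors, k)
    return {
        "retrieved_relevant": retrieved & relevant,
        "retrieved_irrelevant": retrieved - relevant,
        "relevant_not_retrieved": relevant - retrieved,
    }


vectors = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [0.1, 0.9]}
result = simulate_content_cf(["B", "C"], ["A"], vectors, k=1)
# A's nearest neighbour is B, so the relevant set is {A, B}.
```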

In an optional implementation, determining the quasi-recall rate corresponding to the recall strategy according to the recall result includes:

determining, according to the content-related information and the retrieval information, a first quantity of retrieved relevant content, a second quantity of retrieved irrelevant content, and a third quantity of relevant content that was not retrieved; and

determining the quasi-recall rate according to the first quantity, the second quantity and the third quantity.

Determining the quasi-recall rate according to the first quantity, the second quantity and the third quantity includes:

determining the ratio of the first quantity to the sum of the first quantity and the second quantity as the precision (accuracy rate); and

determining the ratio of the first quantity to the sum of the first quantity and the third quantity as the recall rate.

In this alternative implementation, the retrieved relevant content, the retrieved irrelevant content and the relevant content not retrieved in the recall result are counted. From these counts the precision and recall rate of the recall strategy are determined and used to measure how good the recall strategy is.
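The two ratios can be written as a small function. The function name is illustrative, and returning 0.0 for an empty denominator is an assumed convention; the text does not address empty results.

```python
from typing import Tuple


def quasi_recall_rate(first: int, second: int, third: int) -> Tuple[float, float]:
    """Return (precision, recall) from the first, second and third quantities."""
    retrieved = first + second  # everything the system retrieved
    relevant = first + third    # everything that is relevant
    precision = first / retrieved if retrieved else 0.0
    recall = first / relevant if relevant else 0.0
    return precision, recall


print(quasi_recall_rate(3, 1, 2))  # (0.75, 0.6)
```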

In an alternative embodiment, generating vector data for online recall according to the recall strategy includes:

acquiring historical click data of users, and determining user correlation information and content correlation information according to the historical click data; and

training on the user correlation information and the content correlation information to obtain correlation vectors, and splitting the correlation vectors into user vectors and content vectors.

In this alternative embodiment, the user vectors and content vectors used for online recall are derived from real user data, and their effectiveness can then be evaluated.
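If the correlation vectors come from one embedding trained jointly over user and content nodes, splitting them amounts to partitioning by node type. The sketch below assumes, hypothetically, that node IDs carry `user:` and `item:` prefixes; the patent does not describe its ID scheme, and the function name is illustrative.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def split_correlation_vectors(vectors: Dict[str, Vector],
                              user_prefix: str = "user:",
                              item_prefix: str = "item:") -> Tuple[Dict[str, Vector], Dict[str, Vector]]:
    """Partition jointly trained node vectors into user vectors and content vectors."""
    user_vectors: Dict[str, Vector] = {}
    content_vectors: Dict[str, Vector] = {}
    for node_id, vec in vectors.items():
        if node_id.startswith(user_prefix):
            user_vectors[node_id[len(user_prefix):]] = vec
        elif node_id.startswith(item_prefix):
            content_vectors[node_id[len(item_prefix):]] = vec
        else:
            raise ValueError(f"unrecognised node id: {node_id!r}")
    return user_vectors, content_vectors


users, contents = split_correlation_vectors({"user:u1": [0.1, 0.2], "item:A": [0.3, 0.4]})
print(users, contents)  # {'u1': [0.1, 0.2]} {'A': [0.3, 0.4]}
```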

In an alternative embodiment, the method further comprises:

performing de-duplication on the retrieval information and the content-related information.

In this alternative embodiment, duplicate data in the recall result is removed, which makes the determined quasi-recall rate more accurate.
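Why de-duplication matters can be seen from a small count. In this minimal sketch (helper names are hypothetical), a content item that appears twice in the retrieval information, for example because two similar users clicked it, would otherwise be counted twice as retrieved relevant content.

```python
from typing import Iterable, List


def dedupe(items: Iterable[str]) -> List[str]:
    """Drop repeated entries, keeping the first occurrence (dict keys keep insertion order)."""
    return list(dict.fromkeys(items))


def count_retrieved_relevant(retrieval_info: Iterable[str], related_info: Iterable[str]) -> int:
    related = set(related_info)
    return sum(1 for item in retrieval_info if item in related)


retrieval_info = ["A", "B", "A"]  # "A" appears twice
related_info = ["A", "C"]
print(count_retrieved_relevant(retrieval_info, related_info))          # 2 (inflated)
print(count_retrieved_relevant(dedupe(retrieval_info), related_info))  # 1
```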

A second aspect of the present disclosure provides a device for determining the quasi-recall rate of a recall strategy, including:

a generation module, configured to generate vector data for online recall according to the recall strategy;

a simulation module, configured to simulate an online recall process according to the vector data and existing real user click data and to determine a recall result; and

a determining module, configured to determine the quasi-recall rate corresponding to the recall strategy according to the recall result.
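The three modules map naturally onto three pluggable callables. A sketch of that wiring, assuming for illustration only that the simulation module returns the recall result as the three counts used for the quasi-recall rate; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Counts = Tuple[int, int, int]  # (retrieved relevant, retrieved irrelevant, relevant not retrieved)


def rates_from_counts(counts: Counts) -> Tuple[float, float]:
    """Default determining module: (precision, recall); 0.0 for an empty denominator (assumed)."""
    a, b, c = counts
    precision = a / (a + b) if a + b else 0.0
    recall = a / (a + c) if a + c else 0.0
    return precision, recall


@dataclass
class QuasiRecallRateDevice:
    generation_module: Callable[[Any], Any]          # recall strategy -> vector data
    simulation_module: Callable[[Any, Any], Counts]  # (vector data, click data) -> recall result
    determining_module: Callable[[Counts], Tuple[float, float]] = rates_from_counts

    def evaluate(self, recall_strategy: Any, click_data: Any) -> Tuple[float, float]:
        vector_data = self.generation_module(recall_strategy)
        recall_result = self.simulation_module(vector_data, click_data)
        return self.determining_module(recall_result)


device = QuasiRecallRateDevice(
    generation_module=lambda strategy: {"strategy": strategy},  # stub
    simulation_module=lambda vectors, clicks: (2, 2, 6),         # stub recall result
)
print(device.evaluate("user_cf", None))  # (0.5, 0.25)
```

Keeping each module behind a callable lets several recall strategies be evaluated against the same click data by swapping only the generation and simulation modules.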

A third aspect of the present disclosure provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method for determining the quasi-recall rate of a recall strategy according to any implementation of the first aspect.

A fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method for determining the quasi-recall rate of a recall strategy according to any implementation of the first aspect.

The method, device, equipment and storage medium for determining the quasi-recall rate of a recall strategy provided by the disclosure generate, according to the recall strategy, vector data for online recall; simulate the online recall process according to the vector data and existing real user click data to determine a recall result; and determine the quasi-recall rate corresponding to the recall strategy according to the recall result. Because the recall result is determined by offline simulation and the quasi-recall rate is then computed from it, the system does not need to be deployed online to collect results, which improves the efficiency of determining the quasi-recall rate of a recall strategy.

Drawings

The drawings are included to provide a better understanding of the present application and are not to be construed as limiting it. In the drawings:

FIG. 1 is a flow chart of a method for determining the quasi-recall rate of a recall strategy according to an exemplary embodiment of the application;

FIG. 2 is a schematic diagram illustrating how the quasi-recall rate is determined according to an exemplary embodiment of the application;

FIG. 3 is a flow chart of a method for determining the quasi-recall rate of a recall strategy according to another exemplary embodiment of the application;

FIG. 4 is a schematic diagram of user-related information and item-related information according to an exemplary embodiment of the application;

FIG. 5 is a schematic diagram illustrating the generation of user vectors and content vectors according to an exemplary embodiment of the application;

FIG. 6 is a schematic diagram of first click content and second click content according to an exemplary embodiment of the application;

FIG. 7 is a schematic diagram of third click content according to an exemplary embodiment of the application;

FIG. 8 is a diagram of a distributed system architecture according to an exemplary embodiment of the application;

FIG. 9 is a block diagram of an apparatus for determining the quasi-recall rate of a recall strategy according to an exemplary embodiment of the application;

FIG. 10 is a block diagram of an apparatus for determining the quasi-recall rate of a recall strategy according to another exemplary embodiment of the application;

FIG. 11 is a block diagram of an electronic device according to an exemplary embodiment of the application.

Detailed Description

Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The Internet provides users with massive amounts of information. To provide more targeted content, a recommendation system can be deployed to screen out content that matches each user and recommend it.

In big data and AI application scenarios, a recommendation system establishes associations between people and items and, with data, algorithms and the system itself at its core, uses recommendation algorithms (such as collaborative filtering algorithms and machine learning algorithms) to apply massive amounts of data through corresponding recall strategies and ranking strategies, thereby providing users with personalized recommendations.

The recall strategy uses algorithms and rules to generate, from the raw data, a candidate set of recommendations that matches the user. The ranking strategy reorders the candidate set generated by the recall strategy (or by a near-line strategy) according to different algorithm models to obtain the final list of recommendation candidates.

Recall strategies and ranking strategies each need their own evaluation metrics to measure their effects, so that recommendations improve and fit user preferences more closely.

Currently, a recommendation system may be deployed online to make recommendations to users, and the quasi-recall rate of the recall strategy is then determined from user feedback to evaluate the recall strategy. However, the time from deploying the system to receiving user feedback is long, so recall strategies are evaluated inefficiently.

The solution provided by the application uses existing real user click data to simulate the online recall process offline, and determines the quasi-recall rate of the recall strategy from the simulated recall result in order to evaluate the recall strategy. Because the quasi-recall rate is obtained without deploying the system online, the efficiency of evaluating recall strategies is improved.

FIG. 1 is a flow chart of a method for determining the quasi-recall of a recall policy according to an exemplary embodiment of the application.

As shown in FIG. 1, the method for determining the quasi-recall rate of the recall strategy provided by the application comprises the following steps:

Step 101, generating vector data for online recall according to a recall strategy.

The method provided in this embodiment may be performed by an electronic device with computing capability, which may be a single electronic device or multiple electronic devices, for example an electronic system composed of a plurality of electronic devices.

The recall strategy may be a recall strategy to be evaluated; several recall strategies to be evaluated may be set up, and the quasi-recall rate of each is determined with the method provided in this embodiment.

Specifically, the quasi-recall rate comprises the recall rate and the precision (accuracy), which together measure how good a recall strategy is. Recall = (relevant content retrieved by the system) / (total amount of relevant content in the system); Precision = (relevant content retrieved by the system) / (total amount of content retrieved by the system).

Fig. 2 is a schematic diagram showing a manner of determining a quasi-recall according to an exemplary embodiment of the present application.

As shown in FIG. 2, let the number of relevant content items retrieved by the system be A, the number of irrelevant items retrieved be B, the number of relevant items not retrieved be C, and the number of irrelevant items not retrieved be D. Then Recall = A/(A+C) and Precision = A/(A+B).
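Writing Ret for the set of content retrieved by the system and Rel for the set of relevant content (notation introduced here for convenience), the quantities in FIG. 2 and both ratios can be expressed as:

```latex
\begin{aligned}
A &= |\mathrm{Ret} \cap \mathrm{Rel}|, \quad B = |\mathrm{Ret} \setminus \mathrm{Rel}|, \quad C = |\mathrm{Rel} \setminus \mathrm{Ret}|,\\
\text{Precision} &= \frac{A}{A+B} = \frac{|\mathrm{Ret} \cap \mathrm{Rel}|}{|\mathrm{Ret}|}, \qquad
\text{Recall} = \frac{A}{A+C} = \frac{|\mathrm{Ret} \cap \mathrm{Rel}|}{|\mathrm{Rel}|}
\end{aligned}
```

D, the irrelevant content that was not retrieved, enters neither ratio.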

For example, suppose the content that a recommendation system recommends to a user based on the user's profile includes "Peking University graduate admission by recommendation", "Beijing Internet job recruitment" and "What is college life like". Combined with the user's information, only "Peking University graduate admission by recommendation" is determined to be of interest to the user. The relevant content retrieved by the recommendation system from existing data is therefore "Peking University graduate admission by recommendation", and the irrelevant content retrieved is "Beijing Internet job recruitment" and "What is college life like". In addition, the system's existing data also contains "Peking University back-to-school season" and "Scenery of Weiming Lake", which, combined with the user's information, are also determined to be of interest to the user; the relevant content not retrieved by the recommendation system therefore consists of these two items.
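The example can be checked with set operations. The string labels below are short stand-ins for the five titles in the example.

```python
# Content the system retrieved (recommended) for the user.
retrieved = {"pku_graduate_admission", "beijing_internet_jobs", "college_life"}
# Content that is actually relevant to the user.
relevant = {"pku_graduate_admission", "pku_back_to_school", "weiming_lake_scenery"}

a = len(retrieved & relevant)  # retrieved and relevant
b = len(retrieved - relevant)  # retrieved but irrelevant
c = len(relevant - retrieved)  # relevant but not retrieved
precision = a / (a + b)
recall = a / (a + c)
print(a, b, c)  # 1 2 2
```

Both precision and recall come out as 1/3 here: one of three retrieved items is relevant, and one of three relevant items was retrieved.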

In a recommendation system, vector data for online recall is generated based on the recall strategy, and content related to the user is then matched according to the vector data and recommended to the user. The quality of the recommendations therefore depends on the generated vector data.

In the method provided by this embodiment, vector data for online recall is generated based on a recall strategy, and the vector data may include user vectors and content vectors. When recommending content to a user, similar users can be matched based on the user vectors, and content of interest to those similar users can be recommended to the user. For example, if user a is interested in content A, and user a is similar to user b, content A may be recommended to user b. In addition, content similar to what the user is interested in can be matched based on the content vectors and recommended to the user. For example, if user a is interested in content A, and content A is similar to content B, content B is recommended to user a.

The similarity between contents may be computed from the contents alone or combined with user preferences, for example by computing item-to-item similarity in advance from the historical preference data of all users.

In a recommendation system, the vector data plays a key role in deciding what content is recommended to a user; if the vector data generated based on the recall strategy is inaccurate, the matched recommendations will be inaccurate as well.

Based on the recall strategy, historical click data can be analyzed to extract associations among users and among contents, and vector data can be generated from these associations.

Specifically, a large amount of historical click data can be obtained, from which the correspondence between users and the content they clicked is derived; for example, user a clicked content A and B, user b clicked content B and C, and so on. Associations between users and content are then built from the historical click data, yielding a user-content relationship graph; for example, since both user a and user b clicked content B, the two users are linked through content B. From this relationship graph, user relationship sequences and content relationship sequences can be obtained.
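A minimal sketch of building the user-content relationship from `(user, content)` click pairs and finding users linked through shared content. The pair representation and the function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_relationship(clicks: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Index click pairs both ways: user -> clicked contents and content -> clicking users."""
    user_to_items: Dict[str, Set[str]] = defaultdict(set)
    item_to_users: Dict[str, Set[str]] = defaultdict(set)
    for user, item in clicks:
        user_to_items[user].add(item)
        item_to_users[item].add(user)
    return dict(user_to_items), dict(item_to_users)


def linked_users(user: str, user_to_items: Dict[str, Set[str]],
                 item_to_users: Dict[str, Set[str]]) -> Set[str]:
    """Users connected to `user` through at least one commonly clicked content."""
    return {other
            for item in user_to_items.get(user, set())
            for other in item_to_users[item]
            if other != user}


clicks = [("a", "A"), ("a", "B"), ("b", "B"), ("b", "C")]
u2i, i2u = build_relationship(clicks)
print(linked_users("a", u2i, i2u))  # {'b'}: users a and b are linked via content B
```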

Model training is then performed on the user relationship sequences and the content relationship sequences to output correlation vectors, which are split into user vectors and content vectors. Different recall strategies may use different training features, so the output vectors may differ between strategies.

Step 102, simulating an online recall process according to the vector data and the existing real user click data, and determining a recall result.

In practice, the retrieved content for a preset user can be determined according to the vector data, and the recall result is then determined in combination with the real user click data. The recall result may include retrieved relevant content, retrieved irrelevant content, relevant content not retrieved, irrelevant content not retrieved, and so on.

That is, the vector data is used to simulate the content that would be matched when recommending to the preset user, and the recall result is determined in combination with real user click data, so the recall result can be obtained without deploying the system online.

If the recall strategy includes user-based collaborative filtering, similar users of the preset user can be determined according to the user vectors; the click content of the preset user and of the similar users is then determined from the real user click data, and the recall result is determined from these two sets of click content.

Specifically, if the system were online, a recommendation system using user-based collaborative filtering would recommend to the preset user the content clicked by similar users; this embodiment simulates that process. Meanwhile, the retrieved relevant content, the retrieved irrelevant content, the relevant content not retrieved, and so on can be determined by combining the real click content of the preset user with that of the similar users.

Further, if the recall strategy includes content-based collaborative filtering, the click content of a preset user can be determined from real user click data, and a related content candidate set and a retrieval content candidate set are then selected from that click content. For the related content candidate set, the content vectors are used to find other content similar to the content in the set, that is, to obtain more relevant content information for the user.

Retrieval-level information can be determined from the retrieval content candidate set, and relevance-level information can be determined from the relevant content information finally selected; together they yield the retrieved relevant content, the retrieved irrelevant content and the relevant content not retrieved.

Specifically, if the system were online, a recommendation system using content-based collaborative filtering would recommend to the preset user, according to the content vectors, content similar to the content the user is interested in; this embodiment simulates that process. Meanwhile, the retrieved relevant content, the retrieved irrelevant content, the relevant content not retrieved, and so on can be determined in combination with the preset user's real click data.

Step 103, determining the quasi-recall rate corresponding to the recall strategy according to the recall result.

After the recall result is determined, the quasi-recall rate can be determined from it; for example, the precision and the recall rate are determined from the retrieved relevant content, the retrieved irrelevant content and the relevant content not retrieved in the recall result.

In practice, Recall = (relevant content retrieved by the system) / (total amount of relevant content in the system), and Precision = (relevant content retrieved by the system) / (total amount of content retrieved by the system).

The recall strategy can then be evaluated according to the determined precision and recall rate, which improves the efficiency of evaluating recall strategies.

The method provided by this embodiment is used to judge how good a recall strategy is. It is performed by a device on which the method is deployed, and the device is typically implemented in hardware and/or software.

The method for determining the quasi-recall rate of a recall strategy provided by this embodiment comprises: generating, according to the recall strategy, vector data for online recall; simulating the online recall process according to the vector data and existing real user click data to determine a recall result; and determining the quasi-recall rate corresponding to the recall strategy according to the recall result. In this method the recall result is determined by offline simulation and the quasi-recall rate is then computed from it, so the system does not need to go online to obtain the recall result, which improves the efficiency of determining the quasi-recall rate of the recall strategy.

FIG. 3 is a flow chart illustrating a method of determining a quasi-recall rate for a recall policy according to another exemplary embodiment of the application.

As shown in fig. 3, the method for determining the quasi-recall rate of the recall strategy provided by the application comprises the following steps:

Step 301, acquiring historical click data of users, and determining user correlation information and content correlation information according to the historical click data.

The historical click data of users can be obtained from an existing system. For example, a large amount of user click data within a preset time period can be acquired, where the period may be set as required, for example about three months. The historical click data may include the content clicked by each user; for example, user a clicked content A and content B, and user b clicked content B and content C.

Fig. 4 is a schematic diagram of user-related information and item-related information according to an exemplary embodiment of the present application.

As shown in FIG. 4, a relationship graph between users and content can be constructed from the historical click data, with edges connecting users to the content they clicked. For example, an edge can be established between user1 and content A, and another between user2 and content B.

Further, a directed graph can be generated from the user-content relationship graph. Identical content nodes in the relationship graph are merged, so that different users become connected through the content they share and the directed graph carries richer information, for example, which users are associated with the same content and which pieces of content the same user follows.
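As a rough illustration of this graph construction, the following Python sketch keeps the user-content relationship as two adjacency maps, one from users to content and one from content to users, so that users who clicked the same content become reachable from one another through that content. The `(user_id, content_id)` input format and the function name `build_click_graph` are hypothetical; the text does not specify a data layout.

```python
from collections import defaultdict


def build_click_graph(clicks):
    """Build directed adjacency maps for a user-content click graph.

    clicks: iterable of (user_id, content_id) pairs (hypothetical input format).
    Returns (user_to_items, item_to_users). Because each content id becomes a
    single node, two users who clicked the same content are linked through it.
    """
    user_to_items = defaultdict(set)
    item_to_users = defaultdict(set)
    for user, item in clicks:
        user_to_items[user].add(item)   # edge user -> content
        item_to_users[item].add(user)   # edge content -> user
    return dict(user_to_items), dict(item_to_users)


# The example from the text: user A clicked A and B, user B clicked B and C.
clicks = [("userA", "A"), ("userA", "B"), ("userB", "B"), ("userB", "C")]
u2i, i2u = build_click_graph(clicks)
print(sorted(i2u["B"]))  # ['userA', 'userB']
```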

In practice, the rich information in the directed graph can be analyzed to determine the user correlation information and the content correlation information. The user correlation information may be, for example, a user origin walk sequence, and the content correlation information may be, for example, a content origin walk sequence.

A user origin walk sequence starts from one user and includes other users associated with that user. For example, if there are N users whose content of interest is similar to that of user1, user1 may point to these N users in the user origin walk sequence. A content origin walk sequence starts from one content item and includes other content related to it. For example, if a user is interested in content A and N other content items, content A may point to these N items in the content origin walk sequence.
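The walk sequences can be sketched as random walks over the user-user and content-content neighbor relations implied by the click graph. This DeepWalk-style random walk is an assumption made for illustration; the text only states that a start node points to its associated users or content. The small graph, the helper names and the walk length are all hypothetical.

```python
import random


def user_neighbors(user, u2i, i2u):
    """Users who clicked at least one content item that `user` also clicked."""
    return sorted({v for item in u2i.get(user, ()) for v in i2u[item]} - {user})


def item_neighbors(item, u2i, i2u):
    """Content items clicked by at least one user who also clicked `item`."""
    return sorted({j for u in i2u.get(item, ()) for j in u2i[u]} - {item})


def walk(start, neighbors, length, rng):
    """Random walk of at most `length` nodes; stops early at a node with no neighbors."""
    seq = [start]
    while len(seq) < length:
        candidates = neighbors(seq[-1])
        if not candidates:
            break
        seq.append(rng.choice(candidates))
    return seq


# Hypothetical click graph: user -> clicked content, content -> users who clicked it.
u2i = {"user1": {"A", "B"}, "user2": {"B"}, "user3": {"A", "C"}}
i2u = {"A": {"user1", "user3"}, "B": {"user1", "user2"}, "C": {"user3"}}

rng = random.Random(0)  # fixed seed so the walks are reproducible
user_walk = walk("user1", lambda u: user_neighbors(u, u2i, i2u), 5, rng)  # user origin walk
item_walk = walk("A", lambda i: item_neighbors(i, u2i, i2u), 5, rng)      # content origin walk
print(user_walk, item_walk)
```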

As shown in FIG. 4, the determined user correlation information is the user origin walk sequence (USER origin walk sequence), and the determined content correlation information is the content origin walk sequence (ITEM origin walk sequence).

Step 302, training on the user correlation information and the content correlation information to obtain correlation vectors, and splitting the correlation vectors to obtain user vectors and content vectors.

After the user correlation information and the content correlation information are obtained, they can be used as training data to obtain user vectors and content vectors.

In one embodiment, the user vectors and the content vectors may be determined directly and separately from the user correlation information and the content correlation information; for example, users who follow the same content obtain similar user vectors, and related content obtains similar content vectors.

In the method provided by this embodiment, the user vectors and the content vectors are determined jointly by combining the user correlation information and the content correlation information, which makes them more accurate. For example, the content vectors can reflect users' historical preferences, so that content followed by similar users has similar content vectors.

Specifically, the user correlation information and the content correlation information can be trained to obtain correlation vectors, and the correlation vectors are split to obtain user vectors and content vectors.

Fig. 5 is a schematic diagram illustrating generation of user vectors and content vectors according to an exemplary embodiment of the present application.

As shown in FIG. 5, the user origin walk sequences and the content origin walk sequences (sequence samples) may be used to train a model, which outputs correlation vectors (UI vectors); the user vectors and the content vectors are then separated from the correlation vectors.
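The text does not name the model used in FIG. 5; a skip-gram style embedding model is a common choice for sequence samples of this kind, but that is an assumption. Whatever the model, the splitting step can be sketched as below, assuming user tokens and content tokens were given distinct, hypothetical prefixes (`u:` and `i:`) when the sequences were built, so that one shared embedding table can be separated afterwards.

```python
def split_ui_vectors(ui_vectors, user_prefix="u:", item_prefix="i:"):
    """Split one correlation (UI) embedding table into user and content vectors.

    ui_vectors: {token: vector} as output by a model trained on the walk
    sequences. The "u:" / "i:" prefixes are a hypothetical naming convention
    used to tell user tokens and content tokens apart.
    """
    user_vecs, content_vecs = {}, {}
    for token, vec in ui_vectors.items():
        if token.startswith(user_prefix):
            user_vecs[token[len(user_prefix):]] = vec
        elif token.startswith(item_prefix):
            content_vecs[token[len(item_prefix):]] = vec
        # tokens with neither prefix are ignored
    return user_vecs, content_vecs


ui = {"u:user1": [0.1, 0.9], "u:user2": [0.2, 0.8], "i:A": [0.7, 0.3]}
users, contents = split_ui_vectors(ui)
print(users, contents)
```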

With continued reference to FIG. 3, if the recall policy includes user-based collaborative filtering, steps 303-305 may be performed.

Step 303, determining similar users corresponding to the first preset user according to the user vector.

Further, a first preset user may be selected in advance, and the offline recall process is simulated for this user. For example, a user id is selected, and the user corresponding to that id is used as the first preset user.

In practical application, the first preset user may be a real user, that is, the click data of the first preset user is included in the click data of the real user.

A preset vector of the first preset user may be determined; for example, it may be looked up directly among the separated user vectors.

Specifically, a similar vector corresponding to the preset vector may be determined in the determined user vectors, and a user corresponding to the similar vector is used as a similar user of the first preset user. In the method provided by the embodiment, the characteristics of the users can be described by vectors, and if the user vectors are similar, the two users are considered to be similar.
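A minimal sketch of step 303, assuming cosine similarity as the vector similarity measure (the text does not fix one) and a hypothetical cutoff `k` for how many similar users to keep. The user ids and vectors in the example are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_users(target, user_vecs, k):
    """Return the k users whose vectors are most similar to the target user's vector."""
    query = user_vecs[target]  # preset vector of the first preset user
    scored = [(cosine(query, vec), user) for user, vec in user_vecs.items() if user != target]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, ties by id
    return [user for _, user in scored[:k]]


user_vecs = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0], "u4": [0.7, 0.7]}
print(similar_users("u1", user_vecs, k=2))  # ['u2', 'u4']
```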

Step 304, acquiring first click content information of a first preset user and second click content information of a similar user in a preset time period according to the click data of the real user.

Further, a preset time period may be specified, for example a window around a given moment, such as a one-minute window around 10:00 a.m. When the window length is 0, the preset time period reduces to a single moment, for example exactly 10:00 a.m.

In actual application, the first click content information of the first preset user and the second click content information of the similar user can be obtained according to the click data of the real user. The content information may include an identification corresponding to the content, for example, an id of the content.

The first click content information may include a content identifier clicked by a first preset user in a preset time period, and the second click content information may include a content identifier clicked by a similar user in a preset time period.
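Steps 304 and 306 both read clicks from a time window. The sketch below assumes the real click log is a list of `(user_id, content_id, timestamp)` tuples and represents the preset time period as a center time plus a hypothetical `half_width`; a zero width reduces the window to a single moment, as described above. The function name and the sample log are illustrative only.

```python
from datetime import datetime, timedelta


def clicks_in_window(click_log, users, center, half_width):
    """Collect each requested user's clicks inside [center - half_width, center + half_width].

    click_log: iterable of (user_id, content_id, timestamp) tuples (assumed format).
    Returns {user_id: [(timestamp, content_id), ...]} sorted by click time.
    """
    lo, hi = center - half_width, center + half_width
    window = {u: [] for u in users}
    for user, content_id, ts in click_log:
        if user in window and lo <= ts <= hi:
            window[user].append((ts, content_id))
    for clicks in window.values():
        clicks.sort()  # order each user's clicks by click time
    return window


ten_am = datetime(2020, 1, 1, 10, 0, 0)
log = [
    ("user1", "A", ten_am - timedelta(seconds=20)),
    ("user1", "B", ten_am + timedelta(seconds=45)),  # outside a one-minute window
    ("user2", "C", ten_am + timedelta(seconds=10)),
    ("user3", "D", ten_am),                           # user3 not requested below
]
result = clicks_in_window(log, ["user1", "user2"], ten_am, timedelta(seconds=30))
print({u: [c for _, c in clicks] for u, clicks in result.items()})  # {'user1': ['A'], 'user2': ['C']}
```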

Step 305, determining recall results according to the first click content information and the second click content information.

Specifically, the second click content information can be regarded as the content recommended to the first preset user by the recall strategy, and the first click content information can be regarded as the content actually clicked by the first preset user, that is, the content the user is interested in. The result of this offline simulated recall may be determined by combining the first click content information and the second click content information.

Furthermore, a first time threshold may be set. The click time of each first click content and of each second click content is compared with the first time threshold, and the click contents are grouped according to the comparison result. For example, the first click contents are split into a group whose click time is smaller than the first time threshold and a group whose click time is larger than it; correspondingly, the second click contents are split into a group whose click time is smaller than the first time threshold and a group whose click time is larger than it.

In practical application, the retrieval information and the content related information corresponding to the recall strategy can be determined according to the comparison result, and particularly can be determined according to the grouping result.

The retrieval information can be determined according to the first click contents and second click contents whose click time is greater than the first time threshold, and the content-related information can be determined according to the first click contents and second click contents whose click time is less than the first time threshold.

Fig. 6 is a schematic diagram of first click content and second click content according to an exemplary embodiment of the present application.

As shown in FIG. 6, the first click contents and the second click contents may each be sorted by click time, with the first click contents on the left and the second click contents on the right. The first click contents may be partitioned by the first time threshold into first click contents 61 whose click time is greater than the first time threshold and first click contents 62 whose click time is less than it. The second click contents may likewise be partitioned into second click contents 63 whose click time is greater than the first time threshold and second click contents 64 whose click time is less than it.

The retrieval information may be determined according to the first click contents 61 and the second click contents 63, and may specifically include retrieved related content and non-retrieved related content. The content-related information is determined according to the first click contents 62 and the second click contents 64, and may specifically include retrieved related content and retrieved unrelated content.

Specifically, a first preset number of first filtered contents 611 may be selected from the first click contents 61 whose click time is greater than the first time threshold, and a first preset number of second filtered contents 631 may be selected from the second click contents 63 whose click time is greater than the first time threshold. For example, if the first preset number is 5, then 5 first filtered contents 611 and 5 second filtered contents 631 are selected. The retrieval information may be determined based on the first filtered contents and the second filtered contents.

Further, a second preset number of third filtered contents 621 may be selected from the first click contents 62 whose click time is less than the first time threshold, and a second preset number of fourth filtered contents may be selected from the second click contents 64 whose click time is less than the first time threshold. For example, if the second preset number is 20, then 20 third filtered contents and 20 fourth filtered contents are selected. The content-related information may be determined based on the third filtered contents and the fourth filtered contents.
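Putting the grouping and filtering of step 305 together, a sketch of how the retrieval information and content-related information might be derived is shown below. Treating the similar users' clicks as retrieved content and the first preset user's clicks as related content follows the description of step 305. Which N contents are picked from each group is not specified in the text; here the earliest N clicks after the threshold and the latest N clicks before it are taken, and clicks exactly at the threshold go to the later group. Both choices, the dictionary keys and the function names are hypothetical.

```python
def last_n(items, n):
    """Last n elements of a list (empty for n <= 0)."""
    return items[max(0, len(items) - n):] if n > 0 else []


def split_by_threshold(clicks, threshold):
    """Split time-sorted (time, content_id) clicks at the first time threshold."""
    before = [c for t, c in clicks if t < threshold]
    after = [c for t, c in clicks if t >= threshold]  # ties go to the later group (assumption)
    return before, after


def user_cf_recall_result(target_clicks, similar_clicks, threshold,
                          n_retrieval=5, n_related=20):
    """Build (retrieval_info, related_info) for user-based collaborative filtering."""
    t_before, t_after = split_by_threshold(sorted(target_clicks), threshold)
    s_before, s_after = split_by_threshold(sorted(similar_clicks), threshold)
    first_f = set(t_after[:n_retrieval])          # from first click contents 61
    second_f = set(s_after[:n_retrieval])         # from second click contents 63
    third_f = set(last_n(t_before, n_related))    # from first click contents 62
    fourth_f = set(last_n(s_before, n_related))   # from second click contents 64
    retrieval_info = {
        "retrieved_related": second_f & first_f,
        "unretrieved_related": first_f - second_f,
    }
    related_info = {
        "retrieved_related": fourth_f & third_f,
        "retrieved_unrelated": fourth_f - third_f,
    }
    return retrieval_info, related_info


target = [(1, "a"), (2, "b"), (11, "c"), (12, "d")]    # first preset user
similar = [(3, "a"), (4, "x"), (11, "c"), (13, "y")]   # similar users
retrieval_info, related_info = user_cf_recall_result(target, similar, threshold=10)
print(retrieval_info, related_info)
```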

With continued reference to FIG. 3, if the recall policy includes content-based collaborative filtering, steps 306-308 may be performed.

Step 306, acquiring the third click content of a second preset user corresponding to a second time threshold.

Further, a second preset user may be selected in advance, and the offline recall process is simulated for this user. For example, a user id is selected, and the user corresponding to that id is used as the second preset user.

In practical application, the second preset user may be a real user, that is, the click data of the second preset user is included in the click data of the real user.

The second time threshold may be preset, and the third click content of the second preset user corresponding to the second time threshold may be obtained, that is, the content clicked by the second preset user around the second time threshold, for example around 10:00 a.m. More precisely, a time period containing the second time threshold may be used, for example the third click content of the second preset user within 5 minutes before and after 10:00 a.m.

Step 307, determining a related content candidate set and a retrieval content candidate set according to the third click content.

Specifically, a related content candidate set and a retrieval content candidate set can be preliminarily screened out according to the determined third click content.

Further, the third click contents may be sorted by click time; a third preset number (m) of third click contents with the latest click times are used as the retrieval content candidate set, and a fourth preset number (n) of third click contents with the latest click times are used as the related content candidate set.

Fig. 7 is a schematic diagram of third click content shown in accordance with an exemplary embodiment of the present application.

As shown in FIG. 7, the third click contents may be sorted by click time and are displayed in ascending order of click time. The m third click contents with the latest click times may be used as the retrieval content candidate set, and the n third click contents with the latest click times may be used as the related content candidate set.

The values of m and n can be set according to the requirements, for example, m can be 20 and n can be 60.

In practice, the method provided in this embodiment requires a minimum number of third click contents: their number should be at least m and at least n. Accordingly, if the number of third click contents obtained for the second preset user is less than m or n, the second preset user may be replaced and the third click content determined again.
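Step 307 and the replacement rule above can be sketched as follows. Following the text literally, both candidate sets are drawn from the latest clicks; the function names are hypothetical, and the small `m` and `n` in the usage example are chosen only to keep it short (the text suggests 20 and 60).

```python
def candidate_sets(third_clicks, m=20, n=60):
    """Return (retrieval_candidates, related_candidates), or None if there are too few clicks.

    third_clicks: (time, content_id) pairs of the second preset user around the
    second time threshold. The m latest clicks form the retrieval content
    candidate set and the n latest form the related content candidate set.
    """
    if m <= 0 or n <= 0 or len(third_clicks) < max(m, n):
        return None  # too few clicks: the caller should pick another user
    ordered = [c for _, c in sorted(third_clicks)]  # ascending click time, as in FIG. 7
    return ordered[len(ordered) - m:], ordered[len(ordered) - n:]


def pick_second_preset_user(clicks_by_user, m=20, n=60):
    """Try users in turn until one has enough third click content."""
    for user, clicks in clicks_by_user.items():
        sets = candidate_sets(clicks, m, n)
        if sets is not None:
            return user, sets
    return None


clicks_by_user = {
    "user7": [(1, "A"), (2, "B")],                        # too few clicks, replaced
    "user8": [(4, "D"), (1, "A"), (3, "C"), (2, "B")],
}
print(pick_second_preset_user(clicks_by_user, m=2, n=3))
# ('user8', (['C', 'D'], ['B', 'C', 'D']))
```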

With continued reference to FIG. 3, after the related content candidate set and the retrieval content candidate set are determined, the method provided in this embodiment further includes:

Step 308, determining the recall result according to the content vectors included in the vector data, the related content candidate set, and the retrieval content candidate set.

The recall result corresponding to the recall strategy is determined by combining the predetermined content vectors, the related content candidate set, and the retrieval content candidate set. Content in the related content candidate set is treated as related content, and content in the retrieval content candidate set is treated as retrieved content.

Specifically, the content-related information may be determined according to the content vectors and the related content candidate set, and the retrieval information may be determined according to the retrieval content candidate set.

Further, more related content can be found from the predetermined content vectors and the related content candidate set. For example, the content vectors of the related content in the related content candidate set can be looked up, similar vectors can then be found among the predetermined content vectors, and the corresponding content can be treated as related content. In this way, the content-related information corresponding to the recall strategy can be determined.
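The expansion of related content via content vectors can be sketched as a nearest-neighbor lookup. Cosine similarity and the per-item neighbor count `k` are assumptions for illustration, not values given in the text, and the content vectors below are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_related(related_ids, content_vecs, k):
    """Add, for each related content id, its k most similar contents by vector."""
    expanded = set(related_ids)
    for cid in related_ids:
        if cid not in content_vecs:
            continue  # no vector learned for this content
        query = content_vecs[cid]
        scored = sorted(
            ((cosine(query, vec), other) for other, vec in content_vecs.items() if other != cid),
            key=lambda s: (-s[0], s[1]),  # highest similarity first, ties by id
        )
        expanded.update(other for _, other in scored[:k])
    return expanded


content_vecs = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [0.1, 0.9]}
print(sorted(expand_related(["A", "C"], content_vecs, k=1)))  # ['A', 'B', 'C', 'D']
```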

In practical application, the content related information may include the retrieved related content and the retrieved unrelated content.

The retrieval information may be determined according to the retrieval content candidate set, and may include retrieved related content and non-retrieved related content.

After step 305 or step 308, the method may further include:

Step 309, performing deduplication on the retrieval information and the content-related information.

In the method provided by the embodiment, the determined recall result includes retrieval information and content related information.

Specifically, the determined retrieval information and content-related information may contain duplicate content. For example, if the retrieval information includes retrieved related content A and the content-related information also includes retrieved related content A, the content-related information may be deduplicated so that only one copy of content A is kept.

Further, whether duplicate content exists in the retrieval information and the content-related information may be determined by comparing content ids.
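A minimal sketch of step 309, assuming both kinds of information are held as dictionaries mapping a category name to a set of content ids (a hypothetical layout). Content ids already present in the retrieval information are removed from the content-related information, so each content is kept once:

```python
def deduplicate(retrieval_info, related_info):
    """Remove from the content-related information any content id that already
    appears anywhere in the retrieval information (compared by content id)."""
    in_retrieval = set().union(*retrieval_info.values())
    return {key: ids - in_retrieval for key, ids in related_info.items()}


retrieval_info = {"retrieved_related": {"A", "B"}, "unretrieved_related": {"C"}}
related_info = {"retrieved_related": {"A", "D"}, "retrieved_unrelated": {"E"}}
print(deduplicate(retrieval_info, related_info))
```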

Step 310, determining a first number of retrieved related content, a second number of retrieved unrelated content, and a third number of non-retrieved related content according to the content related information and the retrieval information.

Further, the content-related information and the retrieval information may be counted to determine the first number of retrieved related content. For example, if the content-related information contains a retrieved related contents and the retrieval information contains b retrieved related contents, the sum a + b may be taken as the first number.

In practical application, the related content which is not searched can be determined according to the search information, and the third quantity is counted. The retrieved irrelevant content may be determined based on the content-related information and the second number is counted.
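Step 310 can then be sketched as simple counting over the same hypothetical dictionary layout (category name to set of content ids), with the first number summed over both kinds of information as in the a + b example above:

```python
def count_outcomes(retrieval_info, related_info):
    """Return (first, second, third):
    first  = retrieved related content, summed over both kinds of information;
    second = retrieved unrelated content, from the content-related information;
    third  = non-retrieved related content, from the retrieval information.
    """
    first = (len(related_info.get("retrieved_related", ()))
             + len(retrieval_info.get("retrieved_related", ())))
    second = len(related_info.get("retrieved_unrelated", ()))
    third = len(retrieval_info.get("unretrieved_related", ()))
    return first, second, third


retrieval_info = {"retrieved_related": {"A", "B"}, "unretrieved_related": {"C"}}
related_info = {"retrieved_related": {"D"}, "retrieved_unrelated": {"E", "F"}}
print(count_outcomes(retrieval_info, related_info))  # (3, 2, 1)
```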

Step 311, determining the quasi-recall rate according to the first number, the second number and the third number.

After the first number, the second number and the third number are determined, the ratio of the first number to the sum of the first and second numbers can be determined as the precision, and the ratio of the first number to the sum of the first and third numbers can be determined as the recall.
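The two ratios can be written as precision = first / (first + second) and recall = first / (first + third). A small sketch with a worked example follows; returning 0.0 when a denominator is zero is an assumption, since the text does not cover that case.

```python
def quasi_recall(first, second, third):
    """Return (precision, recall) from the three counts of step 310."""
    precision = first / (first + second) if first + second else 0.0
    recall = first / (first + third) if first + third else 0.0
    return precision, recall


# 6 retrieved related, 2 retrieved unrelated, 4 non-retrieved related:
# precision = 6 / 8 = 0.75, recall = 6 / 10 = 0.6
print(quasi_recall(6, 2, 4))  # (0.75, 0.6)
```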

Specifically, the corresponding quasi-recall rate can be determined for each recall strategy, and the advantages and disadvantages of each recall strategy are determined based on the quasi-recall rate.

Optionally, to further improve the efficiency of determining the quasi-recall rate, the method provided in this embodiment may be applied to a distributed system.

Fig. 8 is a diagram of a distributed system architecture according to an exemplary embodiment of the present application.

As shown in fig. 8, tasks for determining a quasi-recall may be issued by clients 81 and received by task scheduling center 82. The task scheduling center 82 then issues the task to the node 83, and the node 83 issues the task to the execution end 85 through the message middleware 84.

The execution end 85 is configured to execute any of the above methods for determining the quasi-recall rate of a recall strategy, and feeds the execution result back to the node end 83. In this distributed system architecture, a plurality of execution ends 85 may be provided, which may include physical devices or virtual machines.

When the execution end 85 is a physical device, it may run multiple quasi-recall rate determination tasks at the same time; when the execution end 85 is a virtual machine, it may run only one such task.

When the quasi-recall rates of multiple recall strategies need to be compared, the quasi-recall rate of each recall strategy can be obtained quickly with the distributed system, which further improves the efficiency of determining the quasi-recall rate.
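The distributed setup in FIG. 8 can be approximated on a single machine to show the idea: each recall strategy becomes an independent task, a pool of workers plays the role of the execution ends, and results are gathered per strategy. Python's `concurrent.futures.ThreadPoolExecutor` is swapped in here purely for illustration; it is not the task scheduling center, message middleware or node end described in the text, and the strategy names and per-strategy counts are made-up inputs.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_strategy(name, counts):
    """Stand-in for one execution end: compute (precision, recall) from the
    (first, second, third) counts produced by an offline simulation."""
    first, second, third = counts
    precision = first / (first + second) if first + second else 0.0
    recall = first / (first + third) if first + third else 0.0
    return name, precision, recall


def evaluate_all(tasks, max_workers=4):
    """Dispatch one task per recall strategy to a worker pool and gather the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(evaluate_strategy, name, counts) for name, counts in tasks.items()]
        return {name: (p, r) for name, p, r in (f.result() for f in futures)}


results = evaluate_all({"user_cf": (6, 2, 4), "item_cf": (3, 1, 3)})
print(results)  # {'user_cf': (0.75, 0.6), 'item_cf': (0.75, 0.5)}
```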

FIG. 9 is a block diagram of an apparatus for determining the quasi-recall rate of a recall strategy according to an exemplary embodiment of the present application.

As shown in fig. 9, the device for determining the quasi-recall rate of the recall strategy according to the present application includes:

a generating module 91, configured to generate vector data for online recall according to the recall policy;

the simulation module 92 is configured to simulate an online recall process according to the vector data and the existing real user click data, and determine a recall result;

and the determining module 93 is configured to determine a quasi-recall rate corresponding to the recall policy according to the recall result.

The device for determining the quasi-recall rate of the recall strategy provided by this embodiment comprises a generating module, a simulation module and a determining module. The generating module is configured to generate vector data for online recall according to the recall strategy; the simulation module is configured to simulate the online recall process according to the vector data and the existing real user click data and determine a recall result; and the determining module is configured to determine the quasi-recall rate corresponding to the recall strategy according to the recall result. With the device provided by this embodiment, the recall result is determined by offline simulation and the quasi-recall rate is then determined from it, so the recall result does not have to be collected from the system online, which improves the efficiency of determining the quasi-recall rate of the recall strategy.

The specific principle and implementation manner of the device for determining the quasi-recall rate of the recall strategy provided in this embodiment are similar to those of the embodiment shown in fig. 1, and are not repeated here.

FIG. 10 is a block diagram of an apparatus for determining the quasi-recall rate of a recall strategy according to another exemplary embodiment of the present application.

As shown in FIG. 10, on the basis of the foregoing embodiment, in the device for determining the quasi-recall rate of the recall strategy provided in this embodiment, the vector data may optionally include a user vector;

if the recall policy includes user-based collaborative filtering, the simulation module 92 includes a first determination unit 921 for:

Optionally, the first determining unit 921 is specifically configured to:

and determining the content related information according to the first click content and the second click content of which the click time is smaller than the first time threshold.

Optionally, the first determining unit 921 is specifically configured to:

Optionally, the vector data includes a content vector;

if the recall policy includes content-based collaborative filtering, the simulation module includes a second determination unit 922 for:

Optionally, the second determining unit 922 is specifically configured to:

Optionally, the recall result includes content related information and retrieval information;

the determining module 93 includes:

a statistics unit 931 configured to determine a first number of retrieved related contents, a second number of retrieved unrelated contents, and a third number of non-retrieved related contents based on the content related information and the retrieval information;

a quasi-recall rate determining unit 932, configured to determine the quasi-recall rate according to the first number, the second number and the third number.

Optionally, the quasi-recall rate determining unit 932 is specifically configured to:

Optionally, the generating module 91 includes:

the data processing unit 911 is configured to obtain historical click data of a user, and determine user relevance information and content relevance information according to the historical click data;

the training unit 912 is configured to train on the user correlation information and the content correlation information to obtain a correlation vector, and split the correlation vector to obtain a user vector and a content vector.

Optionally, the apparatus further comprises a deduplication module 94 for:

The specific principle and implementation manner of the device for determining the quasi-recall rate of the recall strategy provided in this embodiment are similar to those of the embodiment shown in fig. 3, and are not repeated here.

According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.

FIG. 11 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are exemplary only and are not meant to limit implementations of the present application described and/or claimed herein.

As shown in FIG. 11, the electronic device includes one or more processors 1101, a memory 1102, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). In FIG. 11, one processor 1101 is taken as an example.

Memory 1102 is a non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for determining the quasi-recall rate of the recall strategy provided by the application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the method of determining the quasi-recall of the recall policy provided by the present application.

The memory 1102 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the generation module 91, the simulation module 92, and the determination module 93 shown in fig. 9) corresponding to the method for determining the quasi-recall of a recall strategy in an embodiment of the present application. The processor 1101 executes various functional applications of the server and data processing, i.e., a method of determining the quasi-recall rate of the recall policy in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 1102.

Memory 1102 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 1102 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1102 may optionally include memory located remotely from the processor 1101, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device may further include an input device 1103 and an output device 1104. The processor 1101, the memory 1102, the input device 1103 and the output device 1104 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 11.

The input device 1103 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device; examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick, and similar input devices. The output device 1104 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed embodiments are achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims

1. A method for determining a quasi-recall rate of a recall strategy, characterized by comprising the following steps:

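generating vector data for online recall according to the recall strategy;

simulating an online recall process according to the vector data and existing real user click data, and determining a recall result;

determining a quasi-recall rate corresponding to the recall strategy according to the recall result;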

the vector data includes a content vector;

2. The method of claim 1, wherein the vector data comprises a user vector;

3. The method of claim 2, wherein determining, according to the user vector, the similar users corresponding to the first preset user comprises:

4. The method of claim 2, wherein determining the recall result according to the first click content information and the second click content information comprises:

5. The method of claim 4, wherein determining, according to the comparison result, the retrieval information and the content-related information corresponding to the recall strategy comprises:

6. The method of claim 5, characterized in that:

7. The method of claim 1, wherein determining the recall result according to the content vector, the related content candidate set and the retrieval content candidate set included in the vector data comprises:

8. The method of any one of claims 4-7, wherein determining the quasi-recall rate corresponding to the recall strategy according to the recall result comprises:

9. The method of claim 8, wherein determining the quasi-recall rate according to the first number, the second number and the third number comprises:

10. The method of any one of claims 1-7 and 9, wherein generating the vector data for online recall according to the recall strategy comprises:

11. The method of any one of claims 4-7, further comprising:

12. A device for determining a quasi-recall rate of a recall strategy, comprising:

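a generating module, configured to generate vector data for online recall according to the recall strategy;

a simulation module, configured to simulate an online recall process according to the vector data and existing real user click data, and determine a recall result;

a determining module, configured to determine the quasi-recall rate corresponding to the recall strategy according to the recall result;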

the vector data includes a content vector; if the recall strategy includes content-based collaborative filtering, the simulation module includes a second determination unit configured to: acquire third click content of a second preset user corresponding to a second time threshold; determine a related content candidate set and a retrieval content candidate set according to the third click content; and determine the recall result according to the content vector, the related content candidate set and the retrieval content candidate set included in the vector data.
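The content-based branch above can be illustrated with a small offline simulation. This is a minimal sketch under assumptions the claims do not state: content vectors are plain float lists, items from the user's recent clicks act as seeds, candidates in the retrieval content candidate set are ranked by their best cosine similarity to any seed, and the related content candidate set is treated as ground truth. The names `simulate_content_cf_recall`, `seed_items` and `top_k` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def simulate_content_cf_recall(content_vectors, seed_items, related_set, retrieval_set, top_k=3):
    """Hypothetical offline simulation of content-based CF recall.

    content_vectors: content id -> content vector (from the generated vector data)
    seed_items: content from the user's recent clicks used as recall seeds (assumption)
    related_set: related content candidate set, treated as ground truth (assumption)
    retrieval_set: retrieval content candidate set to recall from (assumption)
    """
    scores = {}
    for item in retrieval_set:
        if item in seed_items or item not in content_vectors:
            continue  # skip the seeds themselves and items without a vector
        # Score each candidate by its best similarity to any seed item.
        scores[item] = max(
            (cosine(content_vectors[item], content_vectors[s])
             for s in seed_items if s in content_vectors),
            default=0.0,
        )
    # Highest score first; ties broken by id so the result is deterministic.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return {"retrieved": set(ranked[:top_k]), "related": set(related_set)}


vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.2]}
result = simulate_content_cf_recall(vectors, {"a"}, {"b", "c"}, {"a", "b", "c", "d"}, top_k=2)
print(sorted(result["retrieved"]))  # ['b', 'd']
```

The returned pair of sets plays the role of the "retrieval information" and "content-related information" in the recall result, which is what the later counting step consumes.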

13. The apparatus of claim 12, wherein the vector data comprises a user vector;

if the recall strategy includes user-based collaborative filtering, the simulation module includes a first determination unit configured to:

14. The apparatus according to claim 13, wherein the first determination unit is specifically configured to:
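For the user-based branch, a comparable offline sketch finds the users whose user vectors are closest to a preset user and compares what they clicked with what the preset user clicked. This assumes that the "first click content" is the preset user's own clicks within a time window and the "second click content" is the similar users' clicks in the same window; the function names, the `clicks` layout and `top_n` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similar_users(user_vectors, target, top_n=2):
    """Return the top_n users most similar to `target` by user-vector cosine similarity."""
    others = [u for u in user_vectors if u != target]
    # Highest similarity first; ties broken by user id so the output is deterministic.
    others.sort(key=lambda u: (-cosine(user_vectors[target], user_vectors[u]), u))
    return others[:top_n]


def simulate_user_cf_recall(user_vectors, clicks, target, top_n=2):
    """Hypothetical offline simulation of user-based CF recall.

    clicks: user id -> set of content ids clicked inside the chosen time window.
    """
    neighbours = similar_users(user_vectors, target, top_n)
    # Content clicked by the similar users stands in for what online recall would return.
    retrieved = set().union(*(clicks.get(u, set()) for u in neighbours))
    # Content the target user actually clicked serves as the ground truth.
    related = set(clicks.get(target, set()))
    return {"similar_users": neighbours, "retrieved": retrieved, "related": related}


user_vectors = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0], "u4": [0.7, 0.3]}
clicks = {"u1": {"x", "y"}, "u2": {"x", "z"}, "u3": {"w"}, "u4": {"y"}}
result = simulate_user_cf_recall(user_vectors, clicks, "u1")
print(result["similar_users"])  # ['u2', 'u4']
```

Using the real clicks of both groups means no online traffic is needed, which is the point of simulating recall offline.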

15. The apparatus of any one of claims 12-14, wherein the recall result includes content-related information and retrieval information;

the determining module includes:

a statistics unit, configured to determine, according to the content-related information and the retrieval information, a first number of retrieved related content, a second number of retrieved unrelated content, and a third number of related content that was not retrieved;

and a quasi-recall rate determining unit, configured to determine the quasi-recall rate according to the first number, the second number and the third number.
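The three counts map naturally onto true positives, false positives and false negatives. The sketch below assumes, as those counts suggest, that the quasi-recall rate denotes the pair of precision (first / (first + second)) and recall (first / (first + third)); the function names and the zero-denominator convention are illustrative choices, not taken from the claims.

```python
def count_from_result(retrieved, related):
    """Derive the first, second and third numbers from a recall result.

    retrieved: set of content ids returned by the simulated recall
    related:   set of content ids the user actually found relevant
    """
    first = len(retrieved & related)   # retrieved and related
    second = len(retrieved - related)  # retrieved but unrelated
    third = len(related - retrieved)   # related but not retrieved
    return first, second, third


def quasi_recall_rate(first, second, third):
    """Return (precision, recall); 0.0 when a denominator is zero (assumed convention)."""
    precision = first / (first + second) if first + second else 0.0
    recall = first / (first + third) if first + third else 0.0
    return precision, recall


counts = count_from_result({"b", "d"}, {"b", "c"})
print(counts, quasi_recall_rate(*counts))  # (1, 1, 1) (0.5, 0.5)
```

Because both ratios share the first number as numerator, one pass over the recall result is enough to evaluate a strategy; different recall strategies can then be compared on the same offline click data.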

16. The apparatus of any one of claims 12-14, wherein the generating module comprises:

a data processing unit, configured to acquire historical click data of users and determine user correlation information and content correlation information according to the historical click data;

and a training unit, configured to perform training on the user correlation information and the content correlation information to obtain correlation vectors, and split the correlation vectors into user vectors and content vectors.
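The claims leave the training method open. One way to obtain a single table of correlation vectors and then split it is to train users and content in a shared vocabulary. The sketch below assumes a skip-gram-style model with negative sampling over correlation pairs. The `u:`/`c:` id prefixes, the use of consecutive clicks as content correlation, and all function names and hyperparameters are hypothetical; this is not presented as the patent's method.

```python
import math
import random


def build_pairs(history):
    """Derive correlation pairs from historical clicks (hypothetical scheme).

    history: user id -> chronologically ordered list of clicked content ids.
    """
    pairs = []
    for user, items in history.items():
        for item in items:
            # User-content correlation: the user clicked the item.
            pairs.append(("u:" + user, "c:" + item))
        for prev, nxt in zip(items, items[1:]):
            # Content-content correlation: consecutive clicks by the same user.
            pairs.append(("c:" + prev, "c:" + nxt))
    return pairs


def train_joint_vectors(pairs, dim=8, epochs=50, lr=0.05, neg=2, seed=0):
    """Skip-gram-style training with negative sampling over one shared vocabulary."""
    rng = random.Random(seed)
    nodes = sorted({n for pair in pairs for n in pair})
    emb = {n: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for n in nodes}
    ctx = {n: [0.0] * dim for n in nodes}
    # Train both directions so every node's input vector gets updated.
    both_ways = pairs + [(b, a) for a, b in pairs]
    for _ in range(epochs):
        for a, b in both_ways:
            # One positive target plus `neg` randomly sampled negatives.
            targets = [(b, 1.0)] + [(rng.choice(nodes), 0.0) for _ in range(neg)]
            grad_a = [0.0] * dim
            for t, label in targets:
                score = sum(x * y for x, y in zip(emb[a], ctx[t]))
                score = max(-30.0, min(30.0, score))  # keep exp() in a safe range
                g = lr * (label - 1.0 / (1.0 + math.exp(-score)))
                for k in range(dim):
                    grad_a[k] += g * ctx[t][k]   # uses ctx before its update
                    ctx[t][k] += g * emb[a][k]
            for k in range(dim):
                emb[a][k] += grad_a[k]
    return emb


def split_vectors(emb):
    """Split the joint table into user vectors and content vectors by id prefix."""
    users = {n[2:]: v for n, v in emb.items() if n.startswith("u:")}
    contents = {n[2:]: v for n, v in emb.items() if n.startswith("c:")}
    return users, contents


history = {"alice": ["x", "y"], "bob": ["y", "z"]}
users, contents = split_vectors(train_joint_vectors(build_pairs(history), dim=4, epochs=20))
print(sorted(users), sorted(contents))  # ['alice', 'bob'] ['x', 'y', 'z']
```

Training users and content in one space keeps user vectors and content vectors directly comparable, so the same table can serve both the user-based and the content-based recall simulations.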

17. An electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.

18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-11.