Retrieval algorithm evaluation recommendation method and system
Technical Field
The invention relates to the technical field of the Internet, and in particular to a retrieval algorithm evaluation and recommendation method and system.
Background
Current criteria for evaluating Internet data mining algorithms are based on recall, precision, or a combination of the two. The recall rate (also called the recall ratio) is the number of relevant records retrieved divided by the number of all relevant records in the record library, and it measures how completely the retrieval system finds relevant records. Precision is the number of relevant records retrieved divided by the total number of records retrieved, and it measures how accurate the retrieval system is. Evaluating an ordered queue (a ranked result list) additionally involves an average ranking score of ranking accuracy or a weighted average ranking score, which makes the evaluation complex and inaccurate.
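For reference, the two conventional measures described above can be computed as in the following illustrative Python sketch. The record IDs and the function name `precision_recall` are hypothetical. Neither value depends on the order of the retrieved records, so ranking quality is not captured.

```python
from typing import Sequence, Tuple


def precision_recall(retrieved: Sequence[str],
                     relevant: Sequence[str]) -> Tuple[float, float]:
    """Return (precision, recall) of the retrieved records against the relevant set."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant records that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Two of the four retrieved records are relevant, out of three relevant records.
p, r = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
```

Here `p` is 2/4 and `r` is 2/3. Reversing the retrieved list leaves both values unchanged.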
Disclosure of Invention
In view of the above drawbacks of the prior art, the technical problem to be solved by the present invention is how to identify the optimal retrieval algorithm, so that users obtain a better retrieval experience.
To achieve the above object, in one aspect, the present invention provides a retrieval algorithm evaluation and recommendation method, comprising the following steps:
S1, determining an evaluation condition according to retrieval background information;
S2, evaluating each retrieval algorithm according to the evaluation condition;
and S3, acquiring the optimal retrieval algorithm to perform the retrieval calculation and obtain a retrieval result.
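Steps S2 and S3 can be sketched as a selection loop. Everything here is a hypothetical illustration: `evaluate_and_recommend`, the `Algorithm` and `Scorer` types, and the convention that a lower score is better are assumptions. The lower-is-better convention matches the prediction score RS of this method. Step S1 appears only implicitly, as the caller's choice of scorer.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interfaces (not defined in the source):
# an algorithm maps a query to a ranked list of record IDs; a scorer compares a
# predicted ranking with the real ranking and returns a value where lower is better.
Algorithm = Callable[[str], List[str]]
Scorer = Callable[[Sequence[str], Sequence[str]], float]


def evaluate_and_recommend(query: str,
                           algorithms: Dict[str, Algorithm],
                           real_result: Sequence[str],
                           scorer: Scorer) -> Tuple[str, List[str]]:
    """Return the name of the best algorithm and its retrieval result for `query`."""
    # S2: evaluate each retrieval algorithm against the real result.
    scores = {name: scorer(algorithm(query), real_result)
              for name, algorithm in algorithms.items()}
    # S3: take the algorithm with the lowest score and run the retrieval with it.
    best = min(scores, key=scores.get)
    return best, algorithms[best](query)
```

On ties, `min` keeps the first algorithm in dictionary order. A real system might need an explicit tie-breaking rule instead.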
Preferably, in step S2, the ranking of results within a given period is counted according to the retrieval condition to obtain a real result. A retrieval result is then obtained from each retrieval algorithm and compared with the real result, and the retrieval algorithm whose retrieval result is closest to the real result is identified.
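The source does not say which events are counted to build the real result. This sketch assumes a hypothetical log of `(timestamp, record_id)` events, such as clicks or purchases. Under that assumption, the real ranking for a time window can be obtained by counting occurrences. The function name `real_ranking` is also hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def real_ranking(events: Iterable[Tuple[datetime, str]],
                 start: datetime, end: datetime) -> List[str]:
    """Rank record IDs by how often they occur in the window [start, end)."""
    counts = Counter(record for ts, record in events if start <= ts < end)
    # Most frequent first; ties broken by record ID so the ranking is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [record for record, _ in ordered]
```

Events outside the window are ignored. The alphabetical tie-break is an arbitrary choice made only to keep the output stable.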
Preferably, in step S2, a first list is obtained from the retrieval result and a second list is obtained from the real result, each list containing a plurality of records. The first list and the second list are compared. The retrieval algorithm whose retrieval result is closest to the real result is identified according to three groups of records: those only in the first list, those only in the second list, and those in both lists.
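The three groups of records can be computed with plain set membership tests. This is a minimal sketch that assumes records are hashable IDs. `partition_records` is a hypothetical name, and each group keeps the order of the list it comes from.

```python
from typing import List, Sequence, Tuple


def partition_records(first: Sequence[str], second: Sequence[str]
                      ) -> Tuple[List[str], List[str], List[str]]:
    """Split records into (in both lists, only in first, only in second)."""
    first_set, second_set = set(first), set(second)
    in_both = [r for r in first if r in second_set]
    only_first = [r for r in first if r not in second_set]
    only_second = [r for r in second if r not in first_set]
    return in_both, only_first, only_second
```

For example, `partition_records(["A", "B", "C"], ["B", "D", "C", "E"])` returns `(["B", "C"], ["A"], ["D", "E"])`.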
Preferably, in step S2, the prediction score RS is calculated as follows:
records in the second list that do not appear in the first list are appended to the first list to obtain a third list;
a prediction score RS is defined, and the optimal retrieval algorithm is obtained according to the prediction score RS, where RS is calculated as:
$$RS = \frac{1}{k}\sum_{j=1}^{m} h_j, \qquad h_j \in \{t_j,\ f_j,\ i_j\}$$
where k is the number of records in the second list; m is the number of records in the third list; h_j is the score of the j-th record in the third list; t_j is the score of a record present in both the first list and the second list; f_j is the score of a record present in the first list but not in the second list; and i_j is the score of a record present in the second list but not in the first list.
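The third-list construction and the RS average can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation. It assumes records are hashable identifiers with no duplicates inside a list, and the helper names `build_third_list` and `rs_from_scores` are hypothetical.

```python
from typing import List, Sequence


def build_third_list(first: Sequence[str], second: Sequence[str]) -> List[str]:
    """Append to the first list every record of the second list it lacks, keeping order."""
    present = set(first)
    return list(first) + [record for record in second if record not in present]


def rs_from_scores(h: Sequence[float], k: int) -> float:
    """RS = (1/k) * sum of the per-record scores h_j over the third list."""
    if k <= 0:
        raise ValueError("the second list must contain at least one record")
    return sum(h) / k


third = build_third_list(["A", "B", "C"], ["B", "D", "C", "E"])
m = len(third)  # 5 records: A, B, C, D, E
```

The third list holds the first-list records followed by the missing second-list records in their original order. This matches the h_1 ... h_m indexing used in the worked examples of the detailed description.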
Preferably, t_j is equal to the absolute value of the difference between the record's ranking positions in the first list and in the second list, multiplied by a coefficient determined according to the evaluation condition.
Preferably, f_j = A(m+1) and i_j = B(m+1), where A and B are each determined according to the evaluation condition.
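The three cases can be combined into a self-contained sketch of RS for two ranked lists. The parameter names `c`, `A` and `B` (each defaulting to 1) are assumptions. `c` stands for the coefficient on t_j, and `A` and `B` for the weights in f_j and i_j. Ranking positions are 1-based, as in the worked examples.

```python
from typing import Sequence


def prediction_score(first: Sequence[str], second: Sequence[str],
                     c: float = 1.0, A: float = 1.0, B: float = 1.0) -> float:
    """Prediction score RS of a predicted ranking (first) against the real one (second).

    Lower is better; 0 means the two rankings are identical.
    """
    if not second:
        raise ValueError("the second (real) list must not be empty")
    pos_first = {record: i for i, record in enumerate(first, start=1)}  # 1-based ranks
    pos_second = {record: i for i, record in enumerate(second, start=1)}
    # Third list: first list plus second-list records missing from it.
    third = list(first) + [r for r in second if r not in pos_first]
    m, k = len(third), len(second)
    total = 0.0
    for record in third:
        if record in pos_first and record in pos_second:
            total += c * abs(pos_first[record] - pos_second[record])  # t_j
        elif record in pos_first:
            total += A * (m + 1)  # f_j: predicted but absent from the real list
        else:
            total += B * (m + 1)  # i_j: in the real list but not predicted
    return total / k
```

With the defaults, `prediction_score(["A", "B", "C"], ["B", "D", "C", "E"])` returns 4.75, the first worked example of the detailed description. The m + 1 factor has a useful consequence. A record's position difference is at most m - 1, so with c = A = B = 1 a wrongly predicted or missed record always costs more than any misplacement of a correctly predicted one.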
In another aspect, the present invention provides a retrieval algorithm evaluation and recommendation system, comprising:
a determining module, configured to determine an evaluation condition according to retrieval background information;
an evaluation module, configured to evaluate each retrieval algorithm according to the evaluation condition;
and a recommendation module, configured to acquire the optimal retrieval algorithm to perform the retrieval calculation and obtain a retrieval result.
Preferably, the evaluation module counts the ranking of results within a given period according to the retrieval condition to obtain a real result, and obtains a retrieval result from each retrieval algorithm. It obtains a first list from the retrieval result and a second list from the real result, each list containing a plurality of records, and compares the two lists. It then identifies the retrieval algorithm whose retrieval result is closest to the real result according to the records only in the first list, the records only in the second list, and the records in both lists.
Preferably, the evaluation module appends the records in the second list that do not appear in the first list to the first list to obtain a third list. It defines a prediction score RS and obtains the optimal retrieval algorithm according to RS, where RS is calculated as:
$$RS = \frac{1}{k}\sum_{j=1}^{m} h_j, \qquad h_j \in \{t_j,\ f_j,\ i_j\}$$
where k is the number of records in the second list; m is the number of records in the third list; h_j is the score of the j-th record in the third list; t_j is the score of a record present in both the first list and the second list; f_j is the score of a record present in the first list but not in the second list; and i_j is the score of a record present in the second list but not in the first list.
Preferably, t_j is equal to the absolute value of the difference between the record's ranking positions in the first list and in the second list, multiplied by a coefficient determined according to the evaluation condition; f_j = A(m+1) and i_j = B(m+1), where A and B are each determined according to the evaluation condition.
The invention provides an evaluation method and system for mining algorithms whose output is a predicted ordered queue. It can be applied in Internet fields such as evaluating commodity recommendation and prediction, or in non-Internet fields with similar measurement requirements. In this technical solution, the merits of a prediction algorithm are evaluated comprehensively through a single numerical value and the optimal retrieval algorithm is selected. Users therefore obtain a better retrieval experience, and the number of server accesses is reduced.
Drawings
FIG. 1 is a schematic flow chart of a retrieval algorithm evaluation and recommendation method according to the first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a retrieval algorithm evaluation and recommendation system according to the second embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a schematic flow chart of a retrieval algorithm evaluation and recommendation method according to the first embodiment of the present invention. As shown in FIG. 1, the method comprises the following steps:
S1, determining an evaluation condition according to retrieval background information;
S2, evaluating each retrieval algorithm according to the evaluation condition;
and S3, acquiring the optimal retrieval algorithm to perform the retrieval calculation and obtain a retrieval result.
Preferably, in step S2, the ranking of results within a given period is counted according to the retrieval condition to obtain a real result. A retrieval result is then obtained from each retrieval algorithm and compared with the real result, and the retrieval algorithm whose retrieval result is closest to the real result is identified.
Preferably, in step S2, a first list is obtained from the retrieval result and a second list is obtained from the real result, each list containing a plurality of records. The first list and the second list are compared. The retrieval algorithm whose retrieval result is closest to the real result is identified according to three groups of records: those only in the first list, those only in the second list, and those in both lists.
Preferably, t_j is equal to the absolute value of the difference between the record's ranking positions in the first list and in the second list, multiplied by a coefficient determined according to the evaluation condition.
Preferably, f_j = A(m+1) and i_j = B(m+1), where A and B are each determined according to the evaluation condition.
Those skilled in the art will appreciate that, corresponding to the method of the present invention, the present invention also includes a retrieval algorithm evaluation and recommendation system whose modules correspond one-to-one to the method steps above. As shown in FIG. 2, the system comprises: a determining module 201, configured to determine an evaluation condition according to retrieval background information; an evaluation module 202, configured to evaluate each retrieval algorithm according to the evaluation condition; and a recommendation module 203, configured to acquire the optimal retrieval algorithm to perform the retrieval calculation and obtain a retrieval result.
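The split into modules 201, 202 and 203 can be illustrated with small Python classes. This sketch rests on stated assumptions. The `EvaluationCondition` fields `c`, `A` and `B` follow the source's statement that the t_j coefficient and the weights A and B are determined by the evaluation condition. The rule in `DeterminingModule.determine`, which reads these values from a background dictionary, is a hypothetical placeholder. The scorer is injected rather than fixed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class EvaluationCondition:
    """Hypothetical container for the weights that the evaluation condition fixes."""
    c: float = 1.0  # coefficient on |rank difference| for t_j
    A: float = 1.0  # f_j = A * (m + 1)
    B: float = 1.0  # i_j = B * (m + 1)


Scorer = Callable[[Sequence[str], Sequence[str], EvaluationCondition], float]


class DeterminingModule:
    """Module 201: derive the evaluation condition from retrieval background info."""

    def determine(self, background: Dict[str, float]) -> EvaluationCondition:
        # Placeholder rule (assumption): take c, A, B from the background, default 1.
        return EvaluationCondition(**{name: float(background[name])
                                      for name in ("c", "A", "B") if name in background})


class EvaluationModule:
    """Module 202: score each candidate algorithm's ranking against the real ranking."""

    def __init__(self, scorer: Scorer) -> None:
        self.scorer = scorer

    def evaluate(self, rankings: Dict[str, Sequence[str]], real: Sequence[str],
                 cond: EvaluationCondition) -> Dict[str, float]:
        return {name: self.scorer(ranking, real, cond) for name, ranking in rankings.items()}


class RecommendationModule:
    """Module 203: pick the algorithm with the lowest (best) score."""

    def recommend(self, scores: Dict[str, float]) -> str:
        return min(scores, key=scores.get)
```

Injecting the scorer keeps module 202 independent of any particular scoring formula. The RS score of this method is one possible scorer.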
Preferably, the evaluation module 202 counts the ranking of results within a given period according to the retrieval condition to obtain a real result, and obtains a retrieval result from each retrieval algorithm. It obtains a first list from the retrieval result and a second list from the real result, each list containing a plurality of records, and compares the two lists. It then identifies the retrieval algorithm whose retrieval result is closest to the real result according to the records only in the first list, the records only in the second list, and the records in both lists.
Preferably, the evaluation module 202 appends the records in the second list that do not appear in the first list to the first list to obtain a third list. It defines a prediction score RS and obtains the optimal retrieval algorithm according to RS, where RS is calculated as:
$$RS = \frac{1}{k}\sum_{j=1}^{m} h_j, \qquad h_j \in \{t_j,\ f_j,\ i_j\}$$
where k is the number of records in the second list; m is the number of records in the third list; h_j is the score of the j-th record in the third list; t_j is the score of a record present in both the first list and the second list; f_j is the score of a record present in the first list but not in the second list; and i_j is the score of a record present in the second list but not in the first list. t_j is equal to the absolute value of the difference between the record's ranking positions in the first list and in the second list, multiplied by a coefficient determined according to the evaluation condition; f_j = A(m+1) and i_j = B(m+1), where A and B are each determined according to the evaluation condition.
The smaller RS is, the better; if the prediction is completely correct, RS = 0. In the commodity setting, the symbols are defined as follows:
- k is the number of commodities in the real commodity queue.
- m is the number of distinct commodities across the prediction queue and the real queue.
- h_j is the ranking score of the j-th commodity. It falls into one of three cases: correctly predicted (t_j), wrongly predicted (f_j), or not predicted (i_j).
- t_j applies when the j-th commodity is correctly predicted, i.e. it appears in both queues. It equals the absolute value of the difference between the commodity's position in the prediction queue and its position in the real queue.
- f_j applies when the j-th commodity appears in the prediction queue but not in the real queue; f_j = A(m+1).
- i_j applies when the j-th commodity appears in the real queue but not in the prediction queue; i_j = B(m+1).

The coefficients A and B are determined according to the evaluation condition.
Example: let A = 1 and B = 1.
Assume the real commodity queue is T = (B, D, C, E) and the predicted commodity queue is P = (A, B, C). For queues T and P, k = 4 and m = 5 (commodities A, B, C, D, E). The per-commodity scores are:
- h_1 = 6 (commodity A was wrongly predicted);
- h_2 = 1 (commodity B is at position 2 in P and position 1 in T);
- h_3 = 0 (commodity C is at position 3 in both T and P);
- h_4 = 6 (commodity D was not predicted);
- h_5 = 6 (commodity E was not predicted).

Therefore RS = 19/4 = 4.75.
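The first example can be written out term by term against the definitions of t_j, f_j and i_j. This assumes a coefficient of 1 on t_j, which the example's values imply:

$$h_1 = f_1 = A(m+1) = 1 \cdot 6 = 6,\quad h_2 = t_2 = |2-1| = 1,\quad h_3 = t_3 = |3-3| = 0,$$
$$h_4 = i_4 = B(m+1) = 6,\quad h_5 = i_5 = B(m+1) = 6,$$
$$RS = \frac{1}{k}\sum_{j=1}^{m} h_j = \frac{6+1+0+6+6}{4} = \frac{19}{4} = 4.75.$$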
Now assume the real commodity queue is T = (D, F, E) and the predicted commodity queue is P = (E, C, F). Then k = 3 and m = 4 (commodities E, C, F, D). The per-commodity scores are:
- h_1 = 2 (commodity E is at position 1 in P and position 3 in T);
- h_2 = 5 (commodity C was wrongly predicted);
- h_3 = 1 (commodity F is at position 3 in P and position 2 in T);
- h_4 = 5 (commodity D was not predicted, so i_4 = B(m+1) = 5).

Therefore RS = 13/3 ≈ 4.33.
Finally, the retrieval algorithm with the better score is adopted to provide retrieval recommendations. The invention can be applied in Internet fields such as evaluating commodity recommendation and prediction, or in non-Internet fields with similar measurement requirements. In this technical solution, the merits of a prediction algorithm are evaluated comprehensively through a single numerical value and the optimal retrieval algorithm is selected. Users therefore obtain a better retrieval experience, and the number of server accesses is reduced.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.