CN107122467B - Search engine retrieval result evaluation method and device and computer readable medium - Google Patents


Info

Publication number
CN107122467B
Authority
CN
China
Prior art keywords
search
search engine
retrieval result
result
evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710293371.7A
Other languages
Chinese (zh)
Other versions
CN107122467A (en)
Inventor
李悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nubia Technology Co Ltd filed Critical Nubia Technology Co Ltd
Priority to CN201710293371.7A priority Critical patent/CN107122467B/en
Publication of CN107122467A publication Critical patent/CN107122467A/en
Application granted granted Critical
Publication of CN107122467B publication Critical patent/CN107122467B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for evaluating the retrieval results of a search engine, and a computer-readable medium, aiming to solve the problem that existing methods for evaluating search engine retrieval results lack generality and objectivity. The method comprises the following steps: acquiring click data for each retrieval result content position in a retrieval result page, and taking the click rate corresponding to that position as a position score; obtaining a search engine quality index (DCG) evaluation score for each retrieval result page through a DCG evaluation model according to the position scores; combining the DCG evaluation scores of the retrieval result pages according to the top K (TOPK) search ranking results of the search engine; and obtaining an overall DCG evaluation score corresponding to the top K (TOPK) search ranking results.

Description

Search engine retrieval result evaluation method and device and computer readable medium
Technical Field
The present invention relates to the field of network communication technologies, and in particular, to a method and an apparatus for evaluating search results of a search engine, and a computer-readable medium.
Background
A search engine is a system that automatically collects information from the internet, organizes it, and makes it available for users to query. The information on the internet is vast and unordered: each piece of information is like an island in a vast ocean, web links are the criss-crossing bridges between the islands, and the search engine draws a clear information map that the user can consult at any time. The full-text search engine is the mainstream type in wide use today; the representative foreign engine is Google, and the representative domestic engine is the largest Chinese-language search engine. Such engines extract information from each website (mainly web page text) on the internet, build a database, retrieve the records matching the user's query conditions, and return the results in a certain order.
With the rapid development of internet information retrieval technology, search engines have proliferated. On the one hand this makes it convenient for users to retrieve information; on the other hand it leaves many users at a loss as to which search engine to choose, which creates a need for evaluating search engines. Reasonable evaluation of a search engine not only helps users select and use it, but also helps the search engine itself improve and develop. One of the main existing evaluation approaches is the Cranfield evaluation system. The name comes from Cranfield University in the UK, which first proposed such a system in the 1950s: a complete evaluation scheme consisting of a query sample set, a correct answer set, and evaluation metrics, which established the central role of evaluation in information retrieval research. The Cranfield evaluation system is widely applied in major search engines. In a specific application, the first problem to solve is constructing the set of query terms used for testing. Common search engine evaluation methods also include the Precision-Recall method, the P@N method, the DCG (search engine quality index) method, and the like.
However, existing online evaluation of search effectiveness is mostly tied to the business: online users are split according to some rule and directed to different service versions, and business-specific indicators such as purchase conversion rate, download conversion rate, and music playback conversion rate are then used to evaluate the search effectiveness of the different versions. Such evaluation is too tightly coupled to the business and is not general.
Meanwhile, the existing DCG (search engine quality index) evaluation algorithm is mostly used for offline evaluation and relies mainly on scores given by a small number of testers, so it is too subjective, and the offline evaluation results are neither satisfactory nor objective.
Disclosure of Invention
The main object of the invention is to provide a method and a device for evaluating the retrieval results of a search engine, and a computer-readable medium, aiming to solve the problem that existing methods for evaluating search engine retrieval results lack generality and objectivity.
In order to achieve the above object, the present invention provides a method for evaluating search results of a search engine, comprising the steps of:
acquiring click data of the retrieval result content position in the retrieval result page, and taking the click rate corresponding to the retrieval result content position as a position score;
obtaining a search engine quality index (DCG) evaluation score of each retrieval result page through a DCG evaluation model according to the position score;
combining the evaluation scores of the search engine quality index (DCG) of each retrieval result page according to the top K items (TOPK) search ranking results of the search engine; and obtaining a search engine quality index (DCG) overall evaluation score corresponding to the top K item (TOPK) search ranking result.
Further, the search result evaluation method of the search engine further comprises the step of obtaining search behavior data from a server log file, a visitor access log file and the like.
Furthermore, the search result evaluation method of the search engine further comprises the step of obtaining the search result pages corresponding to all independent visitors of the same search term from the search behavior data.
Further, the method for evaluating the retrieval result of the search engine further comprises the step of obtaining top K item (TOPK) search ranking results from the search behavior data.
The top K (TOPK) search ranking results are obtained through a TOPK algorithm. The search engine records in log files every query string that an independent visitor uses for each search, and each query string is 1 to 255 bytes long. Suppose there are currently ten million records. The query strings are highly repetitive: although the total is 10 million, there are no more than 3 million distinct strings after removing duplicates. The more often a query string is repeated, the more independent visitors have issued it and the hotter it is. Counting the 10 hottest query strings yields the 10 hottest retrieval results in the search engine.
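The counting step described above can be sketched with a hash-map count followed by a top-K selection. This is a minimal illustration, not the patent's implementation: the function name `top_k_queries`, the `(visitor_id, query)` record layout, and the sample data are all assumptions made for the example.

```python
from collections import Counter

def top_k_queries(records, k=10):
    """Return the k hottest query strings as (query, visitor_count) pairs.

    records: iterable of (visitor_id, query) pairs taken from a log
    (hypothetical layout). A query's hotness is the number of distinct
    visitors who issued it.
    """
    unique_pairs = set(records)                       # one vote per visitor per query
    counts = Counter(query for _, query in unique_pairs)
    return counts.most_common(k)                      # sorted by count, descending

# Hypothetical log: visitor u1 searched "news" twice, which counts once.
records = [
    ("u1", "weather"), ("u2", "weather"), ("u3", "weather"),
    ("u1", "news"), ("u2", "news"), ("u1", "news"),
    ("u3", "music"),
]
hot = top_k_queries(records, k=2)
```

With about 3 million distinct strings of at most 255 bytes each, a single in-memory counter of this kind is usually feasible; a bounded heap would be an alternative when only the top K needs to be kept.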
Further, in the search result evaluation method of the search engine, the calculation process of the position score is as follows:
for each independent visitor and each search term, counting repeated clicks at the same retrieval result content position on the retrieval result page only once, and accumulating separate counts for clicks at different positions;
taking the click rate (CTR) as the position score, where CTR = number of clicks / number of exposures; the number of exposures is the number of retrieval result pages, that is, the number of independent visitors corresponding to the same search term.
Here UV (independent visitor) means Unique Visitor: each computer client that accesses the website is one visitor, and the same client is counted only once between 00:00 and 24:00.
For example, a) for one UV and the result list of one search term, clicks on several different result positions are allowed, but clicks on the same result position are counted only once, while each different position clicked has its own count incremented by 1;
b) the click rate CTR = number of clicks / number of exposures is taken as the position score. Suppose one search returns 10 results, independent visitor A clicks positions 2, 3 and 5, and another independent visitor B clicks positions 1, 2 and 3. Then the click rate is 1/2 for position 1, 2/2 for position 2, 2/2 for position 3, 0 for position 4, and 1/2 for position 5.
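The counting rule in a) and b) can be sketched as follows. This is an illustrative sketch only: the function name `position_ctr` and the dictionary layout are assumptions, and the sample clicks are those of visitors A and B from the example. A position clicked by both visitors gets 2/2, and an unclicked position gets 0.

```python
from collections import defaultdict

def position_ctr(clicks_by_visitor, num_positions):
    """Per-position click rate for one search term.

    clicks_by_visitor: {visitor_id: [clicked positions]} (hypothetical layout).
    Exposures = number of independent visitors who received the result page.
    """
    exposures = len(clicks_by_visitor)
    if exposures == 0:
        return {pos: 0.0 for pos in range(1, num_positions + 1)}
    click_counts = defaultdict(int)
    for positions in clicks_by_visitor.values():
        for pos in set(positions):        # repeated clicks on one position count once
            click_counts[pos] += 1
    return {pos: click_counts[pos] / exposures
            for pos in range(1, num_positions + 1)}

# Visitor A clicks positions 2, 3, 5; visitor B clicks positions 1, 2, 3.
clicks = {"A": [2, 3, 5], "B": [1, 2, 3]}
ctr = position_ctr(clicks, num_positions=5)
```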
Further, the click behaviors for all search terms of all independent visitors are counted from the search behavior data, and the click rate at each click position for the top K (TOPK) search ranking results is calculated as follows:
Figure BDA0001280139250000031
wherein, i represents the position number of the retrieval result, k represents the number of independent visitors, and CTR represents the click rate.
Further, for the top K (TOPK) search ranking results, a log2 attenuation is applied according to the retrieval result position i, and the overall search engine quality index (DCG) evaluation score is calculated as follows:
Figure BDA0001280139250000041
where i denotes the position number of the retrieval result, and K denotes the number of top-ranked search results considered.
In another aspect of the present invention, to achieve the above object, the present invention further provides a search result evaluation device for a search engine, the device comprising:
the data acquisition module is used for acquiring the search behavior data of all independent visitors, and acquiring top K item (TOPK) search sequencing results and retrieval result pages corresponding to all independent visitors of the same retrieval word according to the search behavior data.
a search engine quality index (DCG) calculation module, configured to obtain, according to click data of a retrieval result content position in a retrieval result page, the click rate corresponding to that position as a position score; and to obtain a search engine quality index (DCG) evaluation score of each retrieval result page through a DCG evaluation model according to the position score;
meanwhile, according to the top K items (TOPK) search sorting result of the search engine, combining the evaluation score of the search engine quality index (DCG) of each retrieval result page; and obtaining a search engine quality index (DCG) overall evaluation score corresponding to the top K item (TOPK) search ranking result.
Further, the data acquisition module acquires the search behavior data of the independent visitor from a server log file, a visitor access log file and the like.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a search result evaluation program of a search engine, which when executed by a processor, implements the steps of the search result evaluation method of the search engine as described above:
acquiring click data of the retrieval result content position in the retrieval result page, and taking the click rate corresponding to the retrieval result content position as a position score;
obtaining a search engine quality index (DCG) evaluation score of each retrieval result page through a DCG evaluation model according to the position score;
combining the evaluation scores of the search engine quality index (DCG) of each retrieval result page according to the top K items (TOPK) search ranking results of the search engine; and obtaining a search engine quality index (DCG) overall evaluation score corresponding to the top K item (TOPK) search ranking result.
The retrieval result evaluation method and device and the computer-readable medium combine the search engine quality index (DCG) evaluation algorithm, which is common in traditional offline search engine evaluation, with online user search behavior data to obtain an optimized online search engine evaluation model. The model converts user click rates directly into search engine quality index (DCG) scores, so that the search effectiveness of the retrieval results is ultimately evaluated according to real user behavior. The search behaviors for all search terms of all users are counted, and an overall search engine quality index (DCG) evaluation is performed in combination with the top K (TOPK) search ranking results of the search engine; the higher the score, the better the result.
Drawings
Fig. 1 is a flowchart of a first method for evaluating search results of a search engine according to various embodiments of the present invention.
Fig. 2 is a flowchart of a retrieval result evaluation method of a second search engine according to various embodiments of the present invention.
Fig. 3 is a block diagram of a search result evaluation apparatus for a search engine according to various embodiments of the present invention.
Fig. 4 is a block diagram of a DCG calculation module for implementing various embodiments of the present invention.
Fig. 5 is a block diagram of the execution steps of a search result evaluation program of a search engine implementing various embodiments of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
A mobile terminal implementing various embodiments of the present invention will now be described with reference to the accompanying drawings. In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for facilitating the explanation of the present invention, and have no specific meaning in themselves. Thus, "modules" and "components" may be used in a mixture.
The principle of the evaluation method for measuring the search engine quality index (DCG) related by the invention is explained as follows:
DCG is the abbreviation of the English term Discounted Cumulative Gain, which is rendered here as 'index for measuring the quality of a search engine'. The basic idea of the DCG method is as follows:
1. the relevance of each result is graded;
2. the position of each result is taken into account: the nearer the top, the more important the position;
3. the higher a result's grade, the nearer the top it should appear; otherwise a penalty is applied.
Consider the first point, relevance grading. This is more fine-grained than simply counting results as "accurate" or "inaccurate", as is done when calculating Precision.
We can subdivide the results into multiple levels.
A commonly used scheme has 3 levels: Good, Fair, Bad. The corresponding scores rel are: Good: 3 / Fair: 2 / Bad: 1.
Some more detailed evaluations use 5 levels: Very Good, Good, Fair, Bad, Very Bad. The corresponding scores rel may be set as: Very Good: 2 / Good: 1 / Fair: 0 / Bad: -1 / Very Bad: -2.
The criterion of the judgment result can be determined according to specific application, and Very Good generally means that the subjects of the result are completely related, and the webpage content is rich and the webpage quality is high. And is specific to each
Figure BDA0001280139250000061
The formula for calculating the DCG is not unique; in theory it only requires that the log discount factor be smooth. For example, the following DCG formula is more reasonable: it emphasizes relevance, and its discount coefficients for the 1st and 2nd results are also more reasonable:
Figure BDA0001280139250000062
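The formula images referenced above are not reproduced in this text. For orientation only, the two widely used textbook forms of DCG are given below; they fit the surrounding description (the second emphasizes relevance and discounts positions 1 and 2 differently), but whether they match the patent's figures exactly is an assumption.

```latex
% Classic form: positions 1 and 2 receive the same discount.
\mathrm{DCG}_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}

% Alternative form: exponential gain, discount 1/\log_2(i+1) from position 1.
\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2 (i+1)}
```

Here rel_i is the graded relevance score of the result at position i and p is the number of positions evaluated.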
The discount factor values for the results at the first 4 positions of DCG are shown in Table 1 below:
TABLE 1
i    log2(i+1)    1/log2(i+1)
1    1            1
2    1.58         0.63
3    2            0.5
4    2.32         0.43
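The values in Table 1 can be recomputed with a few lines of Python. This is a small sketch for checking the arithmetic; the helper name `discount` is an assumption made for the example.

```python
import math

def discount(i):
    """Discount factor 1 / log2(i + 1) for a 1-based result position i."""
    return 1.0 / math.log2(i + 1)

# Rows of (i, log2(i+1), 1/log2(i+1)), rounded to two decimals as in Table 1.
table = [(i, round(math.log2(i + 1), 2), round(discount(i), 2))
         for i in range(1, 5)]

for row in table:
    print(*row, sep="\t")
```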
Taking the logarithm to base 2 also comes from empirical practice and has no theoretical basis. In fact, the base of the logarithm can be changed according to the smoothing requirement: when the base is increased (e.g., using log5 in place of log2), the discount factor decreases more rapidly, which emphasizes the weight of the top results.
To facilitate side-by-side comparison between different types of query results, some evaluation systems also normalize the DCG; the normalized value is referred to as nDCG (normalized DCG). The most common calculation method is to divide by the ideal value iDCG (ideal DCG) for each query, according to the formula:
Figure BDA0001280139250000071
Computing nDCG requires calibrating the ideal iDCG, which is extremely difficult in practice: people often disagree on what the best result is, and selecting the optimal results from massive data is a hard task. Comparing two groups of results is usually easier, so in practical applications evaluation is usually done by comparing results.
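A minimal sketch of the normalization, under stated assumptions: it uses the exponential-gain form of DCG with a 1/log2(i+1) discount, and it approximates iDCG by re-sorting the grades of the same result list, which sidesteps rather than solves the calibration problem described above. The function names and the sample grades are hypothetical.

```python
import math

def dcg(rels):
    """DCG with gain 2**rel - 1 and discount 1/log2(i+1), positions from 1."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels, start=1))

def ndcg(rels):
    """nDCG = DCG / iDCG, with iDCG taken from the best ordering of rels."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0

# 3-level grades (Good: 3, Fair: 2, Bad: 1) for a hypothetical ranking
# that returns Good, Bad, Fair in that order.
ranking = [3, 1, 2]
score = ndcg(ranking)
```

A ranking that already lists the results in descending grade order scores 1.0, and any other ordering of the same grades scores lower.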
Example 1
Based on the evaluation method for measuring the search engine quality index (DCG), the invention provides various embodiments of the method.
As shown in fig. 1, a first embodiment of the present invention provides a method for evaluating search results of a search engine, including the steps of:
S101, acquiring click data of the retrieval result content position in a retrieval result page, and taking the click rate corresponding to the retrieval result content position as a position score;
S102, obtaining a search engine quality index (DCG) evaluation score of each retrieval result page through a DCG evaluation model according to the position score;
S103, combining the search engine quality index (DCG) evaluation scores of the retrieval result pages according to the top K (TOPK) search ranking results of the search engine; and obtaining an overall search engine quality index (DCG) evaluation score corresponding to the top K (TOPK) search ranking results.
In the evaluation method, the position click rate from the independent visitors' search behavior data takes the place of the tester's score in the search engine quality index (DCG) evaluation. The click behavior of independent visitors serves as the model: when an independent visitor clicks an item on the retrieval result page, this indicates satisfaction with the result at that position; if most independent visitors click the same position, the quality of the result at that position is considered good; results ranked lower have their scores attenuated when the score is calculated, so the better the search engine's ranking, the better the quality of its first few results; the search behaviors for all search terms of all independent visitors are counted, and an overall DCG evaluation is performed in combination with the top K (TOPK) search ranking results of the search engine, where a higher score means a better result.
As shown in fig. 2, the first embodiment of the present invention proposes a second search engine search result evaluation method, which includes the following steps:
S201, acquiring search behavior data from a server log file, a visitor access log file and the like; acquiring, from the search behavior data, the retrieval result pages corresponding to all independent visitors of the same search term and the top K (TOPK) search ranking results;
S202, acquiring click data of the retrieval result content position in a retrieval result page, and taking the click rate corresponding to the retrieval result content position as a position score;
S203, obtaining a search engine quality index (DCG) evaluation score of each retrieval result page through a DCG evaluation model according to the position score;
S204, combining the search engine quality index (DCG) evaluation scores of the retrieval result pages according to the top K (TOPK) search ranking results of the search engine; and obtaining an overall search engine quality index (DCG) evaluation score corresponding to the top K (TOPK) search ranking results.
User operation behavior data for network user behavior analysis can be obtained from server databases such as server log files and visitor access log files, for example, which search terms are used by users, which search result page contents are obtained, results of which positions of the search result page contents are clicked, then, the user operation behavior data of all the users are analyzed, and search results of the top K item (TOPK) and most popular search terms can be obtained.
The top K (TOPK) search ranking results are obtained through a TOPK algorithm. The search engine records in log files every query string that an independent visitor uses for each search, and each query string is 1 to 255 bytes long. Suppose there are currently ten million records. The query strings are highly repetitive: although the total is 10 million, there are no more than 3 million distinct strings after removing duplicates. The more often a query string is repeated, the more users have issued it and the more popular it is. Counting the 10 most popular query strings yields the 10 most popular search results in the search engine.
Wherein the position score is calculated as follows:
for each independent visitor and each search term, counting repeated clicks at the same retrieval result content position on the retrieval result page only once, and accumulating separate counts for clicks at different positions;
taking the click rate (CTR) as the position score, where CTR = number of clicks / number of exposures; the number of exposures is the number of retrieval result pages, that is, the number of independent visitors corresponding to the same search term.
Here UV (independent visitor) means Unique Visitor: each computer client that accesses the website is one visitor, and the same client is counted only once between 00:00 and 24:00.
For example, a) for one UV and the result list of one search term, clicks on several different result positions are allowed, but clicks on the same result position are counted only once, while each different position clicked has its own count incremented by 1;
b) the click rate CTR = number of clicks / number of exposures is taken as the position score. Suppose one search returns 10 results, independent visitor A clicks positions 2, 3 and 5, and another independent visitor B clicks positions 1, 2 and 3. Then the click rate is 1/2 for position 1, 2/2 for position 2, 2/2 for position 3, 0 for position 4, and 1/2 for position 5.
Further, the click behaviors for all search terms of all independent visitors are counted from the search behavior data, and the click rate at each click position for the top K (TOPK) search ranking results is calculated as follows:
Figure BDA0001280139250000101
wherein, i represents the position number of the retrieval result, k represents the number of independent visitors, and CTR represents the click rate.
For example, the position click rates for the top K (TOPK) search ranking results are calculated with the above formula, giving a result list such as the one shown in Table 2 below.
TABLE 2
Position i    CTR
Position 1    20%
Position 2    50%
Position 3
Position 4
Position K
For the top K (TOPK) search ranking results, a log2 attenuation is applied according to the retrieval result position i, and the overall search engine quality index (DCG) evaluation score is calculated as follows:
Figure BDA0001280139250000102
where i denotes the position number of the retrieval result, and K denotes the number of top-ranked search results considered.
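Since the formula image is not reproduced here, the following is only a sketch of one plausible reading: the position click rate CTR_i is used as the score at position i and attenuated by 1/log2(i+1), summed over the top K positions. The function name `ctr_dcg` is an assumption, and the input uses only the two click rates that Table 2 actually lists, so K is set to 2.

```python
import math

def ctr_dcg(ctr_by_position, k):
    """Sum of CTR_i / log2(i + 1) over positions 1..k (assumed form).

    ctr_by_position: {position: click rate in [0, 1]}; a missing position
    is treated as a click rate of 0.
    """
    return sum(ctr_by_position.get(i, 0.0) / math.log2(i + 1)
               for i in range(1, k + 1))

# Click rates for positions 1 and 2 as listed in Table 2 (20% and 50%).
table2_ctr = {1: 0.20, 2: 0.50}
score = ctr_dcg(table2_ctr, k=2)
```

Under this reading, the same clicks concentrated on position 1 give a higher score than clicks on position 2, which is the attenuation effect the text describes.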
Example 2
In another aspect of the present invention, to achieve the above object, as shown in fig. 3, the present invention further provides a search result evaluation device for a search engine, the device comprising:
the data obtaining module 200 is configured to obtain search behavior data of all independent visitors, and obtain top K item (TOPK) search ranking results and search result pages corresponding to all independent visitors of the same search term according to the search behavior data.
A search engine quality index (DCG) evaluation calculation module 300, configured to obtain, according to click data of a content position of a search result in a search result page, a click rate corresponding to the content position of the search result as a position score; obtaining a search engine quality index (DCG) evaluation score of each retrieval result page through a DCG evaluation model according to the position score;
meanwhile, according to the top K items (TOPK) search sorting result of the search engine, combining the evaluation score of the search engine quality index (DCG) of each retrieval result page; and obtaining a search engine quality index (DCG) overall evaluation score corresponding to the top K item (TOPK) search ranking result.
The device further comprises a DCG score output module 400, which is used for outputting the result obtained by the DCG calculation module 300 to an operation interface on line, so that a user can intuitively obtain the evaluation result of the search engine.
The data obtaining module 200 obtains the search behavior data of the user from a server log file, a user access log file, and the like, where the log files are stored in the server database 100 and share the same database with the network user operation behavior analysis system.
As shown in fig. 4, the search engine quality index (DCG) evaluation module 300 includes a click rate calculating unit 310 and a DCG evaluation score calculating unit 320, wherein the click rate calculating unit 310 is configured to obtain a click rate corresponding to a content location of a search result according to click data of the content location of the search result in a search result page, and the DCG evaluation score calculating unit 320 is configured to combine a search engine quality index (DCG) evaluation score of each search result page according to top K (TOPK) search ranking results of the search engine; and obtaining a search engine quality index (DCG) overall evaluation score corresponding to the top K item (TOPK) search ranking result.
Example 3
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a search result evaluation program of a search engine, which when executed by a processor, implements the steps of the search result evaluation method of the search engine as described above:
acquiring click data of the retrieval result content position in the retrieval result page, and taking the click rate corresponding to the retrieval result content position as a position score;
obtaining a search engine quality index (DCG) evaluation score of each retrieval result page through a DCG evaluation model according to the position score;
combining the evaluation scores of the search engine quality index (DCG) of each retrieval result page according to the top K items (TOPK) search ranking results of the search engine; and obtaining a search engine quality index (DCG) overall evaluation score corresponding to the top K item (TOPK) search ranking result.
Specifically, as shown in fig. 5, the search engine online general evaluation program executes the following processes:
the server of the search engine collects the click behavior of the independent visitor in real time, obtains user search behavior data, and stores the user search behavior data in a certain time period, such as 1 day, 1 week or 1 month.
The log files of the server or the log files of the visitors are stored in a server database, and the storage mode of the user search behavior data depends on the mode of collecting the network user operation behavior data by the server.
The search terms and the retrieval result page content corresponding to each independent visitor are obtained from the user search behavior data, together with the positions the user clicked in the retrieval result page content. Based on the retrieval result pages corresponding to the same search term, including synonyms and near-synonyms, the TOPK retrieval results, that is, the top K retrieval results for each search term, can be obtained by counting the users' click position information.
From the click positions, the position click rate corresponding to each TOPK retrieval result in the retrieval result page content can be obtained.
The DCG evaluation score of the search engine is then calculated from the position click rates of the TOPK retrieval results; the higher the DCG evaluation score, the more accurate the retrieval results of the search engine.
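The flow above, from logged clicks to an overall score, can be sketched end to end. This is a hedged illustration rather than the program of Fig. 5: the `(visitor_id, query, clicked_position)` row layout, the function name `evaluate`, the 1/log2(i+1) attenuation, and in particular the use of a plain mean to combine per-query scores are all assumptions, since the text does not specify how the per-page scores are combined.

```python
import math
from collections import defaultdict

def evaluate(log_rows, k):
    """Return ({query: DCG score}, overall score) from click-log rows.

    log_rows: iterable of (visitor_id, query, clicked_position or None);
    a row with position None records an exposure without a click.
    """
    viewers = defaultdict(set)   # query -> visitors who saw its result page
    clicks = defaultdict(set)    # query -> {(visitor, position)}, deduplicated
    for visitor, query, position in log_rows:
        viewers[query].add(visitor)
        if position is not None:
            clicks[query].add((visitor, position))

    per_query = {}
    for query, seen_by in viewers.items():
        score = 0.0
        for i in range(1, k + 1):
            n_clicks = sum(1 for _, pos in clicks[query] if pos == i)
            score += (n_clicks / len(seen_by)) / math.log2(i + 1)
        per_query[query] = score

    # Assumption: combine per-query scores with an unweighted mean.
    overall = sum(per_query.values()) / len(per_query) if per_query else 0.0
    return per_query, overall

rows = [
    ("A", "phone", 2), ("A", "phone", 3), ("A", "phone", 5), ("A", "phone", 2),
    ("B", "phone", 1), ("B", "phone", 2), ("B", "phone", 3),
    ("C", "case", None),
]
per_query, overall = evaluate(rows, k=5)
```

In practice the rows would first be restricted to the chosen time window (for example 1 day, 1 week or 1 month) and to the hottest search terms before scoring.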
The search result evaluation method and device of the search engine and the computer readable medium combine the search engine quality index (DCG) evaluation algorithm commonly used in traditional offline search engine evaluation with online user search behavior data to obtain an optimized online search engine evaluation model. The model directly converts user click rates into the search engine quality index (DCG) score used to measure the search engine, so that the search effect of the retrieval results is ultimately evaluated according to real user behavior. The search behaviors for all search terms of all users are counted, and an overall evaluation of the search engine quality index (DCG) is performed in combination with the top K (TOPK) search ranking results of the search engine; the higher the score, the better the result.
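The overall evaluation can be illustrated with a short sketch. It assumes, following the claims, that the top K search ranking results are those of the K most popular query strings, and it assumes that per-query scores are combined by a plain average, since the text only says the scores are combined. The names `top_k_queries` and `overall_score` and the sample data are hypothetical.

```python
from collections import Counter


def top_k_queries(queries, k):
    """Return the k most frequent query strings, most popular first."""
    return [q for q, _ in Counter(queries).most_common(k)]


def overall_score(dcg_by_query, queries, k):
    """Average the per-query DCG scores over the k most popular queries.

    Averaging is an assumption; the text only says the scores are combined.
    """
    selected = top_k_queries(queries, k)
    if not selected:
        return 0.0
    return sum(dcg_by_query.get(q, 0.0) for q in selected) / len(selected)


issued = ["phone", "phone", "tablet", "phone", "tablet", "watch"]
scores = {"phone": 0.8, "tablet": 0.4, "watch": 0.1}
print(top_k_queries(issued, 2))
print(overall_score(scores, issued, 2))
```

Other combinations, such as weighting each query by its frequency, would fit the description equally well; the average is used here only because it is the simplest choice.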
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A method for evaluating a search result of a search engine, the method comprising the steps of:
acquiring click data of the retrieval result content position in the retrieval result page, and taking the click rate corresponding to the retrieval result content position as a position score;
obtaining the evaluation score of the search engine quality index of each retrieval result page through a search engine quality index evaluation model according to the position score;
according to the top K search ranking results of the search engine, combining the evaluation scores of the search engine quality index of each retrieval result page, and obtaining the overall evaluation score of the search engine quality index corresponding to the top K search ranking results; the top K search ranking results of the search engine are the retrieval results of the top K most popular query strings in the search engine;
the method further comprises obtaining search behavior data of all independent visitors, and obtaining, according to the search behavior data, the top K search ranking results and the retrieval result pages corresponding to all independent visitors of the same search term; the top K search ranking results are obtained through a TOPK algorithm; the same search term comprises all search terms containing the characters of the search term, synonyms of the search term, and different translations of the search term.
2. The method of claim 1, further comprising obtaining the search behavior data from a server log file and a visitor access log file.
3. The method of claim 1, wherein the position score is calculated as follows:
for the retrieval result page corresponding to the same search term, counting the clicks of each independent visitor at the same retrieval result content position only once, and accumulating the click counts for the different retrieval result content positions;
taking the click rate CTR as the position score, wherein CTR = number of clicks / number of exposures; the number of exposures is the number of retrieval result pages, that is, the number of independent visitors corresponding to the same search term.
4. The method according to claim 3, wherein click behaviors corresponding to all search terms of all independent visitors are counted from the search behavior data, and the click rate corresponding to the top K search ranking results according to click positions is as follows:
Figure FDA0002727942480000021
wherein, i represents the position number of the retrieval result, N represents the number of independent visitors, and CTR represents the click rate.
5. The method according to claim 4, wherein, for the top K search ranking results, a log2 attenuation is applied according to the retrieval result position i, and the overall evaluation score of the corresponding search engine quality index is calculated as:
Figure FDA0002727942480000022
wherein i represents the position number of the retrieval result, and K represents the K of the top K search ranking results.
6. A search result evaluation apparatus for a search engine, the apparatus comprising:
a data acquisition module, configured to obtain search behavior data of all independent visitors, and to obtain, according to the search behavior data, the top K search ranking results and the retrieval result pages corresponding to all independent visitors of the same search term; the top K search ranking results are obtained through a TOPK algorithm; the same search term comprises all search terms containing the characters of the search term, synonyms of the search term, and different translations of the search term;
a search engine quality index calculation module, configured to obtain, according to click data of the retrieval result content positions in the retrieval result page, the click rate corresponding to each retrieval result content position as a position score; and to obtain the evaluation score of the search engine quality index of each retrieval result page through a search engine quality index evaluation model according to the position score;
and further configured to combine, according to the top K search ranking results of the search engine, the evaluation scores of the search engine quality index of each retrieval result page, and to obtain the overall evaluation score of the search engine quality index corresponding to the top K search ranking results; the top K search ranking results of the search engine are the retrieval results of the top K most popular query strings in the search engine.
7. The apparatus of claim 6, wherein the data acquisition module obtains the search behavior data of the independent visitors from a server log file and a visitor access log file.
8. A computer-readable storage medium, on which a search result evaluation program of a search engine is stored, the search result evaluation program of the search engine implementing the steps of the search result evaluation method of the search engine according to any one of claims 1 to 5 when executed by a processor.
CN201710293371.7A 2017-04-26 2017-04-26 Search engine retrieval result evaluation method and device and computer readable medium Active CN107122467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710293371.7A CN107122467B (en) 2017-04-26 2017-04-26 Search engine retrieval result evaluation method and device and computer readable medium

Publications (2)

Publication Number Publication Date
CN107122467A CN107122467A (en) 2017-09-01
CN107122467B true CN107122467B (en) 2020-12-29

Family

ID=59726440

Country Status (1)

Country Link
CN (1) CN107122467B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885857B (en) * 2017-11-17 2019-02-12 山东师范大学 A kind of search results pages user's behavior pattern mining method, apparatus and system
CN108460085A (en) * 2018-01-19 2018-08-28 北京奇艺世纪科技有限公司 A kind of video search sequence training set construction method and device based on user journal
CN109190129A (en) * 2018-08-31 2019-01-11 传神语联网网络科技股份有限公司 A kind of multilingual translation quality evaluation engine based near synonym knowledge mapping
CN110941786A (en) * 2018-09-21 2020-03-31 广州神马移动信息科技有限公司 Method and device for monitoring search effect
CN111061942B (en) * 2018-10-17 2023-04-18 阿里巴巴集团控股有限公司 Search ranking monitoring method and system
CN110580322B (en) * 2019-09-18 2022-03-15 北京百度网讯科技有限公司 Independent visitor information processing method and device, electronic equipment and storage medium
CN110674400B (en) * 2019-09-18 2022-05-10 北京字节跳动网络技术有限公司 Sorting method, sorting device, electronic equipment and computer-readable storage medium
CN112749316A (en) * 2019-10-29 2021-05-04 阿里巴巴集团控股有限公司 Translation quality determination method and device, storage medium and processor
CN111367778B (en) * 2020-03-13 2023-07-07 百度在线网络技术(北京)有限公司 Data analysis method and device for evaluating search strategy
CN111612658B (en) * 2020-05-29 2022-03-01 北京华宇元典信息服务有限公司 Evaluation method and evaluation device for legal data retrieval and electronic equipment
CN113010776B (en) * 2021-03-03 2022-12-09 昆明理工大学 Meta-search sequencing Top-k polymerization method based on Monroe rule
CN113065065A (en) * 2021-03-30 2021-07-02 广联达科技股份有限公司 Method, device and equipment for evaluating search performance and readable storage medium
CN113220967B (en) * 2021-05-11 2023-09-22 北京百度网讯科技有限公司 Ecological health degree measuring method and device for Internet environment and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064852A (en) * 2011-10-20 2013-04-24 阿里巴巴集团控股有限公司 Website statistical information processing method and website statistical information processing system
CN104063523A (en) * 2014-07-21 2014-09-24 焦点科技股份有限公司 E-commerce search scoring and ranking method and system
CN105808590A (en) * 2014-12-31 2016-07-27 中国电信股份有限公司 Search engine realization method as well as search method and apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1924855A (en) * 2006-09-18 2007-03-07 浙江大学 Arrangement method in image search engine
CN100507920C (en) * 2007-05-25 2009-07-01 清华大学 Search engine retrieving result reordering method based on user behavior information
CN104636407B (en) * 2013-11-15 2019-07-19 腾讯科技(深圳)有限公司 Parameter value training and searching request treating method and apparatus
CN103646092B (en) * 2013-12-18 2017-07-04 孙燕群 Based on the method for sequencing search engines that user participates in
US10592514B2 (en) * 2015-09-28 2020-03-17 Oath Inc. Location-sensitive ranking for search and related techniques

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
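"Search engine optimization model based on bidirectional hotspot tracking"; Wang Guo et al.; Computer Applications and Software; 20130215; Vol. 30, No. 2; pp. 144-147 *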

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant