CN107122467A - Method and device for evaluating search engine retrieval results, and computer-readable medium - Google Patents

Method and device for evaluating search engine retrieval results, and computer-readable medium

Info

Publication number
CN107122467A
Authority
CN
China
Prior art keywords
retrieval result
search engine
search
quality index
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710293371.7A
Other languages
Chinese (zh)
Other versions
CN107122467B (en
Inventor
李悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nubia Technology Co Ltd filed Critical Nubia Technology Co Ltd
Priority to CN201710293371.7A priority Critical patent/CN107122467B/en
Publication of CN107122467A publication Critical patent/CN107122467A/en
Application granted granted Critical
Publication of CN107122467B publication Critical patent/CN107122467B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The invention discloses a method and device for evaluating the retrieval results of a search engine, and a computer-readable medium, intended to address the lack of generality and objectivity in existing methods for evaluating search engine retrieval results. The method comprises the following steps: obtaining click data for each result position on the search results pages, and taking the click-through rate (CTR) of each result position as its position score; obtaining, from the position scores and a discounted cumulative gain (DCG) evaluation model, a DCG evaluation score for each search results page; and, based on the search engine's top-K (TOPK) ranked results and the DCG evaluation score of each search results page, obtaining an overall DCG evaluation score for the top-K (TOPK) ranked results.

Description

Method and device for evaluating search engine retrieval results, and computer-readable medium
Technical field
The present invention relates to the field of network communication technology, and in particular to a method and device for evaluating the retrieval results of a search engine, and a computer-readable medium.
Background
A search engine is a system that automatically collects information from the internet, organizes it, and makes it available for users to query. Information on the internet is vast, varied and unordered: each piece of information is like an island in a wide sea, web links are the bridges that criss-cross between the islands, and a search engine draws a clear map of that information for the user to consult at any time. Full-text search engines are the mainstream search engines in wide use today; Google is the representative abroad, and in China there is Baidu, the largest Chinese-language search engine. They extract information (mainly page text) from websites across the internet, build a database, retrieve the records that match a user's query conditions, and return the results in a certain order.
With the rapid development of internet information retrieval technology, search engines of every kind keep appearing. On the one hand this makes it more convenient for users to retrieve information; on the other hand many users are left at a loss, not knowing how to choose a suitable search engine, which creates the need to evaluate search engines. Evaluating search engines soundly not only helps users choose and use them, but also helps the engines themselves improve and develop. One of the main existing evaluation approaches is the Cranfield evaluation framework. The name "Cranfield-like approach" comes from Cranfield University in the United Kingdom, which first proposed such a framework in the 1950s: a complete evaluation scheme consisting of a sample query set, a set of correct answers, and evaluation metrics. This established the central role of evaluation in information retrieval research. The Cranfield framework is widely used by the major search engine companies. In practice, the first problem to solve is constructing a set of test queries. Other commonly used evaluation methods include the precision-recall method, the P@N method, and the DCG (discounted cumulative gain) method.
However, existing online evaluation of search quality is mostly tied to the business: online users are split by some rule and directed to different service versions, and business-specific metrics such as purchase conversion rate, download conversion rate and music conversion rate are then used to judge the search quality of each version. This approach is too tightly coupled to the business and is not general enough.
Meanwhile, the existing DCG evaluation algorithm is used for offline evaluation, with scores assigned by a small number of test colleagues. It is too subjective, so the offline evaluation results are unsatisfactory and not objective.
Summary of the invention
The main object of the present invention is to provide a method and device for evaluating search engine retrieval results, and a computer-readable medium, intended to address the lack of generality and objectivity in existing evaluation methods.
To achieve the above object, the present invention provides a method for evaluating the retrieval results of a search engine, the method comprising the following steps:
obtaining click data for each result position on the search results pages, and taking the click-through rate of each result position as its position score;
obtaining, from the position scores and a discounted cumulative gain (DCG) evaluation model, a DCG evaluation score for each search results page;
based on the search engine's top-K (TOPK) ranked results and the DCG evaluation score of each search results page, obtaining an overall DCG evaluation score for the top-K (TOPK) ranked results.
Further, the method also includes obtaining search behavior data from server log files, visitor access log files, and the like.
Further, the method also includes obtaining, from the search behavior data, the search results pages of all unique visitors for the same search term.
Further, the method also includes obtaining the top-K (TOPK) ranked results from the search behavior data.
The top-K (TOPK) ranked results are obtained with a TOPK algorithm. The search engine records in a log file every query string that a unique visitor uses for each search; each query string is 1 to 255 bytes long. Suppose there are currently 10 million records. These query strings are highly repetitive: although the total is 10 million, there are no more than 3 million distinct strings after deduplication. The more often a query string repeats, the more unique visitors issued it, and the more popular it is. Counting the 10 most popular query strings then gives the 10 most popular retrieval results on the search engine.
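The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log is assumed to be an in-memory iterable of query strings, and the function name `top_k_queries` is hypothetical. With at most 3 million distinct strings of up to 255 bytes each, a hash-map count is small enough to hold in memory.

```python
from collections import Counter


def top_k_queries(query_log, k=10):
    """Return the k most frequent query strings with their counts.

    query_log is assumed to be an iterable of raw query strings,
    one entry per search issued by a visitor (hypothetical format).
    """
    # Count every non-empty query string after trimming whitespace.
    counts = Counter(q.strip() for q in query_log if q.strip())
    # most_common(k) returns the k highest counts in descending order.
    return counts.most_common(k)


log = ["phone", "weather", "phone", "music", "phone", "weather"]
top2 = top_k_queries(log, k=2)
print(top2)  # [('phone', 3), ('weather', 2)]
```

For logs too large for one machine, the same idea is usually applied per shard and the partial counts merged; that variant is not described in the source.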
Further, in the method, the position score is calculated as follows:
For each unique visitor, on the search results page for a given search term, clicks on the same result position are counted once; clicks on different result positions are accumulated separately for each position.
The click-through rate CTR is used as the position score, where CTR = number of clicks / number of exposures. The number of exposures is the number of search results pages, that is, the number of unique visitors for the same search term.
Here UV (unique visitor) means one client computer that accesses the website. The same client is counted only once between 00:00 and 24:00.
For example: a) For one UV and the result list of one search term, the visitor may click several different result positions. Repeated clicks on the same result position count only once, while a click on a different result position adds 1 to the count of that position.
b) CTR is used as the position score, with CTR = number of clicks / number of exposures. Suppose a search returns 10 results. Unique visitor A clicks positions 2, 3 and 5; another unique visitor B clicks positions 1, 2 and 3. Then the CTR of position 1 is 1/2, of position 2 is 2/2, of position 3 is 2/2, of position 4 is 0, and of position 5 is 1/2.
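The counting rule can be illustrated with a minimal Python sketch. The input format (a mapping from a visitor identifier to the list of positions that visitor clicked for one search term) and the function name `position_ctr` are assumptions made for illustration, and the sample data is invented.

```python
from collections import defaultdict


def position_ctr(clicks_by_visitor, num_positions):
    """Compute the per-position click-through rate for one search term.

    clicks_by_visitor maps a unique-visitor id to the positions that
    visitor clicked (hypothetical format). Every visitor in the mapping
    counts as one exposure, even with no clicks.
    """
    exposures = len(clicks_by_visitor)
    if exposures == 0:
        return {pos: 0.0 for pos in range(1, num_positions + 1)}
    clicks = defaultdict(int)
    for positions in clicks_by_visitor.values():
        # set() makes repeated clicks on one position count only once.
        for pos in set(positions):
            clicks[pos] += 1
    return {pos: clicks[pos] / exposures for pos in range(1, num_positions + 1)}


visits = {"u1": [1, 1, 3], "u2": [1], "u3": [2, 3], "u4": []}
ctr = position_ctr(visits, num_positions=4)
print(ctr)  # {1: 0.5, 2: 0.25, 3: 0.5, 4: 0.0}
```

Visitor u1 clicks position 1 twice, but that position still receives a single click from u1, so its rate is 2 clicks over 4 exposures.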
Further, the click behavior for all search terms of all unique visitors is counted from the search behavior data, and the click-through rate of each click position in the top-K (TOPK) ranked results is:
$$\mathrm{CTR}_i = \frac{\text{number of clicks at position } i}{k}$$
where i is the result position number, k is the number of unique visitors, and CTR is the click-through rate.
Further, for the top-K (TOPK) ranked results, the score is decayed by log2 according to the result position i, and the overall DCG evaluation score is calculated as:
$$\mathrm{DCG}_K = \sum_{i=1}^{K} \frac{\mathrm{CTR}_i}{\log_2(i+1)}$$
where i is the result position number and K is the number of top-ranked results.
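A minimal Python sketch of a click-based DCG score follows. It assumes the discount 1/log2(i+1), which matches the discount factors tabulated later in this description; the function name `dcg_from_ctr` and the sample click-through rates are hypothetical.

```python
import math


def dcg_from_ctr(ctr_by_position, k):
    """Sum the CTR of positions 1..k, each decayed by 1/log2(i+1).

    ctr_by_position maps a 1-based position to its click-through rate;
    missing positions are treated as having a rate of 0 (an assumption).
    """
    return sum(
        ctr_by_position.get(i, 0.0) / math.log2(i + 1)
        for i in range(1, k + 1)
    )


# Same click-through rates, placed at opposite ends of the ranking.
good = dcg_from_ctr({1: 0.6, 2: 0.3, 3: 0.1}, k=3)
poor = dcg_from_ctr({1: 0.1, 2: 0.3, 3: 0.6}, k=3)
print(good > poor)  # True
```

A ranking whose most-clicked results sit at the top scores higher than one with the same clicks concentrated lower down, which is the behavior the log2 decay is meant to reward.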
In another aspect, to achieve the above object, the present invention also proposes a device for evaluating the retrieval results of a search engine, the device comprising:
a data acquisition module, configured to obtain the search behavior data of all unique visitors and, from that data, obtain the top-K (TOPK) ranked results and the search results pages of all unique visitors for the same search term;
a discounted cumulative gain (DCG) computation module, configured to obtain, from the click data for each result position on the search results pages, the click-through rate of each result position as its position score, and to obtain, from the position scores and the DCG evaluation model, a DCG evaluation score for each search results page;
and also, based on the search engine's top-K (TOPK) ranked results and the DCG evaluation score of each search results page, to obtain an overall DCG evaluation score for the top-K (TOPK) ranked results.
Further, the data acquisition module obtains the search behavior data of unique visitors from server log files, visitor access log files, and the like.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing a search engine retrieval result evaluation program which, when executed by a processor, implements the steps of the above evaluation method:
obtaining click data for each result position on the search results pages, and taking the click-through rate of each result position as its position score;
obtaining, from the position scores and a discounted cumulative gain (DCG) evaluation model, a DCG evaluation score for each search results page;
based on the search engine's top-K (TOPK) ranked results and the DCG evaluation score of each search results page, obtaining an overall DCG evaluation score for the top-K (TOPK) ranked results.
The method, device and computer-readable medium proposed by the present invention combine the traditional offline DCG evaluation algorithm, a general measure of search performance, with online user search behavior data to obtain an optimized online search engine evaluation model. User click-through rates are converted directly into the search engine's DCG score, so the search quality of the retrieval results is ultimately judged by the behavior of real users. The search behavior for all search terms of all users is counted and combined with the search engine's top-K (TOPK) ranked results to produce an overall DCG evaluation; the higher the score, the better the results.
Brief description of the drawings
Fig. 1 is a flow diagram of the first search engine retrieval result evaluation method implementing embodiments of the invention.
Fig. 2 is a flow diagram of the second search engine retrieval result evaluation method implementing embodiments of the invention.
Fig. 3 is a structural block diagram of a search engine retrieval result evaluation device implementing embodiments of the invention.
Fig. 4 is a structural block diagram of the DCG computation module implementing embodiments of the invention.
Fig. 5 is an execution block diagram of the search engine retrieval result evaluation program implementing embodiments of the invention.
Detailed description
It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
A mobile terminal implementing the embodiments of the invention is now described with reference to the drawings. In the description that follows, suffixes such as "module", "part" or "unit" used to denote elements serve only to aid the explanation of the invention and have no specific meaning in themselves. "Module" and "part" can therefore be used interchangeably.
The principle of the discounted cumulative gain (DCG) evaluation method used by the present invention is as follows:
DCG is the abbreviation of Discounted Cumulative Gain, a metric for measuring search engine quality. The basic ideas of the DCG method are:
1. The relevance of each result is measured in graded levels;
2. The position of a result is taken into account: the higher a result is ranked, the more important it is;
3. The higher a high-grade (that is, good) result is ranked, the higher the value should be; otherwise a penalty is applied.
Consider the first point, graded relevance. This is finer than the simple "accurate" or "inaccurate" count used when calculating precision.
Results can be subdivided into multiple grades.
For example, the commonly used 3 grades are Good, Fair and Bad, with corresponding score values rel of Good: 3 / Fair: 2 / Bad: 1.
Some more detailed assessments use 5 grades: Very Good, Good, Fair, Bad and Very Bad, with corresponding score values rel of Very Good: 2 / Good: 1 / Fair: 0 / Bad: -1 / Very Bad: -2.
The criteria for grading results can be set according to the specific application. Very Good generally means that the topic of the result is fully relevant and that the page content is rich and of high quality. And specific to every
The DCG formula is not unique; in theory only the smoothness of the logarithmic discount factor is required. For example, the following DCG formula is more reasonable: it emphasizes relevance, and its discount factors for the 1st and 2nd results are also more reasonable:
$$\mathrm{DCG}_n = \sum_{i=1}^{n} \frac{2^{rel_i} - 1}{\log_2(i+1)}$$
With this formula, the discount factors for the results at the first 4 positions are shown in Table 1 below:
Table 1
i    log2(i+1)    1/log2(i+1)
1    1            1
2    1.59         0.63
3    2            0.5
4    2.32         0.43
Using logarithms to base 2 also comes from an empirical formula and has no theoretical basis. In fact, the base of the logarithm can be changed according to the smoothing required; when the base is increased (for example, log5 instead of log2), the discount factor falls more quickly, which emphasizes the weight of the earlier results.
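The discount factors in Table 1 can be reproduced with a short Python check. The function name `discount_factors` is hypothetical; the formula 1/log2(i+1) is the one given in the table header.

```python
import math


def discount_factors(n):
    """Return the base-2 discount factor 1/log2(i+1) for positions 1..n."""
    return [1 / math.log2(i + 1) for i in range(1, n + 1)]


factors = [round(d, 2) for d in discount_factors(4)]
print(factors)  # [1.0, 0.63, 0.5, 0.43]
```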
To make it easier to compare different types of query results side by side, some evaluation systems also normalize DCG. These methods are collectively called nDCG (normalized DCG). The most common calculation normalizes by dividing by the ideal value iDCG (ideal DCG) of each query:
$$\mathrm{nDCG}_n = \frac{\mathrm{DCG}_n}{\mathrm{iDCG}_n}$$
Computing nDCG requires calibrating the iDCG of the ideal case, which is extremely difficult in practice: everyone's understanding of "the best result" tends to differ, and picking the optimal results out of a mass of data is a very hard task. Comparing two sets of results to see which is better is usually easier, so in practice evaluation is generally done by comparing results.
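The normalization can be sketched as follows, under two labeled assumptions: the gain is taken as 2^rel - 1 with a 1/log2(i+1) discount (a widely used DCG variant), and the ideal ordering is approximated by sorting the same judged list by grade. A true iDCG would need the best results from the whole collection, which is exactly the calibration difficulty described above. The function names are hypothetical.

```python
import math


def dcg(rels):
    """DCG of a graded relevance list, gain 2^rel - 1, discount 1/log2(i+1)."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(rels, start=1)
    )


def ndcg(rels):
    """Normalize DCG by the DCG of the same grades in descending order.

    Assumption: the ideal list is the judged list itself, re-sorted.
    """
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


# Grades use the 3-level scale from the text: Good=3, Fair=2, Bad=1.
best_order = ndcg([3, 2, 1])
worst_order = ndcg([1, 2, 3])
print(best_order, worst_order < best_order)
```

A list already sorted from best to worst grade scores 1.0, and any other order of the same grades scores lower.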
Embodiment 1
Based on the DCG evaluation method described above, the method embodiments of the invention are proposed.
As shown in Fig. 1, the first embodiment of the invention proposes a method for evaluating the retrieval results of a search engine, the method comprising the following steps:
S101: obtain click data for each result position on the search results pages, and take the click-through rate of each result position as its position score;
S102: obtain, from the position scores and the DCG evaluation model, a DCG evaluation score for each search results page;
S103: based on the search engine's top-K (TOPK) ranked results and the DCG evaluation score of each search results page, obtain an overall DCG evaluation score for the top-K (TOPK) ranked results.
In this evaluation method, the position click-through rate from unique visitors' search behavior data takes the place of the test colleagues' scores in DCG evaluation. The model is the click behavior of unique visitors: when a unique visitor clicks a result on the search results page, this is taken to mean that the visitor is satisfied with the result at that position. The higher the click-through rate of a position across unique visitors, the better the result at that position is considered to be. Lower-ranked resources have their scores decayed during calculation, because the better a search engine ranks, the better the quality of its top-ranked resources. The search behavior for all search terms of all unique visitors is counted and combined with the search engine's top-K (TOPK) ranked results to produce an overall DCG evaluation; the higher the score, the better the results.
As shown in Fig. 2, the first embodiment of the invention proposes a second method for evaluating the retrieval results of a search engine, the method comprising the following steps:
S201: obtain search behavior data from server log files, visitor access log files, and the like; obtain from the search behavior data the search results pages of all unique visitors for the same search term, and the top-K (TOPK) ranked results;
S202: obtain click data for each result position on the search results pages, and take the click-through rate of each result position as its position score;
S203: obtain, from the position scores and the DCG evaluation model, a DCG evaluation score for each search results page;
S204: based on the search engine's top-K (TOPK) ranked results and the DCG evaluation score of each search results page, obtain an overall DCG evaluation score for the top-K (TOPK) ranked results.
User operation data for analyzing network user behavior can be obtained from server databases holding server log files, visitor access log files, and the like: for example, which search terms a user entered, which search results pages were returned, and which result positions on those pages the user clicked. Analyzing the operation data of all users then yields the top-K (TOPK) results, that is, the most popular search terms. For a given search term such as "mobile phone", the analysis should include all search terms that contain "mobile phone", as well as near-synonyms such as "smartphone" and "mobile telephone", different translations, and so on.
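The grouping of log data by search term and visitor might look like the following Python sketch. The record layout `(visitor_id, term, position)`, the `synonyms` mapping and the function name `group_clicks` are all hypothetical; the source does not specify a log format or how near-synonyms are detected.

```python
from collections import defaultdict


def group_clicks(records, synonyms=None):
    """Group click records as {term: {visitor_id: set of clicked positions}}.

    records: iterable of (visitor_id, term, position) tuples, where
    position is None for a results page shown without a click
    (hypothetical format).
    synonyms: optional mapping from a term to its canonical term, so that
    near-synonyms are analyzed together (hypothetical helper data).
    """
    synonyms = synonyms or {}
    grouped = defaultdict(lambda: defaultdict(set))
    for visitor_id, term, position in records:
        key = term.strip().lower()
        key = synonyms.get(key, key)
        # Touching the entry registers the visitor as one exposure.
        clicked = grouped[key][visitor_id]
        if position is not None:
            clicked.add(position)
    return grouped


records = [
    ("u1", "Phone", 1),
    ("u1", "phone", 1),        # repeated click, stored once
    ("u2", "smartphone", 2),   # folded into "phone" via synonyms
    ("u3", "phone", None),     # exposure without a click
]
grouped = group_clicks(records, synonyms={"smartphone": "phone"})
phone_clicks = dict(grouped["phone"])
print(phone_clicks)
```

Storing positions in a set enforces the rule that one visitor's repeated clicks on the same position count once, and the per-term visitor count gives the number of exposures.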
The top-K (TOPK) ranked results are obtained with a TOPK algorithm. The search engine records in a log file every query string that a unique visitor uses for each search; each query string is 1 to 255 bytes long. Suppose there are currently 10 million records. These query strings are highly repetitive: although the total is 10 million, there are no more than 3 million distinct strings after deduplication. The more often a query string repeats, the more users issued it, and the more popular it is. Counting the 10 most popular query strings then gives the 10 most popular retrieval results on the search engine.
The position score is calculated as follows:
For each unique visitor, on the search results page for a given search term, clicks on the same result position are counted once; clicks on different result positions are accumulated separately for each position.
The click-through rate CTR is used as the position score, where CTR = number of clicks / number of exposures. The number of exposures is the number of search results pages, that is, the number of unique visitors for the same search term.
Here UV (unique visitor) means one client computer that accesses the website. The same client is counted only once between 00:00 and 24:00.
For example: a) For one UV and the result list of one search term, the visitor may click several different result positions. Repeated clicks on the same result position count only once, while a click on a different result position adds 1 to the count of that position.
b) CTR is used as the position score, with CTR = number of clicks / number of exposures. Suppose a search returns 10 results. Unique visitor A clicks positions 2, 3 and 5; another unique visitor B clicks positions 1, 2 and 3. Then the CTR of position 1 is 1/2, of position 2 is 2/2, of position 3 is 2/2, of position 4 is 0, and of position 5 is 1/2.
Further, the click behavior for all search terms of all unique visitors is counted from the search behavior data, and the click-through rate of each click position in the top-K (TOPK) ranked results is:
$$\mathrm{CTR}_i = \frac{\text{number of clicks at position } i}{k}$$
where i is the result position number, k is the number of unique visitors, and CTR is the click-through rate.
For example, applying the above formula for the click-through rate of each click position to the top-K (TOPK) ranked results gives the result list shown in Table 2 below.
Table 2
Position i    CTR
Position 1    20%
Position 2    50%
Position 3
Position 4
Position K
For the top-K (TOPK) ranked results, the score is decayed by log2 according to the result position i, and the overall DCG evaluation score is calculated as:
$$\mathrm{DCG}_K = \sum_{i=1}^{K} \frac{\mathrm{CTR}_i}{\log_2(i+1)}$$
where i is the result position number and K is the number of top-ranked results.
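As a hedged worked example, the first two rows of Table 2 can be plugged into a DCG sum. The discount 1/log2(i+1) is an assumption taken from Table 1 of this description, and positions 3 to K have no values in Table 2, so only the partial sum over the first two positions is shown.

```latex
\begin{aligned}
\mathrm{DCG}_2 &= \frac{0.20}{\log_2 2} + \frac{0.50}{\log_2 3} \\
               &\approx \frac{0.20}{1} + \frac{0.50}{1.585} \\
               &\approx 0.20 + 0.315 = 0.515
\end{aligned}
```

Position 2 has the higher click-through rate, but its contribution is discounted by about 37 percent because it is ranked second.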
Embodiment 2
In another aspect, to achieve the above object, as shown in Fig. 3, the present invention also proposes a device for evaluating the retrieval results of a search engine, the device comprising:
a data acquisition module 200, configured to obtain the search behavior data of all unique visitors and, from that data, obtain the top-K (TOPK) ranked results and the search results pages of all unique visitors for the same search term;
a discounted cumulative gain (DCG) computation module 300, configured to obtain, from the click data for each result position on the search results pages, the click-through rate of each result position as its position score, and to obtain, from the position scores and the DCG evaluation model, a DCG evaluation score for each search results page;
and also, based on the search engine's top-K (TOPK) ranked results and the DCG evaluation score of each search results page, to obtain an overall DCG evaluation score for the top-K (TOPK) ranked results.
The device further includes a DCG score output module 400, configured to output the results obtained by the DCG computation module 300 online to an operation interface, so that the user can view the search engine's evaluation results directly.
The data acquisition module 200 obtains user search behavior data from server log files, user access log files, and the like. These log files are stored in the server database 100, which is shared with the network user operation behavior analysis system.
As shown in Fig. 4, the DCG computation module 300 includes a click-through rate computation unit 310 and a DCG evaluation score computation unit 320. The click-through rate computation unit 310 is configured to obtain, from the click data for each result position on the search results pages, the click-through rate of each result position. The DCG evaluation score computation unit 320 is configured to obtain, based on the search engine's top-K (TOPK) ranked results and the DCG evaluation score of each search results page, the overall DCG evaluation score for the top-K (TOPK) ranked results.
Embodiment 3
In addition, to achieve the above object, present invention also offers a kind of computer-readable recording medium, the computer can Read be stored with storage medium the retrieval result assessment process of search engine, the retrieval result assessment process quilt of the search engine The step of retrieval result evaluation method such as above-mentioned search engine is realized during computing device:
The click data to retrieval result location of content in retrieval result page is obtained, and with the correspondence retrieval result content The clicking rate of position is position score;
Each retrieval will be obtained by weighing search engine quality index (DCG) evaluation model according to the position score Measurement search engine quality index (DCG) evaluation score of result page;
According to preceding K (TOPK) searching order results of search engine, searched with reference to the measurement of each retrieval result page Index holds up quality index (DCG) evaluation score;Obtain the corresponding measurement search engine of described preceding K (TOPK) searching order results Quality index (DCG) overall assessment fraction.
Specifically, as shown in figure 5, the online general assessment process implementation procedure of the search engine is as follows:
The server of search engine gathers the click behavior of independent visitor in real time, user's search behavior data is obtained, with one The fixed time cycle stores these user's search behavior data, such as 1 day, 1 week or 1 month.
It is stored in the journal file of the journal file of server or visitor in server database, user's search behavior The storage mode of data depends on the mode of collection of server network user's operation behavior data.
From user's search behavior data, the corresponding term of each independent visitor can be obtained, in the retrieval result page Hold, click location information of the user in retrieval result page content.Based on same term, apposition exists including near synonym etc. Interior each corresponding retrieval result page content, by the click location information of counting user, can obtain TOPK retrievals As a result, that is, each term correspondence come before K bars retrieval result.
According to click location, the corresponding position of TOPK retrieval results resulted in retrieval result page content is clicked on Rate.
The DCG evaluation score of the search engine is calculated from the position click-through rates of the TOPK retrieval results; the higher the DCG evaluation score, the more accurate the search engine's retrieval results.
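The calculation can be illustrated with a short Python sketch. The exact formula of the patent is not reproduced in this text, so the sketch assumes the commonly used DCG form in which the score at position i is divided by log2(i + 1); the function name and the sample values are likewise illustrative only.

```python
import math


def dcg_from_ctr(position_ctr):
    """DCG over position scores, where position_ctr[0] belongs to rank 1.

    Assumed discount: score at rank i is divided by log2(i + 1).
    """
    return sum(
        score / math.log2(rank + 1)
        for rank, score in enumerate(position_ctr, start=1)
    )


# Same click-through rates, concentrated at the top versus the bottom.
top_heavy = dcg_from_ctr([0.5, 0.25, 0.25])
bottom_heavy = dcg_from_ctr([0.25, 0.25, 0.5])
print(top_heavy > bottom_heavy)  # True
```

Because lower positions are discounted more strongly, a ranking whose clicks concentrate on the first results scores higher than one whose clicks fall further down the page.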
The retrieval result evaluation method and device of a search engine and the computer-readable medium proposed by the present invention combine the search engine quality index (DCG) evaluation algorithm, a traditional offline measure of general search performance, with online user search behavior data to obtain an optimized online search engine evaluation model. User click-through rates are converted directly into search engine quality index (DCG) scores, so that the final search effect of the retrieval results is evaluated by the behavior of real users. The search behavior for all search terms of all users is counted and combined with the top K (TOPK) search ranking results of the search engine to perform an overall search engine quality index (DCG) evaluation; the higher the score, the better the results.
It should be noted that, herein, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A retrieval result evaluation method of a search engine, characterized in that the method comprises the following steps:
obtaining click data for each retrieval result content position in a retrieval result page, and taking the click-through rate of each retrieval result content position as its position score;
obtaining, according to the position scores, the search engine quality index evaluation score of each retrieval result page through a search engine quality index evaluation model;
combining, based on the top K search ranking results of the search engine, the search engine quality index evaluation scores of the retrieval result pages to obtain the overall search engine quality index evaluation score for the top K search ranking results.
2. The retrieval result evaluation method of a search engine according to claim 1, characterized in that the method further comprises obtaining search behavior data from server log files and visitor access log files.
3. The retrieval result evaluation method of a search engine according to claim 2, characterized in that the method further comprises obtaining, from the search behavior data, the retrieval result pages of all independent visitors for the same search term.
4. The retrieval result evaluation method of a search engine according to claim 2, characterized in that the method further comprises obtaining the top K search ranking results from the search behavior data.
5. The retrieval result evaluation method of a search engine according to claim 1, characterized in that the position score is calculated as follows:
for the retrieval result page content corresponding to the same search term of each independent visitor, clicks at the same retrieval result content position are counted once, and clicks at different retrieval result content positions are counted cumulatively;
the click-through rate CTR is taken as the position score, where CTR = number of clicks / number of exposures; the number of exposures is the number of retrieval result pages, that is, the number of independent visitors for the same search term.
6. The retrieval result evaluation method of a search engine according to claim 5, characterized in that the click behavior for all search terms of all independent visitors is counted from the search behavior data, and the click-through rate of the top K search ranking results by click position is:
where i denotes the retrieval result position number, k denotes the independent visitor number, and CTR denotes the click-through rate.
7. The retrieval result evaluation method of a search engine according to claim 6, characterized in that the top K search ranking results are subjected to log2 decay according to retrieval result position i, and the corresponding overall search engine quality index evaluation score is calculated as:
where i denotes the retrieval result position number and K denotes the top K search ranking results.
8. A retrieval result evaluation device of a search engine, characterized in that the device comprises:
a data acquisition module, configured to obtain the search behavior data of all independent visitors and to obtain, from the search behavior data, the top K search ranking results and the retrieval result pages of all independent visitors for the same search term;
a search engine quality index calculation module, configured to obtain, from the click data for retrieval result content positions in the retrieval result pages, the click-through rate of each retrieval result content position as its position score, and to obtain, according to the position scores, the search engine quality index evaluation score of each retrieval result page through a search engine quality index evaluation model;
and further configured to combine, based on the top K search ranking results of the search engine, the search engine quality index evaluation scores of the retrieval result pages to obtain the overall search engine quality index evaluation score for the top K search ranking results.
9. The retrieval result evaluation device of a search engine according to claim 8, characterized in that the data acquisition module obtains the search behavior data of independent visitors from server log files and visitor access log files.
10. A computer-readable storage medium, characterized in that a retrieval result evaluation program of a search engine is stored on the computer-readable storage medium, and the program, when executed by a processor, implements the steps of the retrieval result evaluation method of a search engine according to any one of claims 1 to 7.
CN201710293371.7A 2017-04-26 2017-04-26 Search engine retrieval result evaluation method and device and computer readable medium Active CN107122467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710293371.7A CN107122467B (en) 2017-04-26 2017-04-26 Search engine retrieval result evaluation method and device and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710293371.7A CN107122467B (en) 2017-04-26 2017-04-26 Search engine retrieval result evaluation method and device and computer readable medium

Publications (2)

Publication Number Publication Date
CN107122467A true CN107122467A (en) 2017-09-01
CN107122467B CN107122467B (en) 2020-12-29

Family

ID=59726440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710293371.7A Active CN107122467B (en) 2017-04-26 2017-04-26 Search engine retrieval result evaluation method and device and computer readable medium

Country Status (1)

Country Link
CN (1) CN107122467B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1924855A (en) * 2006-09-18 2007-03-07 浙江大学 Arrangement method in image search engine
CN101055587A (en) * 2007-05-25 2007-10-17 清华大学 Search engine retrieving result reordering method based on user behavior information
CN103064852A (en) * 2011-10-20 2013-04-24 阿里巴巴集团控股有限公司 Website statistical information processing method and website statistical information processing system
CN103646092A (en) * 2013-12-18 2014-03-19 孙燕群 SE (search engine) ordering method based on user participation
CN104063523A (en) * 2014-07-21 2014-09-24 焦点科技股份有限公司 E-commerce search scoring and ranking method and system
CN104636407A (en) * 2013-11-15 2015-05-20 腾讯科技(深圳)有限公司 Parameter choice training and search request processing method and device
CN105808590A (en) * 2014-12-31 2016-07-27 中国电信股份有限公司 Search engine realization method as well as search method and apparatus
US20170091189A1 (en) * 2015-09-28 2017-03-30 Yahoo! Inc. Location-sensitive ranking for search and related techniques

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
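Yu Jinxiu: "Research on Automatic Search Engine Evaluation Technology Based on User Behavior Analysis", China Master's Theses Full-text Database, Information Science and Technology (monthly) *
Wang Guo et al.: "A Search Engine Optimization Model Based on Bidirectional Hotspot Tracking", Computer Applications and Software *
Deng Xiaomei et al.: "Research on Search Engine User Satisfaction Evaluation Based on Click Logs", Computer Engineering and Applications *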

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885857B (en) * 2017-11-17 2019-02-12 山东师范大学 A kind of search results pages user's behavior pattern mining method, apparatus and system
CN107885857A (en) * 2017-11-17 2018-04-06 山东师范大学 A kind of search results pages user's behavior pattern mining method, apparatus and system
CN108460085A (en) * 2018-01-19 2018-08-28 北京奇艺世纪科技有限公司 A kind of video search sequence training set construction method and device based on user journal
CN109190129A (en) * 2018-08-31 2019-01-11 传神语联网网络科技股份有限公司 A kind of multilingual translation quality evaluation engine based near synonym knowledge mapping
CN110941786A (en) * 2018-09-21 2020-03-31 广州神马移动信息科技有限公司 Method and device for monitoring search effect
CN111061942A (en) * 2018-10-17 2020-04-24 阿里巴巴集团控股有限公司 Search ranking monitoring method and system
CN111061942B (en) * 2018-10-17 2023-04-18 阿里巴巴集团控股有限公司 Search ranking monitoring method and system
CN110674400B (en) * 2019-09-18 2022-05-10 北京字节跳动网络技术有限公司 Sorting method, sorting device, electronic equipment and computer-readable storage medium
CN110580322A (en) * 2019-09-18 2019-12-17 北京百度网讯科技有限公司 Independent visitor information processing method and device, electronic equipment and storage medium
CN110674400A (en) * 2019-09-18 2020-01-10 北京字节跳动网络技术有限公司 Sorting method, sorting device, electronic equipment and computer-readable storage medium
CN111367778A (en) * 2020-03-13 2020-07-03 百度在线网络技术(北京)有限公司 Data analysis method and device for evaluating search strategy
CN111367778B (en) * 2020-03-13 2023-07-07 百度在线网络技术(北京)有限公司 Data analysis method and device for evaluating search strategy
CN111612658A (en) * 2020-05-29 2020-09-01 北京华宇元典信息服务有限公司 Evaluation method and evaluation device for legal data retrieval and electronic equipment
CN113010776A (en) * 2021-03-03 2021-06-22 昆明理工大学 Monroe rule-based meta-search sorting Top-k polymerization method
CN113010776B (en) * 2021-03-03 2022-12-09 昆明理工大学 Meta-search sequencing Top-k polymerization method based on Monroe rule
CN113065065A (en) * 2021-03-30 2021-07-02 广联达科技股份有限公司 Method, device and equipment for evaluating search performance and readable storage medium
CN113220967A (en) * 2021-05-11 2021-08-06 北京百度网讯科技有限公司 Method and device for measuring ecological health degree of Internet environment and electronic equipment
CN113220967B (en) * 2021-05-11 2023-09-22 北京百度网讯科技有限公司 Ecological health degree measuring method and device for Internet environment and electronic equipment

Also Published As

Publication number Publication date
CN107122467B (en) 2020-12-29

Similar Documents

Publication Publication Date Title
CN107122467A (en) The retrieval result evaluation method and device of a kind of search engine, computer-readable medium
CN105701216B (en) A kind of information-pushing method and device
CN107391687B (en) Local log website-oriented hybrid recommendation system
CN106372249B (en) A kind of clicking rate predictor method, device and electronic equipment
US8380694B2 (en) Method and system for aggregating reviews and searching within reviews for a product
US8190556B2 (en) Intellegent data search engine
CN101355457B (en) Test method and test equipment
KR100863990B1 (en) Advertising System and method using category
CN109190043A (en) Recommended method and device, storage medium, electronic equipment and recommender system
KR100930786B1 (en) Ad list generation method and system
CN105765573A (en) Improvements in website traffic optimization
CN102841946A (en) Commodity data retrieval sequencing and commodity recommendation method and system
KR20090033989A (en) Method for advertising local information based on location information and system for executing the method
CN103902597A (en) Method and device for determining search relevant categories corresponding to target keywords
CN110334356A (en) Article matter method for determination of amount, article screening technique and corresponding device
CN107153656A (en) A kind of information search method and device
CN114238573B (en) Text countercheck sample-based information pushing method and device
CN103729365A (en) Searching method and system
CN111724238A (en) Method, device and equipment for evaluating product recommendation accuracy and storage medium
CN108920479B (en) Cross-information-source account recommendation method for two micro terminals
KR20100021888A (en) A profit distribution system for content provider and method thereof
CN106919588A (en) A kind of application program search system and method
CN109558544A (en) Sort method and device, server and storage medium
CN112487283A (en) Method and device for training model, electronic equipment and readable storage medium
JP2006318398A (en) Vector generation method and device, information classifying method and device, and program, and computer readable storage medium with program stored therein

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant