CN107122467A - Method and device for evaluating the retrieval results of a search engine, and computer-readable medium - Google Patents
- Publication number
- CN107122467A CN201710293371.7A
- Authority
- CN
- China
- Prior art keywords
- retrieval result
- search engine
- search
- quality index
- retrieval
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Abstract
The invention discloses a method and device for evaluating the retrieval results of a search engine, and a computer-readable medium, aiming to solve the problem that existing retrieval result evaluation methods for search engines lack generality and objectivity. The method comprises the following steps: obtaining click data for the retrieval result content positions in each retrieval result page, and taking the click-through rate of each retrieval result content position as its position score; obtaining, according to the position scores and by means of a search engine quality index (DCG) evaluation model, the DCG evaluation score of each retrieval result page; and, according to the top-K (TOPK) search ranking results of the search engine and in combination with the DCG evaluation score of each retrieval result page, obtaining the overall DCG evaluation score corresponding to the top-K (TOPK) search ranking results.
Description
Technical field
The present invention relates to the field of network communication technology, and in particular to a method and device for evaluating the retrieval results of a search engine, and a computer-readable medium.
Background art
A search engine is a system that automatically collects information from the Internet, organizes it, and makes it available for users to query. The information on the Internet is vast, varied, and unordered: individual pieces of information are like islands scattered across a wide sea, web links are the bridges crisscrossing between those islands, and a search engine draws a clear information map that users can consult at any time. Full-text search engines are the mainstream search engines in wide use today; Google is the representative abroad, and Baidu is the largest Chinese-language search engine in China. They extract information from websites on the Internet (mainly web page text), build a database, retrieve the records that match a user's query conditions, and return the results in a certain order.
With the rapid development of Internet information retrieval technology, search engines of all kinds keep emerging. On the one hand this makes it convenient for users to retrieve information; on the other hand it leaves many users at a loss as to which search engine to choose, which creates a need to evaluate search engines. A sound evaluation of a search engine not only helps users select and use it, but also helps the search engine itself improve and develop. One of the main existing search engine evaluation methods is the Cranfield evaluation system. The name "Cranfield-like approach" comes from Cranfield University in the United Kingdom, which in the 1950s first proposed such an evaluation system: a complete evaluation scheme consisting of a query sample set, a correct answer set, and evaluation metrics, which established the central position of "evaluation" in information retrieval research. The Cranfield evaluation system is widely used by major search engine companies. In practice, the first problem to solve is constructing a set of test query terms. Commonly used search engine evaluation methods also include the Precision-Recall method, the P@N method, and the DCG (search engine quality index) method.
However, existing online evaluation of search effectiveness is mostly tied to the business: online users are split according to some rule and directed to different service versions, and business-specific indicators such as purchase conversion rate, download conversion rate, and music conversion rate are then used as evaluation indexes to judge the search quality of the different versions. This is too tightly coupled to the business and not general enough.
Meanwhile, the existing DCG (search engine quality index) evaluation algorithm for search effectiveness is used for offline evaluation and relies on scores given by a small number of testers. It is too subjective, so the offline search evaluation results are unsatisfactory and not objective.
Summary of the invention
The main purpose of the present invention is to provide a method and device for evaluating the retrieval results of a search engine, and a computer-readable medium, aiming to solve the problem that existing retrieval result evaluation methods for search engines lack generality and objectivity.
To achieve the above purpose, the present invention provides a method for evaluating the retrieval results of a search engine, the method comprising the following steps:
obtaining click data for the retrieval result content positions in each retrieval result page, and taking the click-through rate of each retrieval result content position as its position score;
obtaining, according to the position scores and by means of a search engine quality index (DCG) evaluation model, the DCG evaluation score of each retrieval result page;
according to the top-K (TOPK) search ranking results of the search engine, and in combination with the DCG evaluation score of each retrieval result page, obtaining the overall DCG evaluation score corresponding to the top-K (TOPK) search ranking results.
Further, the retrieval result evaluation method also includes obtaining search behavior data from server log files, visitor access log files, and the like.
Further, the retrieval result evaluation method also includes obtaining, from the search behavior data, the retrieval result pages corresponding to all independent visitors of the same search term.
Further, the retrieval result evaluation method also includes obtaining the top-K (TOPK) search ranking results from the search behavior data.
The top-K (TOPK) search ranking results are obtained with a TOPK algorithm. The search engine records in its log files every query string that each independent visitor uses for each search; each query string is 1 to 255 bytes long. Suppose there are currently 10 million records (these query strings are highly repetitive: although the total is 10 million, no more than 3 million remain after duplicates are removed; the more often a query string repeats, the more independent visitors have queried it and the more popular it is). The 10 most popular query strings are then counted, which correspond to the 10 most popular retrieval results in the search engine.
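The counting step described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `top_k_queries` and the in-memory list `query_log` are assumptions introduced here, and a real log with millions of records would be streamed from disk rather than held in a list.

```python
from collections import Counter

def top_k_queries(query_log, k=10):
    """Return the k most frequent query strings with their counts."""
    # Hash-table counting: one pass over the log, ignoring empty strings.
    counts = Counter(q.strip() for q in query_log if q.strip())
    return counts.most_common(k)

# Tiny illustrative log (hypothetical data).
query_log = ["phone", "weather", "phone", "music", "phone", "weather"]
print(top_k_queries(query_log, k=2))  # [('phone', 3), ('weather', 2)]
```

Counting by hash table first and selecting the top entries afterwards keeps memory proportional to the number of distinct query strings rather than to the total number of records.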
Further, in the retrieval result evaluation method, the position score is calculated as follows:
For the retrieval result page content corresponding to the same search term, clicks by each independent visitor on the same retrieval result content position are counted once, while clicks on different retrieval result content positions are accumulated separately for each position;
The click-through rate CTR is taken as the position score, where CTR = number of clicks / number of exposures; the number of exposures is the number of retrieval result pages, that is, the number of independent visitors for the same search term.
Here, UV (independent visitor) means Unique Visitor: one computer client that accesses the website counts as one visitor, and the same client is counted only once between 00:00 and 24:00.
For example: a) For one UV and the result list of one search term, clicking several different retrieval result positions is allowed; repeated clicks on the same retrieval result position are counted only once, while clicks on different retrieval result positions each add 1 to the count of the corresponding position;
b) The click-through rate CTR is taken as the position score, CTR = number of clicks / number of exposures. Suppose a search returns 10 results; independent visitor A clicks positions 2, 3, and 5, and another independent visitor B clicks positions 1, 2, and 3. Then the click-through rate of position 1 is 1/2, of position 2 is 2/2, of position 3 is 2/2 (both visitors clicked it), of position 4 is 0, and of position 5 is 1/2.
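The counting rule can be checked with a short Python sketch. The function name `position_ctr` and the dictionary layout of the click data are assumptions made for illustration only. With visitor A clicking positions 2, 3, and 5 and visitor B clicking positions 1, 2, and 3, positions 2 and 3 are each clicked by both visitors, so both come out as 2/2.

```python
from collections import defaultdict

def position_ctr(clicks_by_visitor, num_positions):
    """Per-position click-through rate for one search term.

    clicks_by_visitor maps a visitor id to the positions that visitor clicked.
    Repeated clicks by one visitor on the same position count once, and the
    number of exposures is the number of independent visitors.
    """
    exposures = len(clicks_by_visitor)
    if exposures == 0:
        return {pos: 0.0 for pos in range(1, num_positions + 1)}
    counts = defaultdict(int)
    for positions in clicks_by_visitor.values():
        for pos in set(positions):  # deduplicate repeat clicks per visitor
            counts[pos] += 1
    return {pos: counts[pos] / exposures for pos in range(1, num_positions + 1)}

clicks = {"A": [2, 3, 5], "B": [1, 2, 3]}
ctr = position_ctr(clicks, num_positions=5)
print(ctr)  # {1: 0.5, 2: 1.0, 3: 1.0, 4: 0.0, 5: 0.5}
```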
Further, the click behavior corresponding to all search terms of all independent visitors is counted from the search behavior data, and the click-through rate corresponding to each click position in the top-K (TOPK) search ranking results is:
[formula not reproduced in this text]
where i denotes the retrieval result position number, k denotes the number of independent visitors, and CTR denotes the click-through rate.
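A form consistent with the definitions above (CTR = number of clicks / number of exposures, with each independent visitor counted once per position) can be written as follows. This is a reconstruction offered as an assumption, not the patent's verbatim formula; the visitor index u and the indicator c are symbols introduced here.

```latex
\mathrm{CTR}_i = \frac{1}{k} \sum_{u=1}^{k} c_{u,i},
\qquad
c_{u,i} =
\begin{cases}
1 & \text{if independent visitor } u \text{ clicked position } i,\\
0 & \text{otherwise.}
\end{cases}
```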
Further, for the top-K (TOPK) search ranking results, a log2 decay is applied according to retrieval result position i, and the corresponding overall search engine quality index (DCG) evaluation score is calculated as:
[formula not reproduced in this text]
where i denotes the retrieval result position number and K denotes the top K results of the search ranking.
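One form that matches this description, using the position click-through rate as the gain and a base-2 logarithmic decay by position, is sketched below. The exact discount term log2(i+1) is an assumption taken from the common DCG convention; the patent's own formula is not reproduced in this text.

```latex
\mathrm{DCG}@K = \sum_{i=1}^{K} \frac{\mathrm{CTR}_i}{\log_2(i+1)}
```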
In another aspect, to achieve the above purpose, the present invention also proposes a device for evaluating the retrieval results of a search engine, the device comprising:
a data acquisition module, configured to obtain the search behavior data of all independent visitors and, from the search behavior data, obtain the top-K (TOPK) search ranking results and the retrieval result pages corresponding to all independent visitors of the same search term;
a search engine quality index (DCG) calculation module, configured to obtain, from the click data for the retrieval result content positions in each retrieval result page, the click-through rate of each retrieval result content position as its position score, and to obtain, according to the position scores and by means of the DCG evaluation model, the DCG evaluation score of each retrieval result page;
and further configured to obtain, according to the top-K (TOPK) search ranking results of the search engine and in combination with the DCG evaluation score of each retrieval result page, the overall DCG evaluation score corresponding to the top-K (TOPK) search ranking results.
Further, the data acquisition module obtains the search behavior data of independent visitors from server log files, visitor access log files, and the like.
In addition, to achieve the above purpose, the present invention also provides a computer-readable storage medium on which a retrieval result evaluation program for a search engine is stored; when executed by a processor, the program implements the steps of the retrieval result evaluation method described above:
obtaining click data for the retrieval result content positions in each retrieval result page, and taking the click-through rate of each retrieval result content position as its position score;
obtaining, according to the position scores and by means of a search engine quality index (DCG) evaluation model, the DCG evaluation score of each retrieval result page;
according to the top-K (TOPK) search ranking results of the search engine, and in combination with the DCG evaluation score of each retrieval result page, obtaining the overall DCG evaluation score corresponding to the top-K (TOPK) search ranking results.
The method and device for evaluating the retrieval results of a search engine and the computer-readable medium proposed by the present invention combine the traditional offline, general-purpose search performance evaluation algorithm, the search engine quality index (DCG), with online user search behavior data, and optimize it into an online search engine evaluation model. User click-through rates can be converted directly into DCG scores for the search engine, so that the search effectiveness of the retrieval results is ultimately evaluated by the behavior of real users. The search behavior for all search terms of all users is counted and combined with the top-K (TOPK) search ranking results of the search engine to carry out the overall DCG evaluation; the higher the score, the better the results.
Brief description of the drawings
Fig. 1 is a flow block diagram of a first retrieval result evaluation method for a search engine according to embodiments of the invention.
Fig. 2 is a flow block diagram of a second retrieval result evaluation method for a search engine according to embodiments of the invention.
Fig. 3 is a structural block diagram of a retrieval result evaluation device for a search engine according to embodiments of the invention.
Fig. 4 is a structural block diagram of the DCG calculation module according to embodiments of the invention.
Fig. 5 is an operation block diagram of the retrieval result evaluation program of a search engine according to embodiments of the invention.
Detailed description
It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
A mobile terminal implementing the embodiments of the present invention will now be described with reference to the accompanying drawings. In the following description, suffixes such as "module", "part", or "unit" used to denote elements serve only to facilitate the description of the invention and have no specific meaning in themselves. Therefore, "module" and "part" may be used interchangeably.
Principle of the search engine quality index (DCG) evaluation method used in the present invention:
DCG is the abbreviation of Discounted Cumulative Gain, rendered here as "search engine quality index". The basic ideas of the DCG method are:
1. The relevance of each result is measured in grades;
2. The position of a result is taken into account: the higher the position, the more important it is;
3. The higher a high-grade (that is, good) result is placed, the higher the value should be; otherwise a penalty is applied.
Consider the first point, relevance grading. This is finer than the simple "accurate" or "inaccurate" tally used when calculating Precision.
Results can be subdivided into multiple grades.
For example, the commonly used 3 grades are Good, Fair, and Bad, with corresponding scores rel of Good: 3 / Fair: 2 / Bad: 1.
Some more fine-grained assessments use 5 grades: Very Good, Good, Fair, Bad, and Very Bad, with corresponding scores rel that can be set to Very Good: 2 / Good: 1 / Fair: 0 / Bad: -1 / Very Bad: -2.
The criteria for grading results can be decided according to the specific application. Very Good generally means that the topic of the result is fully relevant and that the web page content is rich and of high quality. And specific to every
The DCG calculation formula is not unique; in theory only the smoothness of the logarithmic discount factor is required. For example, the following DCG formula is more reasonable: it emphasizes relevance, and its discount factors for the 1st and 2nd results are also more reasonable:
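The formula itself is not reproduced in this text. One widely used DCG variant that fits the description (an exponential gain that emphasizes relevance, with a log2(i+1) discount) is given here as an assumption rather than as the patent's own formula; p denotes the number of ranked results considered and rel_i the relevance grade at position i.

```latex
\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i+1)}
```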
The discount factors of the results at the first 4 DCG positions are then as shown in Table 1 below:
Table 1
i | log2(i+1) | 1/log2(i+1) |
1 | 1 | 1 |
2 | 1.58 | 0.63 |
3 | 2 | 0.5 |
4 | 2.32 | 0.43 |
Taking the logarithm to base 2 also comes from an empirical formula and has no theoretical basis. In fact, the base of the logarithm can be adjusted according to the smoothing requirement; when the base is increased (for example, using log5 instead of log2), the discount factor decreases more rapidly, which emphasizes the weight of the earlier results.
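The discount factors for the first four positions can be recomputed with a few lines of Python. This is a small sketch; the helper name `discount_factor` and its `base` parameter are introduced here for illustration, with the base left adjustable so that other values can be tried.

```python
import math

def discount_factor(i, base=2):
    """Discount 1 / log_base(i + 1) applied to the result at position i."""
    return 1.0 / math.log(i + 1, base)

# Rows in the same layout as the table: i | log2(i+1) | 1/log2(i+1)
rows = []
for i in range(1, 5):
    log_val = math.log2(i + 1)
    rows.append(f"{i} | {log_val:.2f} | {discount_factor(i):.2f} |")
print("\n".join(rows))
```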
To make it easier to compare different types of query results side by side, some evaluation systems also normalize DCG; these methods are collectively called nDCG (normalized DCG). The most common calculation normalizes by dividing by the ideal value iDCG (ideal DCG) of each query, that is, nDCG = DCG / iDCG.
Computing nDCG requires calibrating the iDCG of the ideal case, which is extremely difficult in practice: everyone understands "the best result" differently, and picking the optimal result out of massive data is a very hard task. Judging which of two groups of results is better is usually much easier, so in practice evaluation is generally done by comparing results.
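For illustration, the normalization can be sketched in Python under two simplifying assumptions that are not taken from the patent: the gain is the relevance grade itself, and the ideal ranking is approximated by sorting the same graded results from best to worst. The grade list below is hypothetical data on the 3-grade scale (Good: 3, Fair: 2, Bad: 1).

```python
import math

def dcg(gains):
    """DCG with a log2(i + 1) discount; gains are listed in ranked order."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG divided by the DCG of the same gains sorted best-first."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

ranked = [3, 2, 3, 1]  # hypothetical grades in the order the engine returned them
print(round(ndcg(ranked), 4))
```

Sorting the judged results only approximates the true ideal, which is exactly the calibration difficulty noted above: a better result that was never retrieved or graded does not enter the iDCG.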
Embodiment 1
The method embodiments of the present invention are proposed on the basis of the search engine quality index (DCG) evaluation method described above.
As shown in Fig. 1, the first embodiment of the present invention proposes a method for evaluating the retrieval results of a search engine, the method comprising the following steps:
S101: obtaining click data for the retrieval result content positions in each retrieval result page, and taking the click-through rate of each retrieval result content position as its position score;
S102: obtaining, according to the position scores and by means of a search engine quality index (DCG) evaluation model, the DCG evaluation score of each retrieval result page;
S103: according to the top-K (TOPK) search ranking results of the search engine, and in combination with the DCG evaluation score of each retrieval result page, obtaining the overall DCG evaluation score corresponding to the top-K (TOPK) search ranking results.
In the above evaluation method, the position click-through rates in the independent visitors' search behavior data take the place of the testers' scores in a DCG evaluation. The model is based on the click behavior of independent visitors: when an independent visitor clicks an item in the retrieval result page content, this is taken to mean that the visitor is satisfied with the retrieval result at that position; the higher the click-through rate of the same position across independent visitors, the better the retrieval result quality at that position is considered to be. Resources ranked lower are given a score decay when the score is calculated, because the better a search engine's ranking is, the better the quality of its top-ranked resources. The search behavior for all search terms of all independent visitors is counted and combined with the top-K (TOPK) search ranking results of the search engine to carry out the overall DCG evaluation; the higher the score, the better the results.
As shown in Fig. 2, the first embodiment of the present invention also proposes a second method for evaluating the retrieval results of a search engine, the method comprising the following steps:
S201: obtaining search behavior data from server log files, visitor access log files, and the like; and obtaining, from the search behavior data, the retrieval result pages corresponding to all independent visitors of the same search term and the top-K (TOPK) search ranking results;
S202: obtaining click data for the retrieval result content positions in each retrieval result page, and taking the click-through rate of each retrieval result content position as its position score;
S203: obtaining, according to the position scores and by means of a search engine quality index (DCG) evaluation model, the DCG evaluation score of each retrieval result page;
S204: according to the top-K (TOPK) search ranking results of the search engine, and in combination with the DCG evaluation score of each retrieval result page, obtaining the overall DCG evaluation score corresponding to the top-K (TOPK) search ranking results.
User operation behavior data for network user behavior analysis, such as which search terms a user has used, which retrieval result page content was obtained, and which positions in the retrieval result page content were clicked, can be obtained from server databases such as server log files and visitor access log files. Analyzing the operation behavior data of all users then yields the top-K (TOPK) retrieval results, that is, the most popular search terms. A given search term, for example "mobile phone", should cover all search terms that contain it as well as its near-synonyms, such as "smartphone", "mobile telephone", and different translated names.
The top-K (TOPK) search ranking results are obtained with a TOPK algorithm. The search engine records in its log files every query string that each independent visitor uses for each search; each query string is 1 to 255 bytes long. Suppose there are currently 10 million records (these query strings are highly repetitive: although the total is 10 million, no more than 3 million remain after duplicates are removed; the more often a query string repeats, the more users have queried it and the more popular it is). The 10 most popular query strings are then counted, which correspond to the 10 most popular retrieval results in the search engine.
The position score is calculated as follows:
For the retrieval result page content corresponding to the same search term, clicks by each independent visitor on the same retrieval result content position are counted once, while clicks on different retrieval result content positions are accumulated separately for each position;
The click-through rate CTR is taken as the position score, where CTR = number of clicks / number of exposures; the number of exposures is the number of retrieval result pages, that is, the number of independent visitors for the same search term.
Here, UV (independent visitor) means Unique Visitor: one computer client that accesses the website counts as one visitor, and the same client is counted only once between 00:00 and 24:00.
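The once-per-day counting rule for a UV can be sketched as a deduplication pass over raw click records. The record layout (client id, timestamp, search term, clicked position) and the function name `unique_click_events` are assumptions made for this illustration; the patent does not specify a log format.

```python
from datetime import datetime

def unique_click_events(records):
    """Keep one click per (client, calendar day, search term, position)."""
    seen = set()
    unique = []
    for client_id, timestamp, term, position in records:
        key = (client_id, timestamp.date(), term, position)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

# Hypothetical click records.
records = [
    ("client-1", datetime(2017, 4, 28, 9, 0), "phone", 2),
    ("client-1", datetime(2017, 4, 28, 21, 30), "phone", 2),  # repeat, same day
    ("client-1", datetime(2017, 4, 28, 21, 31), "phone", 3),
    ("client-2", datetime(2017, 4, 28, 10, 0), "phone", 2),
]
print(len(unique_click_events(records)))  # 3
```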
For example: a) For one UV and the result list of one search term, clicking several different retrieval result positions is allowed; repeated clicks on the same retrieval result position are counted only once, while clicks on different retrieval result positions each add 1 to the count of the corresponding position;
b) The click-through rate CTR is taken as the position score, CTR = number of clicks / number of exposures. Suppose a search returns 10 results; independent visitor A clicks positions 2, 3, and 5, and another independent visitor B clicks positions 1, 2, and 3. Then the click-through rate of position 1 is 1/2, of position 2 is 2/2, of position 3 is 2/2 (both visitors clicked it), of position 4 is 0, and of position 5 is 1/2.
Further, the click behavior corresponding to all search terms of all independent visitors is counted from the search behavior data, and the click-through rate corresponding to each click position in the top-K (TOPK) search ranking results is:
[formula not reproduced in this text]
where i denotes the retrieval result position number, k denotes the number of independent visitors, and CTR denotes the click-through rate.
For example, applying the above formula for the position click-through rate of each click position to the top-K (TOPK) search ranking results gives a result list such as that shown in Table 2 below.
Table 2
Position i | CTR |
Position 1 | 20% |
Position 2 | 50% |
Position 3 | |
Position 4 | |
… | |
Position K | |
For the top-K (TOPK) search ranking results, a log2 decay is applied according to retrieval result position i, and the corresponding overall search engine quality index (DCG) evaluation score is calculated as:
[formula not reproduced in this text]
where i denotes the retrieval result position number and K denotes the top K results of the search ranking.
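As a worked illustration, the two click-through rates filled in for positions 1 and 2 of Table 2 (20% and 50%) can be combined into a score as follows. The sketch assumes the gain at each position is its click-through rate and the decay is 1 / log2(i + 1); both the function name `overall_dcg` and that exact discount are assumptions, since the patent's formula is not reproduced in this text.

```python
import math

def overall_dcg(ctr_by_position, k):
    """Sum of CTR_i / log2(i + 1) over positions 1..k (missing positions count as 0)."""
    return sum(
        ctr_by_position.get(i, 0.0) / math.log2(i + 1)
        for i in range(1, k + 1)
    )

ctr = {1: 0.20, 2: 0.50}  # positions 1 and 2 from Table 2
score = overall_dcg(ctr, k=2)
print(round(score, 4))  # 0.5155
```

Position 1 contributes 0.20 / 1 and position 2 contributes 0.50 / log2(3), so a click concentrated lower in the list adds less to the score than the same click-through rate at the top.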
Embodiment 2
In another aspect, to achieve the above purpose, as shown in Fig. 3, the present invention also proposes a device for evaluating the retrieval results of a search engine, the device comprising:
a data acquisition module 200, configured to obtain the search behavior data of all independent visitors and, from the search behavior data, obtain the top-K (TOPK) search ranking results and the retrieval result pages corresponding to all independent visitors of the same search term;
a search engine quality index (DCG) calculation module 300, configured to obtain, from the click data for the retrieval result content positions in each retrieval result page, the click-through rate of each retrieval result content position as its position score, and to obtain, according to the position scores and by means of the DCG evaluation model, the DCG evaluation score of each retrieval result page;
and further configured to obtain, according to the top-K (TOPK) search ranking results of the search engine and in combination with the DCG evaluation score of each retrieval result page, the overall DCG evaluation score corresponding to the top-K (TOPK) search ranking results.
The device further includes a DCG score output module 400, configured to output the results obtained by the DCG calculation module 300 online to an operation interface, so that the user can see the evaluation results of the search engine directly.
The data acquisition module 200 obtains the user search behavior data from server log files, user access log files, and the like; these log files are stored in the server database 100, which is shared with the network user operation behavior analysis system.
As shown in Fig. 4, the search engine quality index (DCG) calculation module 300 includes a click-through rate calculation unit 310 and a DCG evaluation score calculation unit 320. The click-through rate calculation unit 310 is configured to obtain, from the click data for the retrieval result content positions in each retrieval result page, the click-through rate of each retrieval result content position. The DCG evaluation score calculation unit 320 is configured to obtain, according to the top-K (TOPK) search ranking results of the search engine and in combination with the DCG evaluation score of each retrieval result page, the overall DCG evaluation score corresponding to the top-K (TOPK) search ranking results.
Embodiment 3
In addition, to achieve the above purpose, the present invention also provides a computer-readable storage medium on which a retrieval result evaluation program for a search engine is stored; when executed by a processor, the program implements the steps of the retrieval result evaluation method described above:
obtaining click data for the retrieval result content positions in each retrieval result page, and taking the click-through rate of each retrieval result content position as its position score;
obtaining, according to the position scores and by means of a search engine quality index (DCG) evaluation model, the DCG evaluation score of each retrieval result page;
according to the top-K (TOPK) search ranking results of the search engine, and in combination with the DCG evaluation score of each retrieval result page, obtaining the overall DCG evaluation score corresponding to the top-K (TOPK) search ranking results.
Specifically, as shown in Fig. 5, the online general evaluation program of the search engine executes as follows:
The server of the search engine collects the click behavior of independent visitors in real time to obtain user search behavior data, and stores these data over a certain time period, such as 1 day, 1 week, or 1 month.
The data are stored in the server's log files or the visitors' log files in the server database; how the user search behavior data are stored depends on how the server collects network user operation behavior data.
From the user search behavior data, the search term of each independent visitor, the retrieval result page content, and the user's click position information in the retrieval result page content can be obtained. For each search term, together with equivalent terms such as near-synonyms, the TOPK retrieval results, that is, the top K retrieval results for each search term, can be obtained from the corresponding retrieval result page content by counting the users' click position information.
According to the click positions, the position click-through rates corresponding to the TOPK retrieval results in the retrieval result page content can be obtained.
The DCG evaluation score of the search engine is calculated from the position click-through rates corresponding to the TOPK retrieval results; the higher the DCG evaluation score, the more accurate the retrieval results of the search engine.
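The flow above can be sketched end to end in Python. Everything named here is hypothetical: the record layout (visitor, search term, clicked position, with None marking a search without a click), the function name `evaluate`, and the log2(i + 1) discount are assumptions for illustration, and near-synonym grouping and the storage period are left out. The sketch reports one score per search term and does not assume any particular way of combining scores across terms.

```python
import math
from collections import defaultdict

def evaluate(log_records, k):
    """Return {search term: DCG score} from (visitor, term, position) records."""
    visitors = defaultdict(set)  # term -> independent visitors who searched it
    clicked = defaultdict(set)   # (term, position) -> visitors who clicked it
    for visitor, term, position in log_records:
        visitors[term].add(visitor)
        if position is not None:
            clicked[(term, position)].add(visitor)
    scores = {}
    for term, seen in visitors.items():
        # CTR_i = clicking visitors / all visitors, discounted by log2(i + 1).
        scores[term] = sum(
            len(clicked[(term, i)]) / len(seen) / math.log2(i + 1)
            for i in range(1, k + 1)
        )
    return scores

log_records = [
    ("A", "phone", 2), ("A", "phone", 3), ("A", "phone", 5),
    ("B", "phone", 1), ("B", "phone", 2), ("B", "phone", 3),
    ("C", "music", None),
]
scores = evaluate(log_records, k=5)
print({term: round(s, 4) for term, s in scores.items()})
```

Using sets of visitors per position means a visitor who clicks the same position several times is still counted once, which matches the position score rule.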
The retrieval result evaluation method and device of a search engine and the computer-readable medium proposed by
the present invention combine the search engine quality index (DCG) evaluation algorithm, a general search
performance measure traditionally applied offline, with online user search behavior data to obtain an optimized
online search engine evaluation model. User click-through rates can be converted directly into search engine
quality index (DCG) scores, so that the search effectiveness of the retrieval results is ultimately evaluated with
the behavior of real users. The search behavior of all users across all search terms is counted and combined with
the top K (TOPK) search ranking results of the search engine to perform an overall search engine quality index (DCG)
evaluation; the higher the score, the better the results.
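The overall evaluation can be sketched as follows, under stated assumptions: clicks are pooled over all search terms and all independent visitors, each (visitor, search term) pair counts as one exposed result page, and the pooled position click-through rates are discounted by log2(i + 1). The text does not specify how per-page scores are combined, so the pooling, the discount, the record layout, and the name `overall_dcg` are all illustrative choices rather than the patented formula.

```python
import math
from collections import defaultdict

def overall_dcg(records, top_k):
    """Overall DCG-style score over all search terms and visitors.

    records: iterable of (visitor_id, search_term, clicked_position) tuples
    (hypothetical layout); clicked_position is None when no click occurred.
    """
    pages = set()    # one exposure per (visitor, term) result page
    clicks = set()   # (visitor, term, position): repeated clicks count once
    for visitor_id, term, position in records:
        pages.add((visitor_id, term))
        if position is not None and 1 <= position <= top_k:
            clicks.add((visitor_id, term, position))
    if not pages:
        return 0.0

    per_position = defaultdict(int)
    for _, _, position in clicks:
        per_position[position] += 1

    # Pooled CTR per position, discounted by log2(i + 1) (assumed form)
    return sum((per_position[i] / len(pages)) / math.log2(i + 1)
               for i in range(1, top_k + 1))
```

Comparing this score for the same time period before and after a ranking change would be one way to use it; that usage is likewise a suggestion and is not described in the patent text.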
It should be noted that, herein, the terms "comprise", "include", or any other variant thereof are intended to
cover a non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements
includes not only those elements but also other elements that are not expressly listed, or elements inherent to
such a process, method, article, or device. Without further limitation, an element defined by the phrase
"comprising a ..." does not exclude the presence of additional identical elements in the process, method, article,
or device that comprises the element.
The above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of
the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by
hardware, although in many cases the former is the preferred implementation. Based on this understanding, the
technical solution of the present invention in essence, or the part of it that contributes over the prior art, can
be embodied in the form of a software product. The computer software product is stored in a storage medium (such as
ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which
may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the
methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the patent scope of the
invention. Any equivalent structure or equivalent process transformation made using the contents of the
specification and drawings of the present invention, or any direct or indirect application in other related
technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A retrieval result evaluation method for a search engine, characterized in that the method comprises the following steps:
obtaining click data for retrieval result content positions in retrieval result pages, and taking the click-through
rate of each retrieval result content position as its position score;
obtaining, according to the position scores, the search engine quality index evaluation score of each retrieval
result page by means of a search engine quality index evaluation model;
obtaining, according to the top K search ranking results of the search engine and in combination with the search
engine quality index evaluation score of each retrieval result page, the overall search engine quality index
evaluation score corresponding to the top K search ranking results.
2. The retrieval result evaluation method for a search engine according to claim 1, characterized in that the method
further comprises obtaining search behavior data from server log files and visitor access log files.
3. The retrieval result evaluation method for a search engine according to claim 2, characterized in that the method
further comprises obtaining, from the search behavior data, the retrieval result pages corresponding to all independent visitors for the same search term.
4. The retrieval result evaluation method for a search engine according to claim 2, characterized in that the method
further comprises obtaining the top K search ranking results from the search behavior data.
5. The retrieval result evaluation method for a search engine according to claim 1, characterized in that, in the
retrieval result evaluation method, the position score is calculated as follows:
for the retrieval result page content corresponding to the same search term, clicks by each independent visitor on
the same retrieval result content position are counted once, and clicks on different retrieval result content positions are accumulated;
the click-through rate CTR is taken as the position score, where CTR = number of clicks / number of exposures; the
number of exposures is the number of retrieval result pages, that is, the number of independent visitors for the same search term.
6. The retrieval result evaluation method for a search engine according to claim 5, characterized in that the click
behavior corresponding to all search terms of all independent visitors is counted from the search behavior data,
and the click-through rate of the top K search ranking results by click position is:
[formula image not reproduced]
where i denotes the retrieval result position number, k denotes the number of independent visitors, and CTR denotes the click-through rate.
7. The retrieval result evaluation method for a search engine according to claim 6, characterized in that, for the
top K search ranking results, a log2 decay is applied according to retrieval result position i, and the
corresponding overall search engine quality index evaluation score is calculated by the formula:
[formula image not reproduced]
where i denotes the retrieval result position number and K denotes the top K search ranking results.
8. A retrieval result evaluation device for a search engine, characterized in that the device comprises:
a data acquisition module, configured to obtain the search behavior data of all independent visitors and to obtain,
from the search behavior data, the top K search ranking results and the retrieval result pages corresponding to all independent visitors for the same search term;
a search engine quality index calculation module, configured to obtain, from the click data for retrieval result
content positions in the retrieval result pages, the click-through rate of each retrieval result content position
as its position score; to obtain, according to the position scores, the search engine quality index evaluation
score of each retrieval result page by means of a search engine quality index evaluation model;
and to obtain, according to the top K search ranking results of the search engine and in combination with the
search engine quality index evaluation score of each retrieval result page, the overall search engine quality
index evaluation score corresponding to the top K search ranking results.
9. The retrieval result evaluation device for a search engine according to claim 8, characterized in that the data
acquisition module obtains the search behavior data of independent visitors from server log files and visitor access log files.
10. A computer-readable storage medium, characterized in that a retrieval result evaluation program for a search
engine is stored on the computer-readable storage medium, and when the program is executed by a processor, it
implements the steps of the retrieval result evaluation method for a search engine according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710293371.7A CN107122467B (en) | 2017-04-26 | 2017-04-26 | Search engine retrieval result evaluation method and device and computer readable medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107122467A (en) | 2017-09-01 |
CN107122467B (en) | 2020-12-29 |
Family
ID=59726440
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710293371.7A Active CN107122467B (en) | 2017-04-26 | 2017-04-26 | Search engine retrieval result evaluation method and device and computer readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107122467B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885857A (en) * | 2017-11-17 | 2018-04-06 | 山东师范大学 | A kind of search results pages user's behavior pattern mining method, apparatus and system |
CN108460085A (en) * | 2018-01-19 | 2018-08-28 | 北京奇艺世纪科技有限公司 | A kind of video search sequence training set construction method and device based on user journal |
CN109190129A (en) * | 2018-08-31 | 2019-01-11 | 传神语联网网络科技股份有限公司 | A kind of multilingual translation quality evaluation engine based near synonym knowledge mapping |
CN110580322A (en) * | 2019-09-18 | 2019-12-17 | 北京百度网讯科技有限公司 | Independent visitor information processing method and device, electronic equipment and storage medium |
CN110674400A (en) * | 2019-09-18 | 2020-01-10 | 北京字节跳动网络技术有限公司 | Sorting method, sorting device, electronic equipment and computer-readable storage medium |
CN110941786A (en) * | 2018-09-21 | 2020-03-31 | 广州神马移动信息科技有限公司 | Method and device for monitoring search effect |
CN111061942A (en) * | 2018-10-17 | 2020-04-24 | 阿里巴巴集团控股有限公司 | Search ranking monitoring method and system |
CN111367778A (en) * | 2020-03-13 | 2020-07-03 | 百度在线网络技术(北京)有限公司 | Data analysis method and device for evaluating search strategy |
CN111612658A (en) * | 2020-05-29 | 2020-09-01 | 北京华宇元典信息服务有限公司 | Evaluation method and evaluation device for legal data retrieval and electronic equipment |
CN113010776A (en) * | 2021-03-03 | 2021-06-22 | 昆明理工大学 | Monroe rule-based meta-search sorting Top-k polymerization method |
CN113065065A (en) * | 2021-03-30 | 2021-07-02 | 广联达科技股份有限公司 | Method, device and equipment for evaluating search performance and readable storage medium |
CN113220967A (en) * | 2021-05-11 | 2021-08-06 | 北京百度网讯科技有限公司 | Method and device for measuring ecological health degree of Internet environment and electronic equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1924855A (en) * | 2006-09-18 | 2007-03-07 | 浙江大学 | Arrangement method in image search engine |
CN101055587A (en) * | 2007-05-25 | 2007-10-17 | 清华大学 | Search engine retrieving result reordering method based on user behavior information |
CN103064852A (en) * | 2011-10-20 | 2013-04-24 | 阿里巴巴集团控股有限公司 | Website statistical information processing method and website statistical information processing system |
CN103646092A (en) * | 2013-12-18 | 2014-03-19 | 孙燕群 | SE (search engine) ordering method based on user participation |
CN104063523A (en) * | 2014-07-21 | 2014-09-24 | 焦点科技股份有限公司 | E-commerce search scoring and ranking method and system |
CN104636407A (en) * | 2013-11-15 | 2015-05-20 | 腾讯科技(深圳)有限公司 | Parameter choice training and search request processing method and device |
CN105808590A (en) * | 2014-12-31 | 2016-07-27 | 中国电信股份有限公司 | Search engine realization method as well as search method and apparatus |
US20170091189A1 (en) * | 2015-09-28 | 2017-03-30 | Yahoo! Inc. | Location-sensitive ranking for search and related techniques |
Non-Patent Citations (3)
Title |
---|
余锦秀: ""基于用户行为分析的搜索引擎自动评价技术研究"", 《中国优秀硕士学位论文全文数据库信息科技辑(月刊)》 * |
王果等: ""基于双向热点跟踪的搜索引擎优化模型"", 《计算机应用与软件》 * |
邓晓妹 等: ""基于点击日志的搜索引擎用户满意度评价研究"", 《计算机工程与应用》 * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885857B (en) * | 2017-11-17 | 2019-02-12 | 山东师范大学 | A kind of search results pages user's behavior pattern mining method, apparatus and system |
CN107885857A (en) * | 2017-11-17 | 2018-04-06 | 山东师范大学 | A kind of search results pages user's behavior pattern mining method, apparatus and system |
CN108460085A (en) * | 2018-01-19 | 2018-08-28 | 北京奇艺世纪科技有限公司 | A kind of video search sequence training set construction method and device based on user journal |
CN109190129A (en) * | 2018-08-31 | 2019-01-11 | 传神语联网网络科技股份有限公司 | A kind of multilingual translation quality evaluation engine based near synonym knowledge mapping |
CN110941786A (en) * | 2018-09-21 | 2020-03-31 | 广州神马移动信息科技有限公司 | Method and device for monitoring search effect |
CN111061942A (en) * | 2018-10-17 | 2020-04-24 | 阿里巴巴集团控股有限公司 | Search ranking monitoring method and system |
CN111061942B (en) * | 2018-10-17 | 2023-04-18 | 阿里巴巴集团控股有限公司 | Search ranking monitoring method and system |
CN110674400B (en) * | 2019-09-18 | 2022-05-10 | 北京字节跳动网络技术有限公司 | Sorting method, sorting device, electronic equipment and computer-readable storage medium |
CN110580322A (en) * | 2019-09-18 | 2019-12-17 | 北京百度网讯科技有限公司 | Independent visitor information processing method and device, electronic equipment and storage medium |
CN110674400A (en) * | 2019-09-18 | 2020-01-10 | 北京字节跳动网络技术有限公司 | Sorting method, sorting device, electronic equipment and computer-readable storage medium |
CN111367778A (en) * | 2020-03-13 | 2020-07-03 | 百度在线网络技术(北京)有限公司 | Data analysis method and device for evaluating search strategy |
CN111367778B (en) * | 2020-03-13 | 2023-07-07 | 百度在线网络技术(北京)有限公司 | Data analysis method and device for evaluating search strategy |
CN111612658A (en) * | 2020-05-29 | 2020-09-01 | 北京华宇元典信息服务有限公司 | Evaluation method and evaluation device for legal data retrieval and electronic equipment |
CN113010776A (en) * | 2021-03-03 | 2021-06-22 | 昆明理工大学 | Monroe rule-based meta-search sorting Top-k polymerization method |
CN113010776B (en) * | 2021-03-03 | 2022-12-09 | 昆明理工大学 | Meta-search sequencing Top-k polymerization method based on Monroe rule |
CN113065065A (en) * | 2021-03-30 | 2021-07-02 | 广联达科技股份有限公司 | Method, device and equipment for evaluating search performance and readable storage medium |
CN113220967A (en) * | 2021-05-11 | 2021-08-06 | 北京百度网讯科技有限公司 | Method and device for measuring ecological health degree of Internet environment and electronic equipment |
CN113220967B (en) * | 2021-05-11 | 2023-09-22 | 北京百度网讯科技有限公司 | Ecological health degree measuring method and device for Internet environment and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107122467B (en) | 2020-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107122467A (en) | The retrieval result evaluation method and device of a kind of search engine, computer-readable medium | |
CN105701216B (en) | A kind of information-pushing method and device | |
CN107391687B (en) | Local log website-oriented hybrid recommendation system | |
CN106372249B (en) | A kind of clicking rate predictor method, device and electronic equipment | |
US8380694B2 (en) | Method and system for aggregating reviews and searching within reviews for a product | |
US8190556B2 (en) | Intellegent data search engine | |
CN101355457B (en) | Test method and test equipment | |
KR100863990B1 (en) | Advertising System and method using category | |
CN109190043A (en) | Recommended method and device, storage medium, electronic equipment and recommender system | |
KR100930786B1 (en) | Ad list generation method and system | |
CN105765573A (en) | Improvements in website traffic optimization | |
CN102841946A (en) | Commodity data retrieval sequencing and commodity recommendation method and system | |
KR20090033989A (en) | Method for advertising local information based on location information and system for executing the method | |
CN103902597A (en) | Method and device for determining search relevant categories corresponding to target keywords | |
CN110334356A (en) | Article matter method for determination of amount, article screening technique and corresponding device | |
CN107153656A (en) | A kind of information search method and device | |
CN114238573B (en) | Text countercheck sample-based information pushing method and device | |
CN103729365A (en) | Searching method and system | |
CN111724238A (en) | Method, device and equipment for evaluating product recommendation accuracy and storage medium | |
CN108920479B (en) | Cross-information-source account recommendation method for two micro terminals | |
KR20100021888A (en) | A profit distribution system for content provider and method thereof | |
CN106919588A (en) | A kind of application program search system and method | |
CN109558544A (en) | Sort method and device, server and storage medium | |
CN112487283A (en) | Method and device for training model, electronic equipment and readable storage medium | |
JP2006318398A (en) | Vector generation method and device, information classifying method and device, and program, and computer readable storage medium with program stored therein |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||