WO2016115944A1 - Method and apparatus for establishing a webpage quality model - Google Patents

Method and apparatus for establishing a webpage quality model

Info

Publication number
WO2016115944A1
Authority
WO
WIPO (PCT)
Prior art keywords
webpage
quality
score
user behavior
subunit
Prior art date
Application number
PCT/CN2015/096036
Other languages
English (en)
French (fr)
Inventor
陈子牛
Original Assignee
广州神马移动信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州神马移动信息科技有限公司
Priority to RU2017129409A (patent RU2680746C2, ru)
Publication of WO2016115944A1 (zh)
Priority to US15/653,780 (patent US10891350B2, en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
        • G06F16/951 Indexing; Web crawling techniques
        • G06F16/953 Querying, e.g. by the use of web search engines
            • G06F16/9535 Search customisation based on user profiles and personalisation
        • G06F16/954 Navigation, e.g. using categorised browsing
        • G06F16/957 Browsing optimisation, e.g. caching or content distillation
            • G06F16/9574 Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
            • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents

Definitions

  • the present invention relates to the field of network technologies, and in particular, to a method and an apparatus for establishing a webpage quality model.
  • the user can input a keyword in the search engine
  • the search engine sends the keyword input by the user to the server
  • the server searches for the webpage corresponding to the keyword, and then sorts the searched webpage and feeds back to the search engine, so as to facilitate the user.
  • the server sorts the searched webpages according to the relevance and the quality of the webpage as much as possible.
  • the quality of the webpage is an important factor affecting the ranking of webpages.
  • the quality of the webpage is usually obtained according to the webpage quality model, and the accuracy of the webpage quality model directly affects the webpage sorting result and the user experience.
  • the existing method for establishing a web page quality model is to manually summarize a plurality of manual rules from a limited sample, for example, by observing hundreds or thousands of web pages to summarize features affecting web page quality, each feature can be used as a manual rule. Then combine these manual rules to get the page quality model.
  • With this method, because the number of observed samples is very limited, the established webpage quality model has poor accuracy, so the calculated webpage quality is inaccurate, which degrades the webpage ranking results and the user experience.
  • the embodiment of the invention provides a method and a device for establishing a webpage quality model, which are used to solve the problem that the accuracy of the established webpage quality model existing in the prior art is poor.
  • a method for establishing a webpage quality model including:
  • a webpage quality model is established based on webpage quality and selected quality characteristics of each webpage included in the search engine log.
  • the selected user behavior indicator includes at least one or a combination of total clicks, long clicks, last clicks, and navigation clicks, wherein:
  • the total click volume is the number of times the webpage is clicked; the long click volume is the number of times the webpage is clicked and the stay time on the webpage exceeds the first set duration; the last click volume is the number of times the webpage is the last one clicked in the search results; and the navigation click volume is the number of times the webpage is the only one clicked in the search results.
  • calculating the webpage quality of the corresponding webpage according to the mined selected user behavior indicator of each webpage specifically includes:
  • the quality of the webpage corresponding to the user behavior ratio of the current webpage is determined according to the correspondence between the range of the user behavior ratio and the webpage quality.
  • the user behavior ratio of the current webpage is calculated according to the total number of clicks, long clicks, last clicks, and navigation clicks of the current webpage, and specifically includes:
  • it also includes:
  • the webpages included in the search engine log are filtered according to the webpage quality and the selected user behavior indicator, and the webpage quality model is then established according to the webpage quality and the selected quality features of the webpages included in the filtered search engine log.
  • the webpage included in the search engine log is filtered according to the webpage quality and the selected user behavior indicator, and specifically includes:
  • the webpage with the smallest webpage quality is retained, and the webpage other than the retained webpage is deleted;
  • the webpage with the highest webpage quality is retained, and the webpage other than the retained webpage is deleted.
  • the selected quality feature comprises at least one or a combination of a user behavior dimension feature, a webpage dimension feature, and a third party evaluation feature.
  • it also includes:
  • the existing webpage ranking model is modified according to the comprehensive score and the webpage quality of the selected webpage, and a new webpage ranking model is obtained.
  • calculating a text score of the selected webpage specifically includes:
  • the degree of matching is determined as the text score of the selected web page.
  • the comprehensive score of the selected webpage is calculated according to the webpage quality and the text score of the selected webpage, and specifically includes:
  • calculating the escape penalty score of the selected webpage according to the text score of the selected webpage includes:
  • a method for evaluating a webpage quality including:
  • the evaluation of the quality of the webpage is implemented according to the size of the comprehensive score of the selected webpage.
  • calculating a text score of the selected webpage includes:
  • the degree of matching is determined as the text score of the selected web page.
  • calculating a comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage including:
  • calculating the escape penalty score of the selected webpage according to the text score of the selected webpage includes:
  • the selected quality feature comprises at least one or a combination of a user behavior dimension feature, a webpage dimension feature, and a third party evaluation feature.
  • the method further comprises: modifying an existing webpage ranking model according to the comprehensive score of the selected webpage and the webpage quality, and obtaining a new webpage ranking model to sort the search results.
  • a device for establishing a webpage quality model including:
  • a webpage quality calculation unit configured to mine, from a search engine log, a selected user behavior indicator of each webpage included in the search engine log, and calculate a webpage quality of the corresponding webpage according to the selected user behavior index of each webpage that is mined ;
  • a selected quality feature extraction unit configured to extract, from the search engine log, selected quality features of each webpage included in the search engine log
  • the webpage quality model establishing unit is configured to establish a webpage quality model according to the webpage quality and the selected quality feature of each webpage included in the search engine log.
  • the selected user behavior indicator includes at least one or a combination of total clicks, long clicks, last clicks, and navigation clicks, wherein:
  • the total click volume is the number of times the webpage is clicked; the long click volume is the number of times the webpage is clicked and the stay time on the webpage exceeds the first set duration; the last click volume is the number of times the webpage is the last one clicked in the search results; and the navigation click volume is the number of times the webpage is the only one clicked in the search results.
  • the webpage quality calculation unit specifically includes a user behavior ratio calculation subunit and a webpage quality determination subunit; wherein
  • the user behavior ratio calculation sub-unit is configured to: for each webpage, perform: calculating a user behavior ratio of the current webpage according to a total click volume, a long click volume, a last click volume, and a navigation click volume of the current webpage;
  • the webpage quality determining sub-unit is configured to determine a webpage quality corresponding to a user behavior ratio of the current webpage according to a correspondence between a range of a user behavior ratio and a webpage quality.
  • the user behavior ratio calculation subunit specifically includes a first sum value calculation subunit, a second sum value calculation subunit, and a user behavior ratio determination subunit;
  • the first sum value calculation subunit is configured to calculate a sum of a last click amount, a navigation click amount, and a long click amount of the current webpage to obtain a first sum value
  • the second sum value calculation subunit is configured to calculate a sum of a total click amount of the current webpage and a first experience value to obtain a second sum value
  • the user behavior ratio determining subunit is configured to calculate a ratio of the first sum value and the second sum value, and determine the ratio as a user behavior ratio of the current webpage.
  • a web filtering unit is further included for:
  • Filtering webpages included in the search engine log according to webpage quality and selected user behavior indicators
  • the webpage quality model establishing unit is configured to establish a webpage quality model according to the webpage quality and the selected quality feature of the webpage included in the filtered search engine log.
  • the webpage filtering unit specifically includes a total click volume obtaining subunit and a webpage filtering subunit; wherein
  • the total click volume obtaining subunit is configured to obtain a total amount of clicks of each webpage included in the search engine log;
  • the webpage filtering subunit is configured to: delete webpages whose total click volume is less than or equal to the first set number of times; for webpages whose total click volume is greater than the first set number of times and less than or equal to the second set number of times, retain the webpage with the smallest webpage quality and delete the webpages other than the retained webpage; and for webpages whose total click volume is greater than the second set number of times, retain the webpage with the highest webpage quality and delete the webpages other than the retained webpage.
  • the selected quality feature comprises at least one or a combination of a user behavior dimension feature, a webpage dimension feature, and a third party evaluation feature.
  • the apparatus further includes a selected quality feature substituting unit, a text score calculation unit, a comprehensive score calculation unit, and a webpage sorting model correction unit; wherein
  • the selected quality feature substituting unit is configured to substitute the selected quality feature of the selected webpage in the webpage full set into the webpage quality model to obtain the webpage quality of the selected webpage;
  • the text score calculation unit is configured to calculate a text score of the selected webpage
  • the comprehensive score calculation unit is configured to calculate a comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage;
  • the webpage sorting model correction unit is configured to correct an existing webpage sorting model according to the comprehensive score of the selected webpage and the webpage quality, and obtain a new webpage sorting model.
  • the text score calculation unit specifically includes a search request acquisition subunit, a matching degree calculation subunit, and a text score determination subunit; wherein
  • the search request acquisition subunit is configured to obtain a search request corresponding to the selected webpage
  • the matching degree calculation subunit is configured to calculate a matching degree between the text content of the selected webpage and the search request corresponding to the selected webpage;
  • the text score determining subunit is configured to determine the matching degree as a text score of the selected webpage.
  • the comprehensive score calculation unit specifically includes a normalization subunit, an escape penalty score calculation subunit, and a comprehensive score calculation subunit; wherein
  • the normalization subunit is configured to normalize the webpage quality of the selected webpage;
  • the escape penalty score calculation subunit is configured to calculate an escape penalty score of the selected webpage according to the text score of the selected webpage;
  • the comprehensive score calculation subunit is configured to multiply the escape penalty score of the selected webpage by the text score, add a set floating-point number, and multiply the resulting sum by the normalized webpage quality of the selected webpage to obtain the comprehensive score of the selected webpage.
  • the escape penalty score calculation subunit specifically includes a text score judgment subunit and an escape penalty score determination subunit; wherein
  • the text score judgment subunit is configured to determine whether the text score of the selected webpage is greater than a first set value;
  • the escape penalty score determination subunit is configured to: if the text score of the selected webpage is greater than or equal to the first set value, determine that the escape penalty score of the selected webpage is equal to a second set value; and if the text score of the selected webpage is less than the first set value, determine that the escape penalty score of the selected webpage is equal to the ratio of the text score of the selected webpage to the first set value.
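  • The piecewise rule for the escape penalty score can be sketched as follows. This is only an illustration: the function name and the default values for the first and second set values (0.5 and 1.0) are assumptions chosen so the example runs; the source does not state them.

```python
def escape_penalty(text_score: float,
                   first_set_value: float = 0.5,    # assumed threshold, not from the source
                   second_set_value: float = 1.0    # assumed constant, not from the source
                   ) -> float:
    """Return the escape penalty score for a page's text score."""
    if first_set_value <= 0:
        raise ValueError("first_set_value must be positive")
    if text_score >= first_set_value:
        # Text score at or above the threshold: use the fixed second set value.
        return second_set_value
    # Below the threshold: the penalty is the ratio text_score / first_set_value.
    return text_score / first_set_value
```

  • With these assumed defaults, a text score of 0.25 yields a penalty of 0.5, while any text score of 0.5 or more yields 1.0.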
  • a webpage quality evaluation apparatus comprising: the foregoing webpage quality model establishing apparatus, a selected quality feature substituting unit, a text score calculating unit, a comprehensive score calculating unit, and an evaluating unit; wherein
  • the selected quality feature substituting unit is configured to substitute the selected quality feature of the selected webpage in the webpage full set into the webpage quality model to obtain the webpage quality of the selected webpage;
  • the text score calculation unit is configured to calculate a text score of the selected webpage
  • the comprehensive score calculation unit is configured to calculate a comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage;
  • the evaluation unit is configured to evaluate the quality of the webpage according to the size of the comprehensive score of the selected webpage.
  • the text score calculation unit specifically includes a search request acquisition subunit, a matching degree calculation subunit, and a text score determination subunit; wherein
  • the search request acquisition subunit is configured to obtain a search request corresponding to the selected webpage
  • the matching degree calculation subunit is configured to calculate a matching degree between the text content of the selected webpage and the search request corresponding to the selected webpage;
  • the text score determining subunit is configured to determine the matching degree as a text score of the selected webpage.
  • the comprehensive score calculation unit specifically includes a normalization subunit, an escape penalty score calculation subunit, and a comprehensive score calculation subunit; wherein
  • the normalization subunit is configured to normalize the webpage quality of the selected webpage;
  • the escape penalty score calculation subunit is configured to calculate an escape penalty score of the selected webpage according to the text score of the selected webpage;
  • the comprehensive score calculation subunit is configured to multiply the escape penalty score of the selected webpage by the text score, add a set floating-point number, and multiply the resulting sum by the normalized webpage quality of the selected webpage to obtain the comprehensive score of the selected webpage.
  • the escape penalty score calculation subunit specifically includes a text score judgment subunit and an escape penalty score determination subunit; wherein
  • the text score judgment subunit is configured to determine whether the text score of the selected webpage is greater than a first set value;
  • the escape penalty score determination subunit is configured to: if the text score of the selected webpage is greater than or equal to the first set value, determine that the escape penalty score of the selected webpage is equal to a second set value; and if the text score of the selected webpage is less than the first set value, determine that the escape penalty score of the selected webpage is equal to the ratio of the text score of the selected webpage to the first set value.
  • the selected quality feature comprises at least one or a combination of a user behavior dimension feature, a webpage dimension feature, and a third party evaluation feature.
  • the apparatus further includes a webpage sorting model correction unit configured to correct an existing webpage sorting model according to the comprehensive score of the selected webpage and the webpage quality, to obtain a new webpage sorting model.
  • Embodiments of the present invention provide a method and an apparatus for establishing a webpage quality model, and a method and an apparatus for evaluating webpage quality. A selected user behavior indicator of each webpage included in a search engine log is mined from the search engine log, and the webpage quality of the corresponding webpage is calculated according to the mined selected user behavior indicator; selected quality features of each webpage included in the search engine log are extracted from the search engine log; and a webpage quality model is established according to the webpage quality and the selected quality features of each webpage included in the search engine log.
  • In this solution, the webpage quality model is established automatically from a large number of search engine logs. Compared with the manual summarization method in the prior art, the established webpage quality model is more accurate, so the calculated webpage quality is more accurate, which helps ensure the accuracy of the webpage sorting results and the user experience.
  • FIG. 1 is a schematic flowchart of a method for establishing a webpage quality model according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of search results obtained by comparing an existing web page sorting model and a new web page sorting model for webpage search according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a device for establishing a webpage quality model according to an embodiment of the present invention.
  • the embodiment of the present invention provides a method for establishing a webpage quality model.
  • the flow of the method is shown in FIG. 1, and the execution subject may be a server or the like.
  • the following takes the server as an example. The steps are as follows:
  • S11 Mine, from the search engine log, the selected user behavior indicator of each webpage included in the search engine log, and calculate the webpage quality of the corresponding webpage according to the mined selected user behavior indicator of each webpage.
  • The server searches according to the keyword, sorts the obtained webpages, and feeds them back to the search engine for the user to select.
  • the server records the process of interaction with the search engine and saves it in the search engine log, so the page quality model can be built based on the search engine logs.
  • the search engine log in the set time period can be obtained, and then the webpage included in the search engine log can be obtained.
  • the set time period can be the last 30 days, the last 45 days, the last 60 days, and the like, and can be set according to actual needs.
  • The selected user behavior indicators of each webpage included in the search engine log are extracted from the search engine log; the selected user behavior indicators include at least: total clicks, long clicks, last clicks, and navigation clicks.
  • The total click volume is the number of times the webpage is clicked, for example, the number of times the webpage was clicked as recorded in the search engine log of the last 60 days;
  • The long click volume is the number of times the webpage is clicked and the stay time on the webpage exceeds the first set duration. The first set duration can be 30 seconds, 40 seconds, 50 seconds, etc., and can be set according to actual needs. For example, it may be the number of times, as recorded in the search engine log of the last 60 days, that the webpage was clicked and the stay time on the webpage exceeded 40 seconds;
  • The last click volume is the number of times the webpage is the last one clicked in the search results, for example, the number of times the webpage was the last one clicked as recorded in the search engine log of the last 60 days;
  • The navigation click volume is the number of times the webpage is the only one clicked in the search results, for example, the number of times the webpage was the only one clicked in the search results as recorded in the search engine log of the last 60 days.
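  • The four indicators can be counted from click records roughly as sketched below. The log layout is an assumption made for illustration: each search is represented as a time-ordered list of `(url, stay_seconds)` clicks; the source does not describe the actual log format. The 40-second threshold is one of the example values given above.

```python
from collections import Counter, defaultdict

def mine_indicators(searches, first_set_duration=40.0):
    """Count total, long, last and navigation clicks per URL.

    searches: iterable of click lists, one list per search result page,
              each click being (url, stay_seconds) in click order.
              This layout is hypothetical.
    """
    stats = defaultdict(Counter)
    for clicks in searches:
        if not clicks:
            continue  # a search with no clicks contributes nothing
        for url, stay_seconds in clicks:
            stats[url]["total"] += 1
            if stay_seconds > first_set_duration:
                stats[url]["long"] += 1
        # The last click of the search counts toward that page's last clicks.
        stats[clicks[-1][0]]["last"] += 1
        # If only one distinct page was clicked, it gets a navigation click.
        if len({url for url, _ in clicks}) == 1:
            stats[clicks[0][0]]["navigation"] += 1
    keys = ("total", "long", "last", "navigation")
    return {url: {k: c[k] for k in keys} for url, c in stats.items()}

example = mine_indicators([
    [("a", 50), ("b", 10)],  # two pages clicked, "b" clicked last
    [("a", 5)],              # "a" is the only page clicked
    [("b", 45)],             # "b" is the only page clicked
])
```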
  • the quality of the webpage of the corresponding webpage is calculated according to the selected user behavior index of each webpage that is mined, so that the webpage quality of the webpage in the search engine can be obtained.
  • S12 Extract the selected quality characteristics of each webpage included in the search engine log from the search engine log.
  • The selected quality features include at least one or a combination of user behavior dimension features, webpage dimension features, and third-party evaluation features, wherein:
  • The user behavior dimension feature judges the quality of the webpage from the perspective of the user, and may be the total click volume, the last click volume, the average click position of the webpage, etc.; user behavior dimension features can be extracted from the search engine log;
  • The webpage dimension feature judges the quality of the webpage solely from its content, for example the title of the webpage, whether the content is fluent, and whether particular keywords are present; for a question-and-answer page, examples include the number of answers, the number of user likes, and whether there is a best answer. Webpage dimension features can be extracted directly by analyzing the content of the webpage;
  • the third-party evaluation feature refers to the quality of the webpage from the perspective of a third party, specifically whether a third party gives a link to the webpage, the size of the access traffic of the webpage, etc., and the third party may be another webpage. Third-party evaluation features need to be analyzed by links or obtained from a third party through cooperation.
  • S13 Establish a webpage quality model according to the webpage quality and the selected quality features of each webpage included in the search engine log.
  • The webpage quality model may be constructed with the Gradient Boosting Decision Tree (GBDT) algorithm from the webpage quality calculated in S11 and the selected quality features of each webpage extracted in S12; the algorithm is not limited to GBDT.
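  • To make the model-building step concrete, the following is a deliberately small gradient-boosting sketch in pure Python. It is not the implementation described in the source, which names GBDT but gives no loss function, tree depth, or hyperparameters. The assumptions here are: squared-error regression on the webpage quality label, depth-1 trees (stumps), and hypothetical feature vectors.

```python
def fit_stump(X, residuals):
    """Pick the one-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # no valid split: predict the mean residual everywhere
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]

def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_gbdt(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the residuals of a squared-error loss."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict_quality(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

# Hypothetical features per page; labels are webpage quality values from S11.
X = [[0.1, 1.0], [0.2, 0.0], [0.8, 1.0], [0.9, 0.0]]
y = [0, 0, 4, 4]
model = fit_gbdt(X, y)
```

  • In practice a library implementation of gradient-boosted trees would normally replace this hand-written loop; the sketch only shows how quality labels and quality features fit together as training data.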
  • In this solution, the webpage quality model is established automatically from a large number of search engine logs, so the established webpage quality model is more accurate and the calculated webpage quality is more accurate, which helps ensure the accuracy of the webpage sorting results and the user experience.
  • Calculating, in the foregoing S11, the webpage quality of the corresponding webpage according to the mined selected user behavior indicator of each webpage specifically includes:
  • the quality of the webpage corresponding to the user behavior ratio of the current webpage is determined according to the correspondence between the range of the user behavior ratio and the webpage quality.
  • Specifically, the sum of the last click volume, the navigation click volume, and the long click volume of the current webpage may be calculated first to obtain a first sum value; the sum of the total click volume of the current webpage and a first experience value is then calculated to obtain a second sum value; finally, the ratio of the first sum value to the second sum value is calculated and determined as the user behavior ratio of the current webpage.
  • The first experience value is preferably 20.
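  • The ratio described above can be written as a short function. The function and parameter names are illustrative; the default of 20 is the preferred first experience value stated in the text.

```python
def user_behavior_ratio(total_clicks, long_clicks, last_clicks,
                        navigation_clicks, first_experience_value=20):
    """(last + navigation + long) / (total + first experience value)."""
    first_sum = last_clicks + navigation_clicks + long_clicks
    second_sum = total_clicks + first_experience_value
    if second_sum <= 0:
        raise ValueError("total clicks plus experience value must be positive")
    return first_sum / second_sum

# 80 total clicks, 30 long, 20 last, 10 navigation: (20 + 10 + 30) / (80 + 20)
ratio = user_behavior_ratio(80, 30, 20, 10)
```

  • One effect of the constant in the denominator is that a webpage with very few clicks cannot reach a high ratio, because the constant dominates the second sum value when the total click volume is small.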
  • The correspondence between ranges of the user behavior ratio and webpage quality may be established in advance, with the webpage quality corresponding to each range of the user behavior ratio saved in the correspondence; once the user behavior ratio of a webpage is obtained, the corresponding webpage quality can be looked up from the correspondence.
  • the following example illustrates the correspondence between the range of user behavior ratio and the quality of the webpage, as shown in the following table:
  • The webpage quality takes the values 0, 1, 2, 3, and 4; the higher the value, the better the quality of the webpage.
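  • A range-to-quality lookup of this kind can be sketched with a sorted list of boundaries. The boundary values below are hypothetical placeholders, since the table of ranges is not reproduced in this text; only the five quality levels 0 to 4 come from the source.

```python
from bisect import bisect_right

# Hypothetical boundaries between quality levels 0|1|2|3|4 (NOT from the source).
ASSUMED_BOUNDS = [0.1, 0.3, 0.5, 0.7]

def quality_from_ratio(ratio, bounds=ASSUMED_BOUNDS):
    """Map a user behavior ratio to a quality level 0..len(bounds).

    Each range is closed on the left: a ratio equal to a boundary
    falls into the higher quality level.
    """
    return bisect_right(bounds, ratio)
```

  • With these placeholder boundaries, a ratio of 0.05 maps to quality 0, a ratio of 0.6 maps to quality 3, and any ratio of 0.7 or more maps to quality 4.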
  • The method for establishing a webpage quality model further includes: filtering the webpages included in the search engine log according to the webpage quality and the selected user behavior indicator.
  • Accordingly, establishing the webpage quality model according to the webpage quality and the selected quality features may specifically include: establishing the webpage quality model according to the webpage quality and the selected quality features of the webpages included in the filtered search engine log.
  • The webpages included in the search engine log are filtered according to the webpage quality and the selected user behavior indicator as follows: obtain the total click volume of each webpage; delete webpages whose total click volume is less than or equal to the first set number of times; for webpages whose total click volume is greater than the first set number of times and less than or equal to the second set number of times, retain the webpage with the smallest webpage quality and delete the webpages other than the retained webpage; for webpages whose total click volume is greater than the second set number of times, retain the webpage with the highest webpage quality and delete the webpages other than the retained webpage.
  • the first set number of times and the second set number of times can be set according to actual needs. In this example, the first set number of times is 4, and the second set number of times is 10.
  • For example, the search engine logs of the last 60 days are mined and, after filtering with the above rules, 24 million webpages and their corresponding webpage quality are finally obtained.
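  • The filtering rule can be sketched as below. The text does not say what the "smallest" and "highest" webpage quality are compared against, so this sketch assumes the comparison is made across all webpages in the same click band and keeps every webpage tied at that extreme value. The thresholds 4 and 10 are the example values given above; the data layout is hypothetical.

```python
def filter_pages(pages, first_set_number=4, second_set_number=10):
    """pages: dict mapping url -> (total_clicks, webpage_quality).

    Returns a dict url -> webpage_quality for the retained pages.
    Assumption: min/max quality is taken over each click band as a whole.
    """
    middle = {u: q for u, (c, q) in pages.items()
              if first_set_number < c <= second_set_number}
    high = {u: q for u, (c, q) in pages.items() if c > second_set_number}
    kept = {}
    if middle:
        lowest = min(middle.values())
        # Moderately clicked pages: keep only the lowest-quality ones.
        kept.update({u: q for u, q in middle.items() if q == lowest})
    if high:
        highest = max(high.values())
        # Heavily clicked pages: keep only the highest-quality ones.
        kept.update({u: q for u, q in high.items() if q == highest})
    return kept  # pages with total_clicks <= first_set_number are dropped

kept = filter_pages({
    "a": (3, 4),    # too few clicks: deleted
    "b": (6, 0),    # middle band, lowest quality: retained
    "c": (8, 2),    # middle band, not lowest: deleted
    "d": (50, 4),   # high band, highest quality: retained
    "e": (30, 1),   # high band, not highest: deleted
})
```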
  • the model can also be used to modify the existing webpage sorting model to obtain a new webpage sorting model.
  • the specific execution steps are as follows:
  • the existing web page sorting model is modified according to the comprehensive score of the selected webpage and the webpage quality, and a new webpage sorting model is obtained.
  • The webpage full set stores all current webpages. All or some of the webpages in the webpage full set may be selected to participate in correcting the webpage sorting model. One webpage may be selected each time to correct the webpage sorting model, and a new webpage sorting model is obtained after multiple corrections; the webpage selected each time is the selected webpage. After the new webpage sorting model is obtained, it is used to sort the search results.
  • the selected quality features have been described in S12 and will not be described here.
  • The selected quality features of the selected webpage may be substituted into the webpage quality model established in S13 to obtain the webpage quality of the selected webpage. It should be noted that if the selected webpage has not yet been visited, its selected quality features contain no user behavior dimension features, only webpage dimension features and third-party evaluation features, but this does not prevent the webpage quality of the selected webpage from being obtained.
  • The search request corresponding to the selected webpage can be obtained, the matching degree between the text content of the selected webpage and the search request corresponding to the selected webpage is calculated, and the matching degree is determined as the text score of the selected webpage.
  • the method for calculating the matching degree can adopt the method of the prior art, and details are not described herein again.
  • the existing webpage search sorting model is usually trained with the Gbrank algorithm.
  • the new webpage sorting model can also be corrected with the Gbrank algorithm.
  • compared with the existing webpage sorting model, the new webpage sorting model adds two features: the comprehensive score of the webpage and the webpage quality of the webpage. Because both are considered, using the new webpage sorting model to sort search results can increase sorting accuracy and place webpages with a high comprehensive score and high webpage quality near the top, which makes it easier for users to choose and improves the user experience.
  • when calculating the comprehensive score of the selected webpage according to its webpage quality and text score, the webpage quality of the selected webpage may be normalized; the escape penalty score of the selected webpage is calculated according to its text score; the escape penalty score is multiplied by the text score and the set floating point number is added; and the resulting sum is multiplied by the normalized webpage quality to obtain the comprehensive score of the selected webpage.
  • the normalized webpage quality of the selected webpage = (webpage quality of the selected webpage - minimum webpage quality in the full set of webpages) / (maximum webpage quality in the full set of webpages - minimum webpage quality in the full set of webpages).
  • the comprehensive score of the selected webpage = normalized webpage quality of the selected webpage × (text score of the selected webpage × escape penalty score of the selected webpage + set floating point number); the set floating point number is preferably 0.01f.
  • when calculating the escape penalty score of the selected webpage according to its text score, it may first be determined whether the text score of the selected webpage is greater than the first set value; if the text score is greater than or equal to the first set value, the escape penalty score of the selected webpage is determined to equal the second set value; if the text score is less than the first set value, the escape penalty score is determined to equal the ratio of the text score to the first set value.
  • the first set value and the second set value may be set according to actual needs.
  • the following example takes the first set value as 130 and the second set value as 1: if the text score of the selected webpage is greater than or equal to 130, the escape penalty score equals 1; otherwise, the escape penalty score equals the text score of the selected webpage divided by 130.
  • the embodiment of the present invention further provides a webpage quality evaluation method, including:
  • the evaluation of the quality of the webpage is implemented according to the size of the comprehensive score of the selected webpage.
  • calculating a text score of the selected webpage includes:
  • the degree of matching is determined as the text score of the selected web page.
  • calculating a comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage including:
  • calculating an escape penalty score of the selected webpage according to a text score of the selected webpage including:
  • the selected quality feature comprises at least one or a combination of a user behavior dimension feature, a webpage dimension feature, and a third party evaluation feature.
  • the method further comprises: modifying an existing webpage ranking model according to the comprehensive score of the selected webpage and the webpage quality, and obtaining a new webpage ranking model to sort the search results.
  • the webpage quality model in the present invention is built from the webpages included in the search engine log, and each webpage serves as one sample.
  • the method of the present invention uses samples on the order of ten million, far more than the hundreds or thousands of samples consulted when summarizing manual rules, so the samples are more comprehensive and generalize better.
  • with manual rules, the standard of the rule makers is likely to differ from the standard by which users judge webpage quality, which harms the user experience; the present invention establishes the webpage quality model by mining selected user behavior indicators and judges webpage quality by the users' standard, which keeps the webpage quality standard as consistent as possible with the standard in users' minds and reduces mismatches between the two.
  • the present invention can effectively improve the ranking of web pages, reduce the chances of dead links, low quality, and cheating web pages being displayed to users, and improve the probability that high quality web pages are presented to users.
  • the following example compares the search results obtained with the existing webpage sorting model and with the new webpage sorting model. As shown in Figure 2, the keyword is "Xinyi City Third Middle School Post Bar".
  • the left side shows the search results obtained with the new webpage sorting model, the right side shows the search results obtained with the existing webpage sorting model, and the webpage in the box is the best result.
  • when the new webpage sorting model is used to sort the webpages, the best result rises from second place to first, so users can find the best result more easily, which improves the user experience.
  • an embodiment of the present invention provides a device for establishing a webpage quality model, which may be disposed in a server and has the structure shown in FIG. 3, including a webpage quality calculation unit 31, a selected quality feature extraction unit 32, and a webpage quality model establishing unit 33; wherein
  • the webpage quality calculation unit 31 is configured to mine, from the search engine log, the selected user behavior indicators of each webpage included in the search engine log, and to calculate the webpage quality of the corresponding webpage according to the mined selected user behavior indicators of each webpage;
  • the selected quality feature extraction unit 32 is configured to extract, from the search engine log, selected quality features of each webpage included in the search engine log;
  • the webpage quality model establishing unit 33 is configured to establish a webpage quality model according to the webpage quality and the selected quality feature of each webpage included in the search engine log.
  • the webpage quality model is established automatically based on a large number of search engine logs.
  • the established webpage quality model is highly accurate, and so is the calculated webpage quality, which helps ensure the accuracy of the webpage sorting results and the user experience.
  • the selected user behavior indicator includes at least one or a combination of total clicks, long clicks, last clicks, and navigation clicks, wherein:
  • the total click volume is the number of times a webpage is clicked.
  • the long click volume is the number of times the dwell time on the webpage after a click exceeds the first set duration, and the last click volume is the number of times the webpage is the last one clicked in the search results.
  • the navigation click volume is the number of times the webpage is the only one clicked in the search results.
  • the webpage quality calculation unit 31 specifically includes a user behavior ratio calculation subunit and a webpage quality determination subunit; wherein
  • the user behavior ratio calculation sub-unit is configured to: for each webpage, perform: calculating a user behavior ratio of the current webpage according to the total click volume, the long click volume, the last click volume, and the navigation click volume of the current webpage;
  • the webpage quality determining sub-unit is configured to determine the webpage quality corresponding to the user behavior ratio of the current webpage according to the correspondence between the range of the user behavior ratio and the webpage quality.
  • the user behavior ratio calculation subunit specifically includes a first sum value calculation subunit, a second sum value calculation subunit, and a user behavior ratio determination subunit;
  • a first sum value calculation sub-unit configured to calculate the sum of the last click volume, the navigation click volume, and the long click volume of the current webpage to obtain a first sum value
  • a second sum value calculation subunit configured to calculate a sum of a total click amount of the current webpage and a first experience value to obtain a second sum value
  • the user behavior ratio determining subunit is configured to calculate a ratio of the first sum value to the second sum value, and determine the ratio as the user behavior ratio of the current web page.
  • the foregoing webpage quality model establishing apparatus further includes a webpage filtering unit, configured to:
  • the webpage quality model establishing unit is configured to establish a webpage quality model according to the webpage quality and the selected quality feature of the webpage included in the filtered search engine log.
  • the webpage filtering unit specifically includes a total click volume obtaining subunit and a webpage filtering subunit;
  • the total click volume acquisition sub-unit is used to obtain the total amount of clicks of each webpage included in the search engine log;
  • a webpage filtering sub-unit configured to delete a webpage whose total click volume is less than or equal to the first set number of times; and for a webpage whose total click volume is greater than the first set number of times and less than or equal to the second set number of times, the webpage with the smallest webpage quality is retained , deleting the webpage except the reserved webpage; for the webpage whose total click volume is greater than the second set number of times, the webpage with the highest webpage quality is retained, and the webpage except the retained webpage is deleted.
  • the selected quality feature includes at least one or a combination of a user behavior dimension feature, a web page dimension feature, and a third party evaluation feature.
  • the foregoing webpage quality establishing apparatus further includes a selected quality feature substituting unit, a text score calculating unit, a comprehensive score calculating unit, and a webpage sorting model correcting unit; wherein
  • the selected quality feature substituting unit is configured to substitute the selected quality feature of the selected webpage in the entire webpage into the webpage quality model to obtain the webpage quality of the selected webpage;
  • a text score calculation unit for calculating a text score of the selected webpage
  • a comprehensive score calculation unit configured to calculate a comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage
  • the webpage sorting model correction unit is configured to correct the existing webpage sorting model according to the comprehensive score of the selected webpage and the webpage quality, and obtain a new webpage sorting model.
  • the text score calculation unit specifically includes a search request acquisition subunit, a matching degree calculation subunit, and a text score determination subunit; wherein
  • a search request acquisition subunit configured to obtain a search request corresponding to the selected webpage
  • a matching degree calculation subunit configured to calculate a matching degree between the text content of the selected webpage and the search request corresponding to the selected webpage
  • a text score determination sub-unit for determining the degree of matching as the text score of the selected web page.
  • the comprehensive score calculation unit specifically includes a normalization subunit, an escape penalty score calculation subunit, and a comprehensive score calculation subunit;
  • An escape penalty score calculation sub-unit configured to calculate an escape penalty score of the selected webpage according to a text score of the selected webpage
  • the comprehensive score calculation sub-unit is configured to multiply the escape penalty score of the selected webpage by the text score, and then add the set floating point number, and multiply the obtained sum value by the quality of the webpage normalized by the selected webpage, and obtain The combined score for the selected page.
  • the escape penalty score calculation subunit specifically includes a text score judgment subunit and an escape penalty score determination subunit; wherein,
  • a text score judging sub-unit configured to determine whether a text score of the selected webpage is greater than a first set value
  • the escape penalty score determining subunit is configured to determine that the escape penalty score of the selected webpage is equal to the second set value if the text score of the selected webpage is greater than or equal to the first set value, and to determine that the escape penalty score of the selected webpage is equal to the ratio of the text score of the selected webpage to the first set value if the text score is less than the first set value.
  • based on the embodiment of the webpage quality model establishing apparatus, the embodiment of the present invention further provides a webpage quality evaluation apparatus, comprising: the foregoing webpage quality model establishing apparatus, a selected quality feature substituting unit, a text score calculation unit, a comprehensive score calculation unit, and an evaluation unit; wherein
  • the selected quality feature substituting unit is configured to substitute the selected quality feature of the selected webpage in the webpage full set into the webpage quality model to obtain the webpage quality of the selected webpage;
  • the text score calculation unit is configured to calculate a text score of the selected webpage
  • the comprehensive score calculation unit is configured to calculate a comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage;
  • the evaluation unit is configured to evaluate the quality of the webpage according to the size of the comprehensive score of the selected webpage.
  • the text score calculation unit specifically includes a search request acquisition subunit, a matching degree calculation subunit, and a text score determination subunit; wherein
  • the search request acquisition subunit is configured to obtain a search request corresponding to the selected webpage
  • the matching degree calculation subunit is configured to calculate a matching degree between the text content of the selected webpage and the search request corresponding to the selected webpage;
  • the text score determining subunit is configured to determine the matching degree as a text score of the selected webpage.
  • the comprehensive score calculation unit specifically includes a normalization subunit, an escape penalty score calculation subunit, and a comprehensive score calculation subunit; wherein
  • the normalization subunit is configured to normalize the quality of the webpage of the selected webpage
  • the escape penalty score calculation subunit is configured to calculate an escape penalty score of the selected webpage according to the text score of the selected webpage;
  • the comprehensive score calculation sub-unit is configured to multiply the escape penalty score of the selected webpage by the text score, add the set floating point number, and multiply the resulting sum by the normalized webpage quality of the selected webpage to obtain the comprehensive score of the selected webpage.
  • the escape penalty score calculation subunit specifically includes a text score judgment subunit and an escape penalty score determination subunit; wherein
  • the text score determining subunit is configured to determine whether a text score of the selected webpage is greater than a first set value
  • the escape penalty score determining subunit is configured to determine that the escape penalty score of the selected webpage is equal to the second set value if the text score of the selected webpage is greater than or equal to the first set value, and to determine that the escape penalty score of the selected webpage is equal to the ratio of the text score of the selected webpage to the first set value if the text score is less than the first set value.
  • the selected quality feature comprises at least one or a combination of a user behavior dimension feature, a webpage dimension feature, and a third party evaluation feature.
  • the method further includes: a webpage sorting model correction unit, configured to correct an existing webpage sorting model according to the comprehensive score of the selected webpage and the webpage quality, to obtain a new webpage sorting model.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device.
  • the instruction device implements the functions specified in one or more flows of the flowchart and/or in one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing.
  • the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or in one or more blocks of the block diagram.
  • the present invention also discloses a terminal device comprising: a memory for storing computer program instructions for performing the method described in FIG. 1; and a processor coupled to the memory and configured to execute the computer program instructions stored in the memory.
  • the method according to the invention can also be implemented as a computer program executed by a processor (such as a CPU) in a mobile terminal and stored in a memory of the mobile terminal.
  • the processor performs the functions described above in the method of the present invention.
  • the method according to the invention may also be embodied as a computer program product comprising a computer readable medium on which is stored a computer program for performing the functions described above in the method of the invention. .

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

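A method and apparatus for establishing a webpage quality model. The method comprises: mining, from a search engine log, selected user behavior indicators of each webpage included in the search engine log, and calculating the webpage quality of the corresponding webpage according to the mined selected user behavior indicators of each webpage (S11); extracting, from the search engine log, selected quality features of each webpage included in the search engine log (S12); and establishing a webpage quality model according to the webpage quality and the selected quality features of each webpage included in the search engine log (S13). The webpage quality model established by this solution is relatively accurate, and so is the webpage quality calculated from it, which helps ensure the accuracy of the webpage sorting results and the user experience.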

Description

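Method and Apparatus for Establishing a Webpage Quality Model
Technical Field
The present invention relates to the field of network technologies, and in particular to a method and apparatus for establishing a webpage quality model.
Background of the Invention
With the rapid development of network technologies, more and more users obtain information through webpages. A user can enter a keyword (query) in a search engine; the search engine sends the keyword to a server; the server searches for the webpages corresponding to the keyword, sorts them, and returns the sorted webpages to the search engine for the user to choose from. To improve the user experience, the server sorts the retrieved webpages by relevance and webpage quality as far as possible, so webpage quality is an important factor in webpage sorting. At present, webpage quality is usually obtained from a webpage quality model, and the accuracy of that model directly affects the webpage sorting results and the user experience.
In the existing method for establishing a webpage quality model, a number of manual rules are summarized by hand from a limited set of samples, for example by observing a few hundred or a few thousand webpages and summarizing the features that affect webpage quality, where each feature can serve as one manual rule; these manual rules are then combined to obtain the webpage quality model. Because the number of observed samples is very limited, the resulting webpage quality model is inaccurate, so the calculated webpage quality is inaccurate, which in turn affects the webpage sorting results and the user experience.
Summary of the Invention
Embodiments of the present invention provide a method and apparatus for establishing a webpage quality model, to solve the problem in the prior art that the established webpage quality model is inaccurate.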
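According to an embodiment of the present invention, a method for establishing a webpage quality model is provided, comprising:
mining, from a search engine log, selected user behavior indicators of each webpage included in the search engine log, and calculating the webpage quality of the corresponding webpage according to the mined selected user behavior indicators of each webpage;
extracting, from the search engine log, selected quality features of each webpage included in the search engine log;
establishing a webpage quality model according to the webpage quality and the selected quality features of each webpage included in the search engine log.
Preferably, the selected user behavior indicators include at least one or a combination of total click volume, long click volume, last click volume, and navigation click volume, wherein:
the total click volume is the number of times a webpage is clicked; the long click volume is the number of times that, after the webpage is clicked, the dwell time on the webpage exceeds a first set duration; the last click volume is the number of times the webpage is the last one clicked in the search results; and the navigation click volume is the number of times the webpage is the only one clicked in the search results.
Preferably, calculating the webpage quality of the corresponding webpage according to the mined selected user behavior indicators of each webpage specifically comprises:
for each webpage, performing:
calculating a user behavior ratio of the current webpage according to the total click volume, long click volume, last click volume, and navigation click volume of the current webpage;
determining the webpage quality corresponding to the user behavior ratio of the current webpage according to a correspondence between ranges of the user behavior ratio and webpage quality.
Preferably, calculating the user behavior ratio of the current webpage according to the total click volume, long click volume, last click volume, and navigation click volume of the current webpage specifically comprises:
calculating the sum of the last click volume, navigation click volume, and long click volume of the current webpage to obtain a first sum value;
calculating the sum of the total click volume of the current webpage and a first empirical value to obtain a second sum value;
calculating the ratio of the first sum value to the second sum value, and determining the ratio as the user behavior ratio of the current webpage.
Optionally, the method further comprises:
before the step of establishing the webpage quality model according to the webpage quality and the selected quality features of each webpage included in the search engine log, first filtering the webpages included in the search engine log according to the webpage quality and the selected user behavior indicators, and then establishing the webpage quality model according to the webpage quality and the selected quality features of the webpages included in the filtered search engine log.
Preferably, filtering the webpages included in the search engine log according to the webpage quality and the selected user behavior indicators specifically comprises:
obtaining the total click volume of each webpage included in the search engine log;
deleting webpages whose total click volume is less than or equal to a first set number of times;
for webpages whose total click volume is greater than the first set number of times and less than or equal to a second set number of times, retaining the webpages with the lowest webpage quality and deleting the webpages other than the retained ones;
for webpages whose total click volume is greater than the second set number of times, retaining the webpages with the highest webpage quality and deleting the webpages other than the retained ones.
Preferably, the selected quality features include at least one or a combination of user behavior dimension features, webpage dimension features, and third-party evaluation features.
Optionally, the method further comprises:
substituting the selected quality features of a selected webpage in the full set of webpages into the webpage quality model to obtain the webpage quality of the selected webpage;
calculating a text score of the selected webpage;
calculating a comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage;
correcting an existing webpage sorting model according to the comprehensive score and the webpage quality of the selected webpage to obtain a new webpage sorting model.
Preferably, calculating the text score of the selected webpage specifically comprises:
obtaining the search request corresponding to the selected webpage;
calculating the matching degree between the text content of the selected webpage and the search request corresponding to the selected webpage;
determining the matching degree as the text score of the selected webpage.
Preferably, calculating the comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage specifically comprises:
normalizing the webpage quality of the selected webpage;
calculating an escape penalty score of the selected webpage according to the text score of the selected webpage;
multiplying the escape penalty score of the selected webpage by the text score, adding a set floating point number, and multiplying the resulting sum by the normalized webpage quality of the selected webpage to obtain the comprehensive score of the selected webpage.
Preferably, calculating the escape penalty score of the selected webpage according to the text score of the selected webpage specifically comprises:
determining whether the text score of the selected webpage is greater than a first set value;
if the text score of the selected webpage is greater than or equal to the first set value, determining that the escape penalty score of the selected webpage is equal to a second set value;
if the text score of the selected webpage is less than the first set value, determining that the escape penalty score of the selected webpage is equal to the ratio of the text score of the selected webpage to the first set value.
Preferably, the method for normalizing the webpage quality of the selected webpage comprises: normalized webpage quality of the selected webpage = (webpage quality of the selected webpage - minimum webpage quality in the full set of webpages) / (maximum webpage quality in the full set of webpages - minimum webpage quality in the full set of webpages).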
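According to an embodiment of the present invention, a webpage quality evaluation method is further provided, characterized by comprising:
substituting the selected quality features of a selected webpage in the full set of webpages into the webpage quality model established according to the above method, to obtain the webpage quality of the selected webpage;
calculating a text score of the selected webpage;
calculating a comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage;
evaluating webpage quality according to the magnitude of the comprehensive score of the selected webpage.
Preferably, calculating the text score of the selected webpage comprises:
obtaining the search request corresponding to the selected webpage;
calculating the matching degree between the text content of the selected webpage and the search request corresponding to the selected webpage;
determining the matching degree as the text score of the selected webpage.
Preferably, calculating the comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage comprises:
normalizing the webpage quality of the selected webpage;
calculating an escape penalty score of the selected webpage according to the text score of the selected webpage;
multiplying the escape penalty score of the selected webpage by the text score, adding a set floating point number, and multiplying the resulting sum by the normalized webpage quality of the selected webpage to obtain the comprehensive score of the selected webpage.
Preferably, calculating the escape penalty score of the selected webpage according to the text score of the selected webpage comprises:
determining whether the text score of the selected webpage is greater than a first set value;
if the text score of the selected webpage is greater than or equal to the first set value, determining that the escape penalty score of the selected webpage is equal to a second set value;
if the text score of the selected webpage is less than the first set value, determining that the escape penalty score of the selected webpage is equal to the ratio of the text score of the selected webpage to the first set value.
Preferably, the method for normalizing the webpage quality of the selected webpage comprises: normalized webpage quality of the selected webpage = (webpage quality of the selected webpage - minimum webpage quality in the full set of webpages) / (maximum webpage quality in the full set of webpages - minimum webpage quality in the full set of webpages).
Preferably, the selected quality features include at least one or a combination of user behavior dimension features, webpage dimension features, and third-party evaluation features.
Preferably, the method further comprises: correcting an existing webpage sorting model according to the comprehensive score and the webpage quality of the selected webpage, to obtain a new webpage sorting model for sorting search results.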
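According to an embodiment of the present invention, an apparatus for establishing a webpage quality model is further provided, comprising:
a webpage quality calculation unit, configured to mine, from a search engine log, selected user behavior indicators of each webpage included in the search engine log, and to calculate the webpage quality of the corresponding webpage according to the mined selected user behavior indicators of each webpage;
a selected quality feature extraction unit, configured to extract, from the search engine log, selected quality features of each webpage included in the search engine log;
a webpage quality model establishing unit, configured to establish a webpage quality model according to the webpage quality and the selected quality features of each webpage included in the search engine log.
Preferably, the selected user behavior indicators include at least one or a combination of total click volume, long click volume, last click volume, and navigation click volume, wherein:
the total click volume is the number of times a webpage is clicked; the long click volume is the number of times that, after the webpage is clicked, the dwell time on the webpage exceeds a first set duration; the last click volume is the number of times the webpage is the last one clicked in the search results; and the navigation click volume is the number of times the webpage is the only one clicked in the search results.
Preferably, the webpage quality calculation unit specifically includes a user behavior ratio calculation subunit and a webpage quality determination subunit; wherein,
the user behavior ratio calculation subunit is configured to perform, for each webpage: calculating a user behavior ratio of the current webpage according to the total click volume, long click volume, last click volume, and navigation click volume of the current webpage;
the webpage quality determination subunit is configured to determine the webpage quality corresponding to the user behavior ratio of the current webpage according to a correspondence between ranges of the user behavior ratio and webpage quality.
Preferably, the user behavior ratio calculation subunit specifically includes a first sum value calculation subunit, a second sum value calculation subunit, and a user behavior ratio determination subunit; wherein,
the first sum value calculation subunit is configured to calculate the sum of the last click volume, navigation click volume, and long click volume of the current webpage to obtain a first sum value;
the second sum value calculation subunit is configured to calculate the sum of the total click volume of the current webpage and a first empirical value to obtain a second sum value;
the user behavior ratio determination subunit is configured to calculate the ratio of the first sum value to the second sum value and determine the ratio as the user behavior ratio of the current webpage.
Optionally, the apparatus further includes a webpage filtering unit, configured to:
filter the webpages included in the search engine log according to the webpage quality and the selected user behavior indicators;
in this case, the webpage quality model establishing unit is configured to establish the webpage quality model according to the webpage quality and the selected quality features of the webpages included in the filtered search engine log.
Preferably, the webpage filtering unit specifically includes a total click volume acquisition subunit and a webpage filtering subunit; wherein,
the total click volume acquisition subunit is configured to obtain the total click volume of each webpage included in the search engine log;
the webpage filtering subunit is configured to delete webpages whose total click volume is less than or equal to a first set number of times; for webpages whose total click volume is greater than the first set number of times and less than or equal to a second set number of times, to retain the webpages with the lowest webpage quality and delete the webpages other than the retained ones; and for webpages whose total click volume is greater than the second set number of times, to retain the webpages with the highest webpage quality and delete the webpages other than the retained ones.
Preferably, the selected quality features include at least one or a combination of user behavior dimension features, webpage dimension features, and third-party evaluation features.
Optionally, the apparatus further includes a selected quality feature substituting unit, a text score calculation unit, a comprehensive score calculation unit, and a webpage sorting model correction unit; wherein,
the selected quality feature substituting unit is configured to substitute the selected quality features of a selected webpage in the full set of webpages into the webpage quality model to obtain the webpage quality of the selected webpage;
the text score calculation unit is configured to calculate a text score of the selected webpage;
the comprehensive score calculation unit is configured to calculate a comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage;
the webpage sorting model correction unit is configured to correct an existing webpage sorting model according to the comprehensive score and the webpage quality of the selected webpage to obtain a new webpage sorting model.
Preferably, the text score calculation unit specifically includes a search request acquisition subunit, a matching degree calculation subunit, and a text score determination subunit; wherein,
the search request acquisition subunit is configured to obtain the search request corresponding to the selected webpage;
the matching degree calculation subunit is configured to calculate the matching degree between the text content of the selected webpage and the search request corresponding to the selected webpage;
the text score determination subunit is configured to determine the matching degree as the text score of the selected webpage.
Preferably, the comprehensive score calculation unit specifically includes a normalization subunit, an escape penalty score calculation subunit, and a comprehensive score calculation subunit; wherein,
the normalization subunit is configured to normalize the webpage quality of the selected webpage;
the escape penalty score calculation subunit is configured to calculate an escape penalty score of the selected webpage according to the text score of the selected webpage;
the comprehensive score calculation subunit is configured to multiply the escape penalty score of the selected webpage by the text score, add a set floating point number, and multiply the resulting sum by the normalized webpage quality of the selected webpage to obtain the comprehensive score of the selected webpage.
Preferably, the escape penalty score calculation subunit specifically includes a text score judgment subunit and an escape penalty score determination subunit; wherein,
the text score judgment subunit is configured to determine whether the text score of the selected webpage is greater than a first set value;
the escape penalty score determination subunit is configured to determine that the escape penalty score of the selected webpage is equal to a second set value if the text score of the selected webpage is greater than or equal to the first set value, and to determine that the escape penalty score of the selected webpage is equal to the ratio of the text score of the selected webpage to the first set value if the text score is less than the first set value.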
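According to an embodiment of the present invention, a webpage quality evaluation apparatus is further provided, characterized by comprising: the above apparatus for establishing a webpage quality model, a selected quality feature substituting unit, a text score calculation unit, a comprehensive score calculation unit, and an evaluation unit; wherein,
the selected quality feature substituting unit is configured to substitute the selected quality features of a selected webpage in the full set of webpages into the webpage quality model to obtain the webpage quality of the selected webpage;
the text score calculation unit is configured to calculate a text score of the selected webpage;
the comprehensive score calculation unit is configured to calculate a comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage;
the evaluation unit is configured to evaluate webpage quality according to the magnitude of the comprehensive score of the selected webpage.
Preferably, the text score calculation unit specifically includes a search request acquisition subunit, a matching degree calculation subunit, and a text score determination subunit; wherein,
the search request acquisition subunit is configured to obtain the search request corresponding to the selected webpage;
the matching degree calculation subunit is configured to calculate the matching degree between the text content of the selected webpage and the search request corresponding to the selected webpage;
the text score determination subunit is configured to determine the matching degree as the text score of the selected webpage.
Preferably, the comprehensive score calculation unit specifically includes a normalization subunit, an escape penalty score calculation subunit, and a comprehensive score calculation subunit; wherein,
the normalization subunit is configured to normalize the webpage quality of the selected webpage;
the escape penalty score calculation subunit is configured to calculate an escape penalty score of the selected webpage according to the text score of the selected webpage;
the comprehensive score calculation subunit is configured to multiply the escape penalty score of the selected webpage by the text score, add a set floating point number, and multiply the resulting sum by the normalized webpage quality of the selected webpage to obtain the comprehensive score of the selected webpage.
Preferably, the escape penalty score calculation subunit specifically includes a text score judgment subunit and an escape penalty score determination subunit; wherein,
the text score judgment subunit is configured to determine whether the text score of the selected webpage is greater than a first set value;
the escape penalty score determination subunit is configured to determine that the escape penalty score of the selected webpage is equal to a second set value if the text score of the selected webpage is greater than or equal to the first set value, and to determine that the escape penalty score of the selected webpage is equal to the ratio of the text score of the selected webpage to the first set value if the text score is less than the first set value.
Preferably, the selected quality features include at least one or a combination of user behavior dimension features, webpage dimension features, and third-party evaluation features.
Preferably, the apparatus further includes: a webpage sorting model correction unit, configured to correct an existing webpage sorting model according to the comprehensive score and the webpage quality of the selected webpage to obtain a new webpage sorting model.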
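Embodiments of the present invention provide a method and apparatus for establishing a webpage quality model, and also a webpage quality evaluation method and apparatus. Selected user behavior indicators of each webpage included in a search engine log are mined from the search engine log, and the webpage quality of the corresponding webpage is calculated according to the mined selected user behavior indicators of each webpage; selected quality features of each webpage included in the search engine log are extracted from the search engine log; and a webpage quality model is established according to the webpage quality and the selected quality features of each webpage included in the search engine log. In this solution, the webpage quality model is established automatically from a large volume of search engine logs. Compared with the manual summarization used in the prior art, the established webpage quality model is more accurate, and so is the webpage quality calculated from it, which helps ensure the accuracy of the webpage sorting results and the user experience.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a method for establishing a webpage quality model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram comparing the search results obtained when searching with the existing webpage sorting model and with the new webpage sorting model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an apparatus for establishing a webpage quality model according to an embodiment of the present invention.
Modes for Carrying Out the Invention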
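To address the problem in the prior art that the established webpage quality model is inaccurate, an embodiment of the present invention provides a method for establishing a webpage quality model. The flow of the method is shown in FIG. 1. The method may be executed by a server or the like; the following description takes a server as an example. The steps are as follows:
S11: Mine, from the search engine log, the selected user behavior indicators of each webpage included in the search engine log, and calculate the webpage quality of the corresponding webpage according to the mined selected user behavior indicators of each webpage.
When a user needs to search for information, the user can enter a keyword in the search engine on the client; the server searches according to the keyword, sorts the resulting webpages, and returns them to the search engine for the user to choose from. The server records its interactions with the search engine and saves them in the search engine log, so the webpage quality model can be established based on the search engine log.
First, the search engine log for a set period can be obtained, followed by the webpages included in that log. The set period may be the last 30 days, the last 45 days, the last 60 days, and so on, and can be set according to actual needs.
Then, the selected user behavior indicators of each webpage included in the search engine log are mined from the search engine log. The selected user behavior indicators include at least one or a combination of total click volume, long click volume, last click volume, and navigation click volume, wherein:
the total click volume is the number of times a webpage is clicked, for example the number of clicks on the webpage recorded in the search engine log of the last 60 days;
the long click volume is the number of times that, after the webpage is clicked, the dwell time on the webpage exceeds a first set duration. The first set duration may be 30 seconds, 40 seconds, 50 seconds, and so on, and can be set according to actual needs; for example, the long click volume may be the number of times, recorded in the search engine log of the last 60 days, that the dwell time on the webpage after a click exceeded 40 seconds;
the last click volume is the number of times the webpage is the last one clicked in the search results, for example the number of times, recorded in the search engine log of the last 60 days, that the webpage was the last one clicked in the search results returned by the server to the search engine;
the navigation click volume is the number of times the webpage is the only one clicked in the search results, for example the number of times, recorded in the search engine log of the last 60 days, that the webpage was the only one clicked in the search results returned by the server to the search engine.
Finally, the webpage quality of the corresponding webpage is calculated according to the mined selected user behavior indicators of each webpage, which yields the webpage quality of the webpages in the search engine.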
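The four indicators defined in S11 can be counted in a single pass over the log. The following is only a minimal sketch under stated assumptions: the log layout (one time-ordered list of clicks per displayed search result page), the `Click` record, and the function name `mine_indicators` are hypothetical and do not come from the source, and the sketch assumes that a sole click in a session counts both as a last click and as a navigation click, which the text does not state either way.

```python
from collections import defaultdict
from dataclasses import dataclass

# The "first set duration"; 40 seconds is one of the example values in the text.
LONG_CLICK_SECONDS = 40


@dataclass
class Click:
    """Hypothetical log record: one click on one search result."""
    url: str
    dwell_seconds: float


def mine_indicators(sessions, long_click_seconds=LONG_CLICK_SECONDS):
    """Count total, long, last, and navigation clicks per URL.

    `sessions` is a list of click lists; each inner list holds the clicks
    made on one search result page, in time order (assumed layout).
    """
    stats = defaultdict(lambda: {"total": 0, "long": 0, "last": 0, "navigation": 0})
    for clicks in sessions:
        if not clicks:
            continue
        for click in clicks:
            stats[click.url]["total"] += 1
            if click.dwell_seconds > long_click_seconds:
                stats[click.url]["long"] += 1
        # The final click of the session is the "last click".
        stats[clicks[-1].url]["last"] += 1
        # A session with exactly one click yields a "navigation click".
        if len(clicks) == 1:
            stats[clicks[0].url]["navigation"] += 1
    return dict(stats)
```

A production pipeline over 60 days of logs would presumably run this aggregation in a distributed job; the single-process loop here only illustrates the counting rules.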
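S12: Extract, from the search engine log, the selected quality features of each webpage included in the search engine log.
The selected quality features include at least one or a combination of user behavior dimension features, webpage dimension features, and third-party evaluation features, wherein:
user behavior dimension features judge webpage quality from the user's perspective. They may specifically be the total click volume, the last click volume, the average click position of the webpage, and so on, and can be extracted from the search engine log;
webpage dimension features judge webpage quality from the webpage content alone: whether the title and content read smoothly, whether there is cheating such as keyword stuffing, and, for a question-and-answer webpage for example, the number of answers, the number of user likes, whether there is a best answer, and so on. Webpage dimension features can be extracted directly by analyzing the content of the webpage;
third-party evaluation features judge webpage quality from a third party's perspective: whether a third party has provided a link pointing to the webpage, how much traffic the webpage receives, and so on. The third party may be another webpage. Third-party evaluation features require link analysis or are obtained from the third party through cooperation.
S13: Establish a webpage quality model according to the webpage quality and the selected quality features of each webpage included in the search engine log.
The webpage quality model can be built with the Gradient Boosting Decision Tree (GBDT) algorithm from the webpage quality calculated in S11 and the selected quality features of each webpage extracted in S12; the algorithm used may be, but is not limited to, GBDT.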
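To make the role of GBDT in S13 concrete, the following toy sketch fits a boosted ensemble of one-split regression trees (stumps) that maps feature vectors to webpage quality grades. It is an illustration only, not the implementation used in the source: the squared-error loss, the depth-1 trees, the learning rate, and the names `fit_stump`, `fit_gbdt`, and `predict` are all assumptions, and a real system would more likely rely on an established GBDT library.

```python
def fit_stump(X, residuals):
    """Find the single (feature, threshold) split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return best[1:] if best else None


def fit_gbdt(X, y, n_trees=50, learning_rate=0.3):
    """Boosting loop: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [target - p for target, p in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, threshold, left_value, right_value = stump
        stumps.append(stump)
        predictions = [
            p + learning_rate * (left_value if row[j] <= threshold else right_value)
            for row, p in zip(X, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps
    )
```

In this sketch each row of `X` would hold the selected quality features of one webpage (for example an average click position and an inbound link count, both hypothetical choices), and `y` would hold the webpage quality computed in S11.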
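In this solution, the webpage quality model is established automatically from a large volume of search engine logs. Compared with the manual summarization used in the prior art, the established webpage quality model is more accurate, and so is the webpage quality calculated from it, which helps ensure the accuracy of the webpage sorting results and the user experience.
Specifically, calculating the webpage quality of the corresponding webpage according to the mined selected user behavior indicators of each webpage in S11 specifically comprises:
for each webpage, performing:
calculating the user behavior ratio of the current webpage according to the total click volume, long click volume, last click volume, and navigation click volume of the current webpage;
determining the webpage quality corresponding to the user behavior ratio of the current webpage according to the correspondence between ranges of the user behavior ratio and webpage quality.
When calculating the user behavior ratio of the current webpage according to its total click volume, long click volume, last click volume, and navigation click volume, the sum of the last click volume, navigation click volume, and long click volume of the current webpage can first be calculated to obtain a first sum value; the sum of the total click volume of the current webpage and a first empirical value is calculated to obtain a second sum value; and the ratio of the first sum value to the second sum value is calculated and determined as the user behavior ratio of the current webpage. Specifically, the user behavior ratio can be calculated with the following formula: user behavior ratio = (last click volume + navigation click volume + long click volume) / (total click volume + first empirical value), where the first empirical value is obtained from practical experience and is preferably 20.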
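The ratio formula above translates directly into code. This is a minimal sketch; the function and parameter names are illustrative and not taken from the source, and only the default of 20 for the first empirical value comes from the text.

```python
FIRST_EMPIRICAL_VALUE = 20  # preferred value stated in the text


def user_behavior_ratio(total, long, last, navigation,
                        empirical=FIRST_EMPIRICAL_VALUE):
    """(last + navigation + long) / (total + first empirical value)."""
    first_sum = last + navigation + long   # first sum value
    second_sum = total + empirical         # second sum value
    return first_sum / second_sum
```

One visible effect of the constant in the denominator is that a webpage with very few clicks cannot reach a high ratio: a page with one click that is long, last, and sole gets 3 / 21, whereas the same pattern repeated over many clicks gets a much larger value.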
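A correspondence between ranges of the user behavior ratio and webpage quality can be established in advance, storing the webpage quality for each range of the user behavior ratio. Once the user behavior ratio of a webpage is obtained, the webpage quality of that webpage can be determined from the correspondence. An example of the correspondence is shown in the following table:
Range of user behavior ratio    Webpage quality
(0, 0.1)                        0
[0.1, 0.3)                      1
[0.3, 0.5)                      2
[0.5, 0.8)                      3
[0.8, 1]                        4
Table 1
In Table 1, the webpage quality takes the values 0, 1, 2, 3, and 4; a higher value indicates a better webpage.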
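The lookup in Table 1 can be sketched as a binary search over the lower bounds of the ranges. The function name is illustrative. Table 1 does not cover a ratio of exactly 0 or a ratio above 1, so this sketch assumes, without support from the source, that such values fall into the nearest grade (0 and 4 respectively).

```python
import bisect

# Lower bounds of the grade 1..4 ranges from Table 1.
_LOWER_BOUNDS = [0.1, 0.3, 0.5, 0.8]


def quality_from_ratio(ratio):
    """Map a user behavior ratio to a webpage quality grade 0..4.

    bisect_right counts how many lower bounds are <= ratio, which is the
    grade because every range in Table 1 is closed on the left.
    """
    return bisect.bisect_right(_LOWER_BOUNDS, ratio)
```

Keeping the bounds in one list makes it straightforward to swap in a different correspondence, since the table in the text is given only as an example.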
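Optionally, the above method for establishing a webpage quality model further includes: filtering the webpages included in the search engine log according to the webpage quality and the selected user behavior indicators.
Correspondingly, establishing the webpage quality model according to the webpage quality and the selected quality features of each webpage included in the search engine log in S13 may further include: establishing the webpage quality model according to the webpage quality and the selected quality features of the webpages included in the filtered search engine log.
Establishing the webpage quality model requires webpages with high relevance and high webpage quality, and some webpages included in the search engine log may not meet this requirement, so the webpages included in the search engine log need to be filtered; the webpages that remain after filtering are the ones actually needed for establishing the webpage quality model.
Specifically, the method for filtering the webpages included in the search engine log according to the webpage quality and the selected user behavior indicators is: obtaining the total click volume of each webpage in the search engine log; deleting webpages whose total click volume is less than or equal to the first set number of times; for webpages whose total click volume is greater than the first set number of times and less than or equal to the second set number of times, retaining the webpages with the lowest webpage quality and deleting the webpages other than the retained ones; and for webpages whose total click volume is greater than the second set number of times, retaining the webpages with the highest webpage quality and deleting the webpages other than the retained ones.
When the total click volume of a webpage is too small, the final webpage sorting result is not ideal even if the webpage quality is high, so such webpages need to be filtered out of the webpages included in the search engine log. The following example takes the total click volume of the webpage as the selected user behavior indicator: webpages with a total click volume less than or equal to 4 are deleted directly; for webpages with a total click volume greater than 4 and less than or equal to 10, only webpages with a webpage quality of 0 are retained and webpages of any other quality are deleted; for webpages with a total click volume greater than 10, only webpages with a webpage quality of 4 are retained and webpages of any other quality are deleted. The first set number of times and the second set number of times can be set according to actual needs; in this example, the first set number of times is 4 and the second set number of times is 10.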
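The three filtering rules can be expressed as a single predicate. This is a sketch with illustrative names; the defaults of 4 and 10 and the retained grades 0 and 4 are the example values from the text, and everything else is an assumption made for illustration.

```python
def keep_for_training(total_clicks, quality, first_set=4, second_set=10,
                      min_quality=0, max_quality=4):
    """Return True if a webpage survives the filter described above."""
    if total_clicks <= first_set:
        return False                      # too few clicks: always deleted
    if total_clicks <= second_set:
        return quality == min_quality     # mid range: keep only lowest quality
    return quality == max_quality         # high range: keep only highest quality
```

With these rules the retained training set contains only the two extreme grades, which may be why the text describes the surviving webpages as the ones actually needed for establishing the model.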
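In practice, the search engine logs of the last 60 days are mined and filtered by the above rules, finally yielding 24 million webpages and their corresponding webpage quality.
The method for establishing the webpage quality model has been described above. After the webpage quality model is established, it can also be used to correct an existing webpage sorting model to obtain a new webpage sorting model. The specific steps are as follows:
substituting the selected quality features of a selected webpage in the full set of webpages into the webpage quality model to obtain the webpage quality of the selected webpage;
calculating the text score of the selected webpage;
calculating the comprehensive score of the selected webpage according to the webpage quality and the text score of the selected webpage;
correcting the existing webpage sorting model according to the comprehensive score and the webpage quality of the selected webpage to obtain a new webpage sorting model.
The full set of webpages stores all current webpages. All or some of the webpages in the full set can be selected to take part in correcting the webpage sorting model. One webpage can be selected for each correction, and a new webpage sorting model is obtained after multiple corrections; the webpage selected each time is the selected webpage. Once the new webpage sorting model is obtained, it is used to sort the search results. The selected quality features have been described in S12 and are not repeated here.
The selected quality features of the selected webpage can be substituted into the webpage quality model established in S13 to obtain the webpage quality of the selected webpage. It should be noted that if the selected webpage has not been visited yet, the selected webpage has no webpage dimension features and contains only user behavior dimension features and third-party dimension features, but this does not affect the calculation of the webpage quality of the selected webpage.
Because a webpage is usually associated with certain search requests, the search request corresponding to the selected webpage can be obtained, the matching degree between the text content of the selected webpage and that search request is calculated, and the matching degree is determined as the text score of the selected webpage. The matching degree can be calculated with a prior-art method, which is not described here.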
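The existing webpage search sorting model is usually trained with the Gbrank algorithm, and the new webpage sorting model can also be corrected with the Gbrank algorithm. Compared with the existing webpage sorting model, the new webpage sorting model adds two features: the comprehensive score of the webpage and the webpage quality of the webpage. Because both the comprehensive score and the webpage quality are considered, using the new webpage sorting model to sort search results can increase the accuracy of the sorting model and place webpages with a high comprehensive score and high webpage quality near the top, which makes it easier for users to choose and improves the user experience.
Specifically, when calculating the comprehensive score of the selected webpage according to its webpage quality and text score, the webpage quality of the selected webpage can be normalized; the escape penalty score of the selected webpage is calculated according to its text score; the escape penalty score is multiplied by the text score and the set floating point number is added; and the resulting sum is multiplied by the normalized webpage quality of the selected webpage to obtain the comprehensive score of the selected webpage.
The webpage quality of the selected webpage can be normalized with the following formula: normalized webpage quality of the selected webpage = (webpage quality of the selected webpage - minimum webpage quality in the full set of webpages) / (maximum webpage quality in the full set of webpages - minimum webpage quality in the full set of webpages).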
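The normalization above is ordinary min-max scaling, sketched below with an illustrative function name. The text does not say what happens when every webpage in the full set has the same quality, so returning 0.0 in that case is an assumption made only to avoid a division by zero.

```python
def normalize_quality(quality, min_quality, max_quality):
    """Min-max scale a webpage quality into the range [0, 1]."""
    if max_quality == min_quality:
        return 0.0  # assumption: degenerate case is not defined in the text
    return (quality - min_quality) / (max_quality - min_quality)
```

For the grades of Table 1, where the minimum is 0 and the maximum is 4, a webpage of quality 3 normalizes to 0.75.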
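The comprehensive score of the selected webpage can be calculated with the following formula: comprehensive score of the selected webpage = normalized webpage quality of the selected webpage × (text score of the selected webpage × escape penalty score of the selected webpage + set floating point number); the set floating point number is preferably 0.01f.
Specifically, when calculating the escape penalty score of the selected webpage according to its text score, it can first be determined whether the text score of the selected webpage is greater than the first set value; if the text score of the selected webpage is greater than or equal to the first set value, the escape penalty score of the selected webpage is determined to equal the second set value; if the text score of the selected webpage is less than the first set value, the escape penalty score of the selected webpage is determined to equal the ratio of the text score of the selected webpage to the first set value.
The first set value and the second set value can be set according to actual needs. The following example takes the first set value as 130 and the second set value as 1: if the text score of the selected webpage is greater than or equal to 130, the escape penalty score equals 1; otherwise, the escape penalty score equals the text score of the selected webpage divided by 130.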
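The escape penalty score and the comprehensive score can be sketched together as follows. The function and parameter names are illustrative; the defaults 130, 1, and 0.01 are the example values given in the text, and the normalized webpage quality is passed in as an already computed number.

```python
def escape_penalty(text_score, first_set_value=130.0, second_set_value=1.0):
    """Piecewise penalty: full weight at or above the threshold, else a ratio."""
    if text_score >= first_set_value:
        return second_set_value
    return text_score / first_set_value


def comprehensive_score(normalized_quality, text_score, set_float=0.01):
    """normalized quality x (text score x escape penalty + set floating point number)."""
    return normalized_quality * (text_score * escape_penalty(text_score) + set_float)
```

With the example values the penalty is continuous at the threshold, because 130 / 130 equals the second set value of 1. A text score of 65 is therefore weighted by 0.5, and with a normalized quality of 0.5 the comprehensive score is 0.5 × (65 × 0.5 + 0.01) = 16.255.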
基于上述网页质量模型的建立方法的实施例,本发明实施例还提供一种网页质量评价方法,包括:
将网页全集中选定网页的选定质量特征代入根据上述方法建立的网页质量模型,得到所述选定网页的网页质量;
计算所述选定网页的文本得分;
根据所述选定网页的网页质量和文本得分计算所述选定网页的综合得分;
根据所述选定网页的综合得分的大小实现对网页质量的评价。
优选的,计算所述选定网页的文本得分,包括:
获取所述选定网页对应的搜索请求;
计算所述选定网页的文本内容与所述选定网页对应的搜索请求之间的匹配度;
将所述匹配度确定为所述选定网页的文本得分。
优选的,根据所述选定网页的网页质量和文本得分计算所述选定网页的综合得分,包括:
将所述选定网页的网页质量归一化;
根据所述选定网页的文本得分计算所述选定网页的转义惩罚得分;
将所述选定网页的转义惩罚得分与文本得分相乘后再加上设定浮点数,将得到的和值与所述选定网页归一化的网页质量相乘,得到所述选定网页的综合得分。
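Preferably, computing the drift penalty score of the selected webpage from the text score of the selected webpage comprises:
determining whether the text score of the selected webpage is greater than or equal to a first preset value;
if the text score of the selected webpage is greater than or equal to the first preset value, determining that the drift penalty score of the selected webpage equals a second preset value;
if the text score of the selected webpage is less than the first preset value, determining that the drift penalty score of the selected webpage equals the ratio of the text score of the selected webpage to the first preset value.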
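Preferably, the webpage quality of the selected webpage is normalized as follows: normalized webpage quality of the selected webpage = (webpage quality of the selected webpage - minimum webpage quality in the full webpage collection) / (maximum webpage quality in the full webpage collection - minimum webpage quality in the full webpage collection).
Preferably, the selected quality features comprise at least one or a combination of user-behavior-dimension features, webpage-dimension features and third-party evaluation features.
Preferably, the method further comprises: correcting an existing webpage ranking model according to the composite score and webpage quality of the selected webpage to obtain a new webpage ranking model for ranking search results.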
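The method for building a webpage quality model and the webpage quality evaluation method described above have the following advantages over the prior-art approach of manually summarizing a webpage quality model and evaluating webpage quality:
1. In the prior art, the samples consulted when summarizing manual rules are limited, so the manual rules tend to be incomplete and to generalize poorly. In the present invention, the webpage quality model is built from the webpages contained in the search engine logs, with each webpage serving as one sample. The method uses samples on the order of tens of millions, far more than the hundreds or thousands of samples consulted when summarizing manual rules, so the samples are more comprehensive and generalization is better.
2. Owing to the complexity of the problem and human limitations, only a small number of manual rules can be summarized, and these may include wrong rules or omit key ones. The present invention uses machine learning and, on the principle of error minimization, extracts several thousand selected quality features. Each selected quality feature can serve as a rule, so several thousand rule trees can be generated. This minimizes the error of the webpage quality model obtained from the existing tens of millions of samples, avoids wrong rules as far as possible, and greatly reduces the risk of omitting key rules.
3. With manual rules, the rule maker's standard may well differ from the standard by which users judge webpage quality, which harms the user experience. The present invention builds the webpage quality model by mining selected user behavior indicators, so webpage quality is judged by the users' own standard. This keeps the webpage quality standard as close as possible to the standard users have in mind and reduces mismatches between the two.
4. Adding webpage quality alone to the webpage ranking model would weaken its ranking performance. The present invention instead fits the webpage quality and text score of a webpage into a single composite score and then corrects the existing webpage ranking model according to the composite score and webpage quality. Because both webpage quality and the composite score are considered, and the composite score is high only when both relevance and quality are good, adopting the composite score as a feature improves the ranking performance of the upper-level webpage ranking model.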
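Taken together, the four points above show that the present invention can effectively improve webpage ranking: it reduces the probability that dead links, low-quality webpages and cheating webpages are shown to users and raises the probability that high-quality webpages are shown. The following example compares the search results obtained with the existing webpage ranking model and with the new one. As shown in FIG. 2, the keyword is "新沂市第三中学贴吧"; the left side shows the search results obtained with the new webpage ranking model and the right side those obtained with the existing model, and the webpage in the box is the best result. FIG. 2 shows that with the new webpage ranking model the best result moves from second place to first. Users can therefore find the best result more easily in the ranking produced by the new model, which improves the user experience.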
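Based on the same inventive concept, an embodiment of the present invention provides a device for building a webpage quality model. The device may be deployed in a server and, as shown in FIG. 3, comprises a webpage quality computing unit 31, a selected quality feature extraction unit 32 and a webpage quality model building unit 33, wherein:
the webpage quality computing unit 31 is configured to mine, from search engine logs, the selected user behavior indicators of each webpage contained in the search engine logs, and to compute the webpage quality of each webpage from its mined selected user behavior indicators;
the selected quality feature extraction unit 32 is configured to extract, from the search engine logs, the selected quality features of each webpage contained in the search engine logs;
the webpage quality model building unit 33 is configured to build the webpage quality model from the webpage quality and selected quality features of each webpage contained in the search engine logs.
In this solution, the webpage quality model is built automatically from a large volume of search engine logs. Compared with manual summarization in the prior art, the resulting webpage quality model and the webpage quality it computes are more accurate, which safeguards the accuracy of webpage ranking results and the user experience.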
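Specifically, the selected user behavior indicators comprise at least one or a combination of total clicks, long clicks, last clicks and navigational clicks, wherein:
total clicks is the number of times a webpage is clicked; long clicks is the number of times that, after the webpage is clicked, the dwell time on the webpage exceeds a first preset duration; last clicks is the number of times the webpage is the last one clicked in the search results; and navigational clicks is the number of times the webpage is the only one clicked in the search results.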
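Specifically, the webpage quality computing unit 31 comprises a user behavior ratio computing subunit and a webpage quality determining subunit, wherein:
the user behavior ratio computing subunit is configured to perform, for each webpage: computing the user behavior ratio of the current webpage from the total clicks, long clicks, last clicks and navigational clicks of the current webpage;
the webpage quality determining subunit is configured to determine the webpage quality corresponding to the user behavior ratio of the current webpage according to the correspondence between ranges of the user behavior ratio and webpage quality.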
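Specifically, the user behavior ratio computing subunit comprises a first sum computing subunit, a second sum computing subunit and a user behavior ratio determining subunit, wherein:
the first sum computing subunit is configured to compute the sum of the last clicks, navigational clicks and long clicks of the current webpage to obtain a first sum;
the second sum computing subunit is configured to compute the sum of the total clicks of the current webpage and a first empirical value to obtain a second sum;
the user behavior ratio determining subunit is configured to compute the ratio of the first sum to the second sum and take that ratio as the user behavior ratio of the current webpage.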
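The computation carried out by these subunits can be sketched in Python. This is a sketch under stated assumptions: the function names are hypothetical, the default first empirical value of 10.0 is only a placeholder because this passage does not state its value, and the range boundaries used to map the ratio to a webpage quality are likewise hypothetical and supplied by the caller.

```python
from bisect import bisect_right


def user_behavior_ratio(total_clicks, long_clicks, last_clicks, nav_clicks,
                        first_empirical_value=10.0):
    """Ratio of (last + navigational + long clicks) to (total clicks + empirical value)."""
    first_sum = last_clicks + nav_clicks + long_clicks
    second_sum = total_clicks + first_empirical_value
    if second_sum <= 0:
        raise ValueError("total clicks plus the empirical value must be positive")
    return first_sum / second_sum


def quality_from_ratio(ratio, boundaries):
    """Map a ratio to a quality level using ascending range boundaries.

    With n boundaries the result is an integer from 0 to n; a ratio equal
    to a boundary falls into the higher range.
    """
    return bisect_right(boundaries, ratio)
```

A possible use, with hypothetical boundaries that yield quality levels 0 to 4:

```python
ratio = user_behavior_ratio(total_clicks=20, long_clicks=6,
                            last_clicks=5, nav_clicks=4)
quality = quality_from_ratio(ratio, (0.2, 0.4, 0.6, 0.8))
```

Adding the empirical value to the denominator damps the ratio for webpages with very few clicks, so a single favorable click cannot by itself produce a high ratio.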
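Optionally, the device for building a webpage quality model further comprises a webpage filtering unit configured to:
filter the webpages contained in the search engine logs according to webpage quality and the selected user behavior indicators;
in this case, the webpage quality model building unit is configured to build the webpage quality model from the webpage quality and selected quality features of the webpages remaining in the search engine logs after filtering.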
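Specifically, the webpage filtering unit comprises a total clicks obtaining subunit and a webpage filtering subunit, wherein:
the total clicks obtaining subunit is configured to obtain the total clicks of each webpage contained in the search engine logs;
the webpage filtering subunit is configured to delete webpages whose total clicks are less than or equal to a first preset count; for webpages whose total clicks are greater than the first preset count and less than or equal to a second preset count, to retain the webpages with the lowest webpage quality and delete all others; and for webpages whose total clicks are greater than the second preset count, to retain the webpages with the highest webpage quality and delete all others.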
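Specifically, the selected quality features comprise at least one or a combination of user-behavior-dimension features, webpage-dimension features and third-party evaluation features.
Optionally, the device for building a webpage quality model further comprises a selected quality feature substitution unit, a text score computing unit, a composite score computing unit and a webpage ranking model correction unit, wherein:
the selected quality feature substitution unit is configured to substitute the selected quality features of a selected webpage in the full webpage collection into the webpage quality model to obtain the webpage quality of the selected webpage;
the text score computing unit is configured to compute the text score of the selected webpage;
the composite score computing unit is configured to compute the composite score of the selected webpage from the webpage quality and text score of the selected webpage;
the webpage ranking model correction unit is configured to correct an existing webpage ranking model according to the composite score and webpage quality of the selected webpage to obtain a new webpage ranking model.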
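Specifically, the text score computing unit comprises a search query obtaining subunit, a match degree computing subunit and a text score determining subunit, wherein:
the search query obtaining subunit is configured to obtain the search query corresponding to the selected webpage;
the match degree computing subunit is configured to compute the degree of match between the text content of the selected webpage and the search query corresponding to the selected webpage;
the text score determining subunit is configured to take the degree of match as the text score of the selected webpage.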
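Specifically, the composite score computing unit comprises a normalization subunit, a drift penalty score computing subunit and a composite score computing subunit, wherein:
the normalization subunit is configured to normalize the webpage quality of the selected webpage;
the drift penalty score computing subunit is configured to compute the drift penalty score of the selected webpage from the text score of the selected webpage;
the composite score computing subunit is configured to multiply the drift penalty score of the selected webpage by the text score, add a preset floating-point constant, and multiply the resulting sum by the normalized webpage quality of the selected webpage to obtain the composite score of the selected webpage.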
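Specifically, the drift penalty score computing subunit comprises a text score judging subunit and a drift penalty score determining subunit, wherein:
the text score judging subunit is configured to determine whether the text score of the selected webpage is greater than or equal to a first preset value;
the drift penalty score determining subunit is configured to determine that the drift penalty score of the selected webpage equals a second preset value if the text score of the selected webpage is greater than or equal to the first preset value, and to determine that the drift penalty score of the selected webpage equals the ratio of the text score of the selected webpage to the first preset value if the text score is less than the first preset value.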
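Based on the above embodiment of the device for building a webpage quality model, an embodiment of the present invention further provides a webpage quality evaluation device, comprising: the above device for building a webpage quality model, a selected quality feature substitution unit, a text score computing unit, a composite score computing unit and an evaluation unit, wherein:
the selected quality feature substitution unit is configured to substitute the selected quality features of a selected webpage in the full webpage collection into the webpage quality model to obtain the webpage quality of the selected webpage;
the text score computing unit is configured to compute the text score of the selected webpage;
the composite score computing unit is configured to compute the composite score of the selected webpage from the webpage quality and text score of the selected webpage;
the evaluation unit is configured to evaluate webpage quality according to the magnitude of the composite score of the selected webpage.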
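Preferably, the text score computing unit comprises a search query obtaining subunit, a match degree computing subunit and a text score determining subunit, wherein:
the search query obtaining subunit is configured to obtain the search query corresponding to the selected webpage;
the match degree computing subunit is configured to compute the degree of match between the text content of the selected webpage and the search query corresponding to the selected webpage;
the text score determining subunit is configured to take the degree of match as the text score of the selected webpage.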
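Preferably, the composite score computing unit comprises a normalization subunit, a drift penalty score computing subunit and a composite score computing subunit, wherein:
the normalization subunit is configured to normalize the webpage quality of the selected webpage;
the drift penalty score computing subunit is configured to compute the drift penalty score of the selected webpage from the text score of the selected webpage;
the composite score computing subunit is configured to multiply the drift penalty score of the selected webpage by the text score, add a preset floating-point constant, and multiply the resulting sum by the normalized webpage quality of the selected webpage to obtain the composite score of the selected webpage.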
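Preferably, the drift penalty score computing subunit comprises a text score judging subunit and a drift penalty score determining subunit, wherein:
the text score judging subunit is configured to determine whether the text score of the selected webpage is greater than or equal to a first preset value;
the drift penalty score determining subunit is configured to determine that the drift penalty score of the selected webpage equals a second preset value if the text score of the selected webpage is greater than or equal to the first preset value, and to determine that the drift penalty score of the selected webpage equals the ratio of the text score of the selected webpage to the first preset value if the text score is less than the first preset value.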
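Preferably, the selected quality features comprise at least one or a combination of user-behavior-dimension features, webpage-dimension features and third-party evaluation features.
Preferably, the device further comprises: a webpage ranking model correction unit configured to correct an existing webpage ranking model according to the composite score and webpage quality of the selected webpage to obtain a new webpage ranking model.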
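The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.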
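In addition, the present invention also discloses a terminal device, comprising: a memory configured to store computer program instructions for executing the method shown in FIG. 1; and a processor coupled to the memory and configured to execute the computer program instructions stored in the memory.
Furthermore, the method according to the present invention may also be implemented as a computer program executed by a processor (such as a CPU) in a mobile terminal and stored in the memory of the mobile terminal. When the computer program is executed by the processor, the processor performs the above functions defined in the method of the present invention.
Furthermore, the method according to the present invention may also be implemented as a computer program product comprising a computer-readable medium on which a computer program for performing the above functions defined in the method of the present invention is stored.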
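Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software or a combination of both. To illustrate this interchangeability of hardware and software clearly, the functions of various illustrative components, blocks, modules, circuits and steps have been described in general terms. Whether such functions are implemented as software or as hardware depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art may implement the described functions in various ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.
Although optional embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be construed as covering the optional embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from their spirit and scope. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.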

Claims (27)

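  1. A method for building a webpage quality model, characterized by comprising:
    mining, from search engine logs, selected user behavior indicators of each webpage contained in the search engine logs, and computing the webpage quality of each webpage from its mined selected user behavior indicators;
    extracting, from the search engine logs, selected quality features of each webpage contained in the search engine logs;
    building a webpage quality model from the webpage quality and selected quality features of each webpage contained in the search engine logs.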
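  2. The method of claim 1, characterized in that the selected user behavior indicators comprise at least one or a combination of total clicks, long clicks, last clicks and navigational clicks, wherein:
    the total clicks is the number of times a webpage is clicked; the long clicks is the number of times that, after the webpage is clicked, the dwell time on the webpage exceeds a first preset duration; the last clicks is the number of times the webpage is the last one clicked in the search results; and the navigational clicks is the number of times the webpage is the only one clicked in the search results.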
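  3. The method of claim 2, characterized in that computing the webpage quality of each webpage from its mined selected user behavior indicators comprises:
    performing, for each webpage:
    computing the user behavior ratio of the current webpage from the total clicks, long clicks, last clicks and navigational clicks of the current webpage;
    determining the webpage quality corresponding to the user behavior ratio of the current webpage according to the correspondence between ranges of the user behavior ratio and webpage quality.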
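  4. The method of claim 3, characterized in that computing the user behavior ratio of the current webpage from the total clicks, long clicks, last clicks and navigational clicks of the current webpage comprises:
    computing the sum of the last clicks, navigational clicks and long clicks of the current webpage to obtain a first sum;
    computing the sum of the total clicks of the current webpage and a first empirical value to obtain a second sum;
    computing the ratio of the first sum to the second sum and taking the ratio as the user behavior ratio of the current webpage.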
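  5. The method of claim 1, characterized by further comprising:
    before the step of building the webpage quality model from the webpage quality and selected quality features of each webpage contained in the search engine logs, filtering the webpages contained in the search engine logs according to webpage quality and the selected user behavior indicators, and then building the webpage quality model from the webpage quality and selected quality features of the webpages remaining in the search engine logs after filtering.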
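  6. The method of claim 5, characterized in that filtering the webpages contained in the search engine logs according to webpage quality and the selected user behavior indicators comprises:
    obtaining the total clicks of each webpage contained in the search engine logs;
    deleting webpages whose total clicks are less than or equal to a first preset count;
    for webpages whose total clicks are greater than the first preset count and less than or equal to a second preset count, retaining the webpages with the lowest webpage quality and deleting all others;
    for webpages whose total clicks are greater than the second preset count, retaining the webpages with the highest webpage quality and deleting all others.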
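  7. The method of claim 1, characterized in that the selected quality features comprise at least one or a combination of user-behavior-dimension features, webpage-dimension features and third-party evaluation features.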
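  8. A webpage quality evaluation method, characterized by comprising:
    substituting the selected quality features of a selected webpage in the full webpage collection into the webpage quality model built by the method of any one of claims 1-7 to obtain the webpage quality of the selected webpage;
    computing the text score of the selected webpage;
    computing the composite score of the selected webpage from the webpage quality and text score of the selected webpage;
    evaluating webpage quality according to the magnitude of the composite score of the selected webpage.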
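  9. The method of claim 8, characterized in that computing the text score of the selected webpage comprises:
    obtaining the search query corresponding to the selected webpage;
    computing the degree of match between the text content of the selected webpage and the search query corresponding to the selected webpage;
    taking the degree of match as the text score of the selected webpage.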
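  10. The method of claim 8, characterized in that computing the composite score of the selected webpage from the webpage quality and text score of the selected webpage comprises:
    normalizing the webpage quality of the selected webpage;
    computing the drift penalty score of the selected webpage from the text score of the selected webpage;
    multiplying the drift penalty score of the selected webpage by the text score, adding a preset floating-point constant, and multiplying the resulting sum by the normalized webpage quality of the selected webpage to obtain the composite score of the selected webpage.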
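  11. The method of claim 10, characterized in that computing the drift penalty score of the selected webpage from the text score of the selected webpage comprises:
    determining whether the text score of the selected webpage is greater than a first preset value;
    if the text score of the selected webpage is greater than or equal to the first preset value, determining that the drift penalty score of the selected webpage equals a second preset value;
    if the text score of the selected webpage is less than the first preset value, determining that the drift penalty score of the selected webpage equals the ratio of the text score of the selected webpage to the first preset value.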
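  12. The method of claim 10, characterized in that the webpage quality of the selected webpage is normalized as follows: normalized webpage quality of the selected webpage = (webpage quality of the selected webpage - minimum webpage quality in the full webpage collection) / (maximum webpage quality in the full webpage collection - minimum webpage quality in the full webpage collection).
  13. The method of claim 8, characterized in that the selected quality features comprise at least one or a combination of user-behavior-dimension features, webpage-dimension features and third-party evaluation features.
  14. The method of any one of claims 8-13, characterized by further comprising: correcting an existing webpage ranking model according to the composite score and webpage quality of the selected webpage to obtain a new webpage ranking model for ranking search results.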
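  15. A device for building a webpage quality model, characterized by comprising:
    a webpage quality computing unit configured to mine, from search engine logs, selected user behavior indicators of each webpage contained in the search engine logs, and to compute the webpage quality of each webpage from its mined selected user behavior indicators;
    a selected quality feature extraction unit configured to extract, from the search engine logs, selected quality features of each webpage contained in the search engine logs;
    a webpage quality model building unit configured to build a webpage quality model from the webpage quality and selected quality features of each webpage contained in the search engine logs.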
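  16. The device of claim 15, characterized in that the selected user behavior indicators comprise at least one or a combination of total clicks, long clicks, last clicks and navigational clicks, wherein:
    the total clicks is the number of times a webpage is clicked; the long clicks is the number of times that, after the webpage is clicked, the dwell time on the webpage exceeds a first preset duration; the last clicks is the number of times the webpage is the last one clicked in the search results; and the navigational clicks is the number of times the webpage is the only one clicked in the search results.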
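  17. The device of claim 16, characterized in that the webpage quality computing unit comprises a user behavior ratio computing subunit and a webpage quality determining subunit, wherein:
    the user behavior ratio computing subunit is configured to perform, for each webpage: computing the user behavior ratio of the current webpage from the total clicks, long clicks, last clicks and navigational clicks of the current webpage;
    the webpage quality determining subunit is configured to determine the webpage quality corresponding to the user behavior ratio of the current webpage according to the correspondence between ranges of the user behavior ratio and webpage quality.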
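  18. The device of claim 17, characterized in that the user behavior ratio computing subunit comprises a first sum computing subunit, a second sum computing subunit and a user behavior ratio determining subunit, wherein:
    the first sum computing subunit is configured to compute the sum of the last clicks, navigational clicks and long clicks of the current webpage to obtain a first sum;
    the second sum computing subunit is configured to compute the sum of the total clicks of the current webpage and a first empirical value to obtain a second sum;
    the user behavior ratio determining subunit is configured to compute the ratio of the first sum to the second sum and take the ratio as the user behavior ratio of the current webpage.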
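  19. The device of claim 15, characterized by further comprising a webpage filtering unit configured to:
    filter the webpages contained in the search engine logs according to webpage quality and the selected user behavior indicators;
    in this case, the webpage quality model building unit is configured to build the webpage quality model from the webpage quality and selected quality features of the webpages remaining in the search engine logs after filtering.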
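  20. The device of claim 19, characterized in that the webpage filtering unit comprises a total clicks obtaining subunit and a webpage filtering subunit, wherein:
    the total clicks obtaining subunit is configured to obtain the total clicks of each webpage contained in the search engine logs;
    the webpage filtering subunit is configured to delete webpages whose total clicks are less than or equal to a first preset count; for webpages whose total clicks are greater than the first preset count and less than or equal to a second preset count, to retain the webpages with the lowest webpage quality and delete all others; and for webpages whose total clicks are greater than the second preset count, to retain the webpages with the highest webpage quality and delete all others.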
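  21. The device of claim 15, characterized in that the selected quality features comprise at least one or a combination of user-behavior-dimension features, webpage-dimension features and third-party evaluation features.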
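  22. A webpage quality evaluation device, characterized by comprising: the device for building a webpage quality model of any one of claims 15-21, a selected quality feature substitution unit, a text score computing unit, a composite score computing unit and an evaluation unit, wherein:
    the selected quality feature substitution unit is configured to substitute the selected quality features of a selected webpage in the full webpage collection into the webpage quality model to obtain the webpage quality of the selected webpage;
    the text score computing unit is configured to compute the text score of the selected webpage;
    the composite score computing unit is configured to compute the composite score of the selected webpage from the webpage quality and text score of the selected webpage;
    the evaluation unit is configured to evaluate webpage quality according to the magnitude of the composite score of the selected webpage.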
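  23. The device of claim 22, characterized in that the text score computing unit comprises a search query obtaining subunit, a match degree computing subunit and a text score determining subunit, wherein:
    the search query obtaining subunit is configured to obtain the search query corresponding to the selected webpage;
    the match degree computing subunit is configured to compute the degree of match between the text content of the selected webpage and the search query corresponding to the selected webpage;
    the text score determining subunit is configured to take the degree of match as the text score of the selected webpage.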
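  24. The device of claim 22, characterized in that the composite score computing unit comprises a normalization subunit, a drift penalty score computing subunit and a composite score computing subunit, wherein:
    the normalization subunit is configured to normalize the webpage quality of the selected webpage;
    the drift penalty score computing subunit is configured to compute the drift penalty score of the selected webpage from the text score of the selected webpage;
    the composite score computing subunit is configured to multiply the drift penalty score of the selected webpage by the text score, add a preset floating-point constant, and multiply the resulting sum by the normalized webpage quality of the selected webpage to obtain the composite score of the selected webpage.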
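  25. The device of claim 24, characterized in that the drift penalty score computing subunit comprises a text score judging subunit and a drift penalty score determining subunit, wherein:
    the text score judging subunit is configured to determine whether the text score of the selected webpage is greater than a first preset value;
    the drift penalty score determining subunit is configured to determine that the drift penalty score of the selected webpage equals a second preset value if the text score of the selected webpage is greater than or equal to the first preset value, and to determine that the drift penalty score of the selected webpage equals the ratio of the text score of the selected webpage to the first preset value if the text score is less than the first preset value.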
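  26. The device of claim 21, characterized in that the selected quality features comprise at least one or a combination of user-behavior-dimension features, webpage-dimension features and third-party evaluation features.
  27. The device of claim 21, characterized by further comprising: a webpage ranking model correction unit configured to correct an existing webpage ranking model according to the composite score and webpage quality of the selected webpage to obtain a new webpage ranking model.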
PCT/CN2015/096036 2015-01-21 2015-11-30 网页质量模型的建立方法及装置 WO2016115944A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
RU2017129409A RU2680746C2 (ru) 2015-01-21 2015-11-30 Способ и устройство для создания модели качества веб-страницы
US15/653,780 US10891350B2 (en) 2015-01-21 2017-07-19 Method and device for establishing webpage quality model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510030753.1A CN104615680B (zh) 2015-01-21 2015-01-21 网页质量模型的建立方法及装置
CN201510030753.1 2015-01-21

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/653,780 Continuation US10891350B2 (en) 2015-01-21 2017-07-19 Method and device for establishing webpage quality model

Publications (1)

Publication Number Publication Date
WO2016115944A1 true WO2016115944A1 (zh) 2016-07-28

Family

ID=53150122

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/096036 WO2016115944A1 (zh) 2015-01-21 2015-11-30 网页质量模型的建立方法及装置

Country Status (4)

Country Link
US (1) US10891350B2 (zh)
CN (1) CN104615680B (zh)
RU (1) RU2680746C2 (zh)
WO (1) WO2016115944A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11682029B2 (en) 2018-03-23 2023-06-20 Lexisnexis, A Division Of Reed Elsevier Inc. Systems and methods for scoring user reactions to a software program

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615680B (zh) * 2015-01-21 2016-11-02 广州神马移动信息科技有限公司 网页质量模型的建立方法及装置
CN106897301A (zh) * 2015-12-18 2017-06-27 阿里巴巴集团控股有限公司 一种搜索质量的评测方法、装置及电子设备
CN106777132A (zh) * 2016-12-18 2017-05-31 深圳市辣妈帮科技有限公司 数据处理方法及装置
CN106886554A (zh) * 2016-12-27 2017-06-23 苏州思杰马克丁软件有限公司 一种文章质量的确定方法及装置
CN110928537B (zh) * 2018-09-19 2023-08-11 百度在线网络技术(北京)有限公司 模型评测方法、装置、设备及计算机可读介质
CN111597236A (zh) * 2020-05-22 2020-08-28 中国工商银行股份有限公司 制度信息处理方法、装置和计算机系统
CN111767444B (zh) * 2020-06-22 2024-04-09 北京百度网讯科技有限公司 页面特征构建方法、装置、设备和存储介质
CN113806660B (zh) * 2021-09-17 2024-04-26 北京百度网讯科技有限公司 数据评估方法、训练方法、装置、电子设备以及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101178728A (zh) * 2007-11-21 2008-05-14 北京搜狗科技发展有限公司 一种网址导航的方法和系统
US20080114624A1 (en) * 2006-11-13 2008-05-15 Microsoft Corporation Click-fraud protector
CN102486774A (zh) * 2010-12-01 2012-06-06 腾讯科技(深圳)有限公司 一种网络页面的质量获取方法、系统及服务器
CN103544169A (zh) * 2012-07-12 2014-01-29 百度在线网络技术(北京)有限公司 页面调整方法及装置
CN104615680A (zh) * 2015-01-21 2015-05-13 广州神马移动信息科技有限公司 网页质量模型的建立方法及装置

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001084351A2 (en) * 2000-04-28 2001-11-08 Inceptor, Inc. Method of and system for enhanced web page delivery
US20040006621A1 (en) * 2002-06-27 2004-01-08 Bellinson Craig Adam Content filtering for web browsing
US9223868B2 (en) * 2004-06-28 2015-12-29 Google Inc. Deriving and using interaction profiles
US20070038608A1 (en) 2005-08-10 2007-02-15 Anjun Chen Computer search system for improved web page ranking and presentation
US7483894B2 (en) 2006-06-07 2009-01-27 Platformation Technologies, Inc Methods and apparatus for entity search
US7996393B1 (en) 2006-09-29 2011-08-09 Google Inc. Keywords associated with document categories
US8938463B1 (en) * 2007-03-12 2015-01-20 Google Inc. Modifying search result ranking based on implicit user feedback and a model of presentation bias
CN100507920C (zh) * 2007-05-25 2009-07-01 清华大学 一种基于用户行为信息的搜索引擎检索结果重排序方法
US8429750B2 (en) * 2007-08-29 2013-04-23 Enpulz, L.L.C. Search engine with webpage rating feedback based Internet search operation
US8402031B2 (en) 2008-01-11 2013-03-19 Microsoft Corporation Determining entity popularity using search queries
US8484179B2 (en) 2008-12-08 2013-07-09 Microsoft Corporation On-demand search result details
US8639682B2 (en) 2008-12-29 2014-01-28 Accenture Global Services Limited Entity assessment and ranking
US8458171B2 (en) 2009-01-30 2013-06-04 Google Inc. Identifying query aspects
US20100293179A1 (en) 2009-05-14 2010-11-18 Microsoft Corporation Identifying synonyms of entities using web search
US8615514B1 (en) * 2010-02-03 2013-12-24 Google Inc. Evaluating website properties by partitioning user feedback
CN102654875B (zh) * 2011-03-04 2014-05-21 北京百度网讯科技有限公司 一种自动处理网页文本的内链的方法及装置
US8589399B1 (en) 2011-03-25 2013-11-19 Google Inc. Assigning terms of interest to an entity
US8843477B1 (en) 2011-10-31 2014-09-23 Google Inc. Onsite and offsite search ranking results
US9251249B2 (en) 2011-12-12 2016-02-02 Microsoft Technology Licensing, Llc Entity summarization and comparison
US9443021B2 (en) 2011-12-30 2016-09-13 Microsoft Technology Licensing, Llc Entity based search and resolution
US9116994B2 (en) 2012-01-09 2015-08-25 Brightedge Technologies, Inc. Search engine optimization for category specific search results
CN103577416B (zh) 2012-07-20 2017-09-22 阿里巴巴集团控股有限公司 扩展查询方法及系统
US9047278B1 (en) 2012-11-09 2015-06-02 Google Inc. Identifying and ranking attributes of entities
CN103544257B (zh) * 2013-10-15 2017-01-18 北京国双科技有限公司 网页质量检测方法和装置
CN106716402B (zh) 2014-05-12 2020-08-11 销售力网络公司 以实体为中心的知识发现

Also Published As

Publication number Publication date
US10891350B2 (en) 2021-01-12
CN104615680B (zh) 2016-11-02
RU2017129409A (ru) 2019-02-21
RU2680746C2 (ru) 2019-02-26
US20170316109A1 (en) 2017-11-02
RU2017129409A3 (zh) 2019-02-21
CN104615680A (zh) 2015-05-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15878612

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017129409

Country of ref document: RU

Kind code of ref document: A

122 Ep: pct application non-entry in european phase

Ref document number: 15878612

Country of ref document: EP

Kind code of ref document: A1