CN111061942B - Search ranking monitoring method and system - Google Patents

Info

Publication number
CN111061942B
Authority
CN
China
Prior art keywords
index
search
monitoring
data
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811210424.5A
Other languages
Chinese (zh)
Other versions
CN111061942A (en
Inventor
张松
侯守虎
王鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811210424.5A
Publication of CN111061942A
Application granted
Publication of CN111061942B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393: Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a search ranking monitoring method and a search ranking monitoring system. The method comprises: obtaining a current online search ranking result; calculating a current search evaluation index based on the current online search ranking result; comparing the calculated current search evaluation index with a previously acquired reference search evaluation index to obtain a search evaluation index; and generating a monitoring result based at least on the search evaluation index. The ranking effect of the search system is thereby monitored effectively and accurately. Signal dependence, model data, and service log dimensions can also be introduced to complement the online effect dimension, so that more comprehensive and sensitive monitoring of the search system is achieved.

Description

Search ranking monitoring method and system
Technical Field
The invention relates to the field of search, and in particular to a search ranking monitoring method and a search ranking monitoring system.
Background
With the development of information technology and the popularization of terminal devices (especially smartphones), search ranking systems are increasingly bound up with people's daily lives. For example, a user may enter keywords in a conventional search engine interface or in a browser app with a built-in search engine portal, and then selectively click and read the ranked results returned by the search ranking system. Content and service providers also actively push various information streams to users for reading (i.e., feed search). The selection and presentation order of the information in such a stream is likewise typically determined by a search ranking system.
It is no exaggeration to say that the quality of the information presentation provided by a search ranking system largely determines the user experience and even user retention. For this reason, effective monitoring and evaluation of the search ranking effect is a problem that every major content or service provider must solve.
Disclosure of Invention
In order to solve at least one of the above problems, the present application proposes a search ranking monitoring method and system, which can effectively monitor the ranking effect of a search system based on the comparison between the current ranking result and a previous reference result. The monitoring scheme may further involve signal dependence, model data, and/or service log signals to monitor the search ranking system comprehensively from multiple dimensions. Here, the signal dependence index may include timeliness information in order to cope with the highly time-sensitive search scenarios that are increasingly common in internet search services.
According to an aspect of the present invention, there is provided a search ranking monitoring method, including: obtaining a current online search ranking result; calculating a current search evaluation index based on the current online search ranking result; comparing the calculated current search evaluation index with a previously acquired reference search evaluation index to obtain a search evaluation index; and generating a monitoring result based at least on the search evaluation index. By introducing a comparison between current real data and reference data, the ranking effect can thus be evaluated and monitored more accurately and promptly.
Optionally, the search evaluation index may include at least one of: normalized discounted cumulative gain (NDCG); positive-to-inverse-order pair ratio (PARI); expected reciprocal rank (ERR); and the harmonic mean of precision and recall (F-SCORE). The search quality can thus be expressed numerically, and the health of the search system can be conveniently judged from the change trend.
Optionally, the previously acquired reference search evaluation index may include at least one of: manually annotated index data; and data calculated from the real click data of online users.
Optionally, the search ranking monitoring method may further include: counting and acquiring a signal dependence index, and generating the monitoring result based at least on the search evaluation index includes: generating the monitoring result based on the signal dependence index. Introducing numerical indexes for the signals on which the search algorithm depends further improves the monitoring of the search system.
Optionally, counting and acquiring the signal dependence index may include: acquiring and classifying, via an interface, dependent signals including a timeliness signal and/or a PV (page view) cutoff signal; and counting the classified dependent signals to obtain the signal dependence index. In particular, introducing the timeliness signal enables searches in time-sensitive scenarios (very common today) to be evaluated more accurately.
Optionally, generating the monitoring result based at least on the search evaluation index may further include: an abnormality in the signal dependence index causing the current search evaluation index to decline; and generating an alert based at least on the decline of the current search evaluation index. Linking signals across multiple dimensions in this way allows system abnormalities to be detected more sensitively.
Optionally, the search ranking monitoring method may further include: calculating a model data index based on a search algorithm model, and generating the monitoring result based at least on the search evaluation index further includes: generating the monitoring result based on the change trend of the model data index. Introducing this intermediate data helps locate the cause of an abnormality.
Optionally, generating the monitoring result based on the change trend of the model data index may include: calculating the relative entropy between the model data indexes of the current unit time and the previous unit time, and generating the monitoring result based on the relative entropy; and/or generating the monitoring result based on the quantile change trend of the model data index. Quantiles and relative entropy reflect system changes more precisely.
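As a minimal sketch of the relative entropy comparison described above, and not the formula of the invention itself, the standard Kullback-Leibler divergence between two histograms of a model data index can be computed as follows. The function name `relative_entropy`, the additive smoothing constant `epsilon`, and the sample bin counts are illustrative assumptions.

```python
import math


def relative_entropy(current_counts, previous_counts, epsilon=1e-9):
    """Standard KL divergence D(current || previous) between two histograms.

    Both arguments are per-bin counts over the same bins. `epsilon` is an
    assumed smoothing constant that keeps empty bins from producing log(0).
    """
    if len(current_counts) != len(previous_counts):
        raise ValueError("histograms must use the same number of bins")
    bins = len(current_counts)
    current_total = sum(current_counts) + epsilon * bins
    previous_total = sum(previous_counts) + epsilon * bins
    divergence = 0.0
    for current, previous in zip(current_counts, previous_counts):
        p = (current + epsilon) / current_total
        q = (previous + epsilon) / previous_total
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical bin counts for two consecutive unit times (e.g., two days).
today = [9, 1]
yesterday = [5, 5]
drift = relative_entropy(today, yesterday)
```

A value near zero suggests that the distribution is stable between the two unit times; how large a value should count as a significant change is a tuning decision that the text above does not fix.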
Optionally, generating the monitoring result based on the change trend of the model data index may include: an abnormality in the counted signal dependence index causing a significant change in the model data index; and generating an alert based at least on the significant change in the model data index. Similarly, linking signals across multiple dimensions allows system abnormalities to be detected more sensitively.
Optionally, the search ranking monitoring method may further include: extracting and counting service log indexes from the service logs, and generating the monitoring result further includes: generating the monitoring result based on the counted service log indexes. Introducing service log data further improves the comprehensiveness of the monitoring.
According to another aspect of the present invention, there is also provided a search ranking monitoring method, including: acquiring current search data of a search system, wherein the search data includes user input data, search ranking data, and/or user click data; calculating search system monitoring indexes based on the current search data, wherein the monitoring indexes include a search evaluation index, a signal dependence index, and a model data index, and the search evaluation index is obtained by comparing the current search evaluation index with a previously acquired reference search evaluation index; summarizing the search system monitoring indexes; and generating a monitoring result based on the change trend of the summarized indexes.
Optionally, the search ranking monitoring method may further include: acquiring service log data of the search system; and obtaining a service log index based on the service log data statistics, wherein the search system monitoring index comprises the service log index.
Optionally, the signal-dependent index abnormality may cause degradation of the search evaluation index and/or the model data index, and generating the monitoring result based on a trend of change of the summary index may include: generating an alert based on the anomaly and the degradation.
According to another aspect of the present invention, there is also provided a search ranking monitoring system, including: a current data acquisition device for acquiring a current online search ranking result of the search system; a monitoring index calculation device for calculating a current search evaluation index based on the current online search ranking result and comparing the calculated current search evaluation index with a previously acquired reference search evaluation index to obtain a search evaluation index, the monitoring index including the search evaluation index; and a monitoring result generation device for generating a monitoring result based on the monitoring index.
Optionally, the previously acquired reference search evaluation index may include at least one of: manually annotated index data; and data calculated from the real click data of online users.
Optionally, the current data acquisition device is further configured to acquire user input data and user click data of the search system. The monitoring index calculation device includes a search evaluation index calculation unit, a signal dependence index calculation unit, and a model data index calculation unit, and the monitoring index further includes a signal dependence index and a model data index. The search evaluation index calculation unit is configured to calculate the search evaluation index; the signal dependence index calculation unit is configured to calculate the signal dependence index based at least on the user input data; and the model data index calculation unit is configured to calculate the model data index based at least on the user click data. The search ranking monitoring system further includes a monitoring index summarizing device configured to summarize the monitoring indexes, and the monitoring result generation device generates the monitoring result based on the summarized monitoring indexes.
Optionally, the search ranking monitoring system may further include: the service log obtaining device is used for obtaining service log data of the search system, wherein the monitoring indexes further comprise service log indexes, the monitoring index calculating device further comprises a service log index counting unit, and the service log index counting unit is used for obtaining the service log indexes based on service log data statistics.
Optionally, an abnormality in the signal dependence index may cause deterioration of the search evaluation index and/or the model data index, and the monitoring result generation device may generate an alert based on the abnormality and the deterioration.
Optionally, the signal dependence index calculation unit may be configured to: acquire and classify, via an interface, dependent signals including a timeliness signal and/or a PV (page view) cutoff signal; and count the classified dependent signals to obtain the signal dependence index.
Optionally, the model data index calculation unit calculates the relative entropy between the model data indexes of the current unit time and the previous unit time and/or the quantile change trend of the model data index, and the monitoring result generation device may generate the monitoring result based on the relative entropy and/or the quantile change trend of the model data index.
Therefore, with the search ranking monitoring scheme described above, the ranking effect of the search system can be effectively monitored based on the comparison between the current ranking result and a previous reference result. Furthermore, the search ranking system can be monitored comprehensively from multiple dimensions such as signal dependence, model data, and/or service log signals, and the scheme can cope with the highly time-sensitive search scenarios that are increasingly common in internet search services.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following more particular descriptions of exemplary embodiments of the disclosure as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the disclosure.
FIG. 1 shows a schematic flow diagram of a search ranking monitoring method according to one embodiment of the invention.
FIG. 2 shows a schematic diagram of a preferred embodiment of system monitoring based on online effect data.
Figure 3 illustrates an NDCG index trend graph obtained according to one embodiment of the present invention.
Fig. 4A to 4D show examples of the variation trend graphs of various types of signal dependence indicators, in which the value of the ordinate in the graph represents the number of searches at the corresponding search strength.
FIGS. 5A-5D illustrate examples of quantile statistical trend charts for various types of model data indexes.
FIGS. 6A-6D illustrate examples of quantile histograms for various types of model data indexes.
FIGS. 7A-7D illustrate an example of computing relative entropy based on histograms.
FIG. 8 shows an example of the distribution of various types of errors in the statistical log.
FIG. 9 shows a schematic flow diagram of a search ranking monitoring method according to one embodiment of the invention.
FIG. 10 illustrates a functional block diagram of multi-dimensional search system monitoring according to one embodiment of the present invention.
FIG. 11 shows a block diagram of a search ranking monitoring system according to one embodiment of the invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As described above, people spend more and more time using terminal devices such as smartphones to acquire information. Displaying the information that users want to obtain, the goods that users want to purchase, and even the goods or information that users may potentially need has therefore become a goal pursued by all major network content providers and service providers.
For this reason, not only are conventional search engines (e.g., Google, Baidu) required to continuously refine their search ranking systems to provide users with ranked results based on the keywords they currently input, but in a feed search scenario the search ranking system is also required to recommend content or goods that may interest users in the absence of any specific current input.
Users always want to obtain the information they need or are interested in at minimal cost. The quality of the information presentation provided by the search ranking system largely determines the user experience and even user retention. Effective monitoring and evaluation of the search ranking effect has therefore become a problem that every major network content or service provider must solve.
In view of the above, the present invention provides an online monitoring and evaluation scheme for the search ranking effect, which can effectively and accurately monitor the ranking effect of a search system based on the comparison between a current ranking result and a previous reference result. The monitoring scheme may further involve signal dependence, model data, and/or service log signals to monitor the search ranking system comprehensively, promptly, and sensitively from multiple dimensions. Here, the signal dependence index may include timeliness information in order to cope with the highly time-sensitive search scenarios that are increasingly common in internet search services. Further, abnormalities in the signal dependence index, including the timeliness information, affect the model data and ranking effect dimensions, which makes it easier to locate problems from multiple aspects when abnormalities occur.
FIG. 1 shows a schematic flow diagram of a search ranking monitoring method according to one embodiment of the invention. The method makes full use of search effect evaluation indexes to effectively monitor and evaluate the health of the search ranking system.
In step S110, a current online search ranking result is obtained. In step S120, a current search evaluation index is calculated based on the current online search ranking result. Subsequently, in step S130, the calculated current search evaluation index is compared with the previously acquired reference search evaluation index to obtain a search evaluation index. In step S140, a monitoring result is generated based on at least the search evaluation index.
In step S110, the actual ranking results currently online may be requested automatically according to a preset configuration. The ranking results obtained here are those generated by the search ranking system for a large number of users (e.g., a large number of non-specific users). They may be ranking results that the system is generating now, ranking results generated during a previous period, or ranking results sampled according to some rule (e.g., randomly). Herein, "current" may mean "real-time" as well as the more general "recent". In one embodiment, the currently acquired data may be all or part of the data being input to the search ranking system in real time or the data generated by the system in real time. In another embodiment, the currently acquired data may be data accumulated over a recent period, such as data acquired over the most recent day. In other embodiments, the system data may be acquired in batches; the invention is not limited in this respect. In addition, the search ranking result may be a ranking result returned by the system for search terms input by the user, or a ranking result generated by the system according to the user's previous search and browsing behavior, current hot topics, and the like.
In step S120, a current search evaluation index is calculated based on the current online search ranking result. Here, a search evaluation index is an index that can be used to evaluate the ranking quality of the search ranking system, such as the evaluation indexes commonly used in information retrieval: NDCG, PARI, ERR, F-SCORE, and the like. Normalized discounted cumulative gain (NDCG) is a document relevance ranking score normalized with respect to the length of the result list. The positive-to-inverse-order pair ratio (PARI) is, as the name implies, the ratio of positive-order pairs to inverse-order pairs. For example, assuming the reference data is (1,0), prediction data of (1,0) forms a positive-order pair, whereas prediction data of (0,1), opposite to the reference data, forms an inverse-order pair. A larger PARI value indicates a better ranking. Expected reciprocal rank (ERR) is the expectation of the reciprocal of the position at which the user stops once the user's need is met. The harmonic mean of precision and recall (F-SCORE), also called the F-measure, is an index that takes both precision and recall into account.
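To make two of these indexes concrete, the following is a minimal sketch rather than the computation used by the invention. It assumes that each result carries a reference relevance grade, that `relevances` lists those grades in the order the system ranked the results, and that NDCG uses linear gain with a log2 position discount; the function names are illustrative.

```python
import math
from itertools import combinations


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k results (log2 discount)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the observed order divided by the DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def pair_ratio(relevances):
    """Ratio of positive-order pairs to inverse-order pairs (ties ignored)."""
    positive = inverse = 0
    # combinations() keeps positional order, so `earlier` is ranked above `later`.
    for earlier, later in combinations(relevances, 2):
        if earlier > later:
            positive += 1
        elif earlier < later:
            inverse += 1
    return positive / inverse if inverse else float("inf")


# Reference grades (1,0) ranked as (1,0) form a positive-order pair;
# ranked as (0,1) they form an inverse-order pair.
perfect = pair_ratio([1, 0])
inverted = pair_ratio([0, 1])
```

Other gain functions (for example an exponential gain) are also in common use for NDCG; the linear form here is chosen only to keep the sketch short.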
The NDCG index evaluates the ranking effect of the system model as a real number, while the other indexes (such as PARI, ERR, and F-SCORE) allow intermediate score data to be visualized and processed, so that abnormalities in a specific dimension are found in time and the cause of an abnormality in the NDCG index can be located. Thus, in one embodiment, the search evaluation index may include only NDCG, while in other embodiments one or more of PARI, ERR, and F-SCORE may also be included to provide a more complete picture. Alternatively or additionally, mean average precision (MAP) may be introduced as an evaluation index.
After the current search evaluation index is calculated, it may be compared in step S130 with the previously acquired reference search evaluation index to obtain a search evaluation index, i.e., the final search evaluation index data. The previously acquired reference search evaluation index is data labeled in advance and serves as the standard for evaluating the online effect. In one embodiment, the reference data may include manually labeled index data and/or data calculated from earlier real click data of online users. The evaluation data (i.e., the current online real data) can then be compared with the reference data to obtain index data that reflects the current online search ranking effect. FIG. 2 shows a schematic diagram of a preferred embodiment of system monitoring based on online effect data. Since the reference data preferably relies on earlier clicks of real users, it reflects the actual ranking effect of the system more accurately.
Indexes such as NDCG can be obtained through the above process and are stored and displayed in this scheme. FIG. 3 illustrates an NDCG index trend graph obtained according to one embodiment of the present invention. In the NDCG index trend graph, the index trends corresponding to NDCG_10, NDCG_5, NDCG_3, and NDCG_1 are shown from top to bottom; these are the trends of the index data obtained by comparing the first 10, 5, 3, and 1 online search results, respectively, with the reference data. The NDCG index trend graph may be regarded as one example, or one dimension, of a monitoring result. Based on the trend graph, the current state of the search effect can be fed back promptly. For example, an increase in the NDCG index indicates an improvement in the system's ranking effect, and a decrease indicates a deterioration. The cause of a change in NDCG may be located by analyzing other monitoring indexes (e.g., one or more of the PARI, ERR, and F-SCORE described above), for example to determine whether the change is an expected result of a model update, a failure of an online service, and so on.
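One simple way to turn such a trend into a monitoring result is sketched below. It is only an illustration: the rolling window of three observations, the 5% relative drop threshold, and the function name `detect_decline` are assumptions and are not taken from the text.

```python
def detect_decline(series, window=3, max_relative_drop=0.05):
    """Return the positions where the index fell noticeably below its recent mean.

    `series` is a chronological list of index values (e.g., daily NDCG_10).
    Each value is compared with the mean of the preceding `window` values.
    """
    alerts = []
    for i in range(window, len(series)):
        baseline = sum(series[i - window:i]) / window
        if baseline > 0 and (baseline - series[i]) / baseline > max_relative_drop:
            alerts.append(i)
    return alerts


# Hypothetical daily NDCG_10 values; the last day drops sharply.
daily_ndcg = [0.80, 0.81, 0.79, 0.80, 0.70]
alert_days = detect_decline(daily_ndcg)
```

A flagged position indicates only that the index changed; deciding whether the change is an expected effect of a model update or a fault still requires the other monitoring dimensions.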
In other embodiments, the search ranking effect of the system may also be monitored and evaluated based on other dimensions.
In one embodiment, the search ranking monitoring method of the present invention may further include counting and acquiring the signal dependence index. Accordingly, step S140 may include: generating a monitoring result based on the signal dependence index. The final effect of a search algorithm depends on a number of signals, such as timeliness signals and PV (page view) cutoff signals. Monitoring the signals on which the search algorithm depends reflects system abnormalities more intuitively and makes them easier to locate.
To this end, counting and acquiring the signal dependence index may further include: acquiring and classifying, via an interface, dependent signals including a timeliness signal and/or a PV cutoff signal; and counting the classified dependent signals to obtain the signal dependence index.
Highly time-sensitive search scenarios are increasingly common in internet search services. Studies have shown that time-sensitive searches can account for 35% or more of all searches. To ensure normal operation of the service and a good ranking effect, implementing and refining monitoring schemes for such scenarios has long been a direction explored by the industry. Herein, the "timeliness" of a search means that the search has a time distribution characteristic and bears a periodic or other regular relationship to time. For example, when a user types "college entrance examination questions" into a search engine, users entering this search term in different years mean different things. If the search engine does not recognize the timeliness of the search term, users in 2017 and in 2018 would obtain the same results under ordinary query processing; and if the time factor is not considered in the final ranking, so that old results are ranked first, the user experience is poor.
Therefore, it is necessary to evaluate the timeliness of the entire system using timeliness signals. The more important signals in time-sensitive scenarios may include the timeliness, TS, and TP signals, which indicate the timeliness strength of the current search term (query). Here, the timeliness signal is the composite timeliness score; TP (time_precision) is burst timeliness; and TS (time_simple) is general timeliness. The timeliness strength is calculated from the same batch of data; that is, for the same batch of trigger words, the timeliness signals are calculated by requesting the online service at regular intervals. The value of a timeliness signal can thus reflect changes caused by the model while excluding the influence of the search terms. Since the system's timeliness statistics are typically derived from search data that excludes the effects of specific search terms, the values for a fixed period (e.g., each day) are generally consistent when the model is unchanged.
In one embodiment, only the timeliness signal of the system may be counted and, for example, a signal classification statistics chart may be plotted. In a preferred embodiment, the three signals may also be combined to determine the timeliness strength of a request.
The PV cutoff signal (cutoff_pv) indicates whether a PV (page view) cutoff is applied to a certain type of trigger result (e.g., a timeliness result), i.e., no result is triggered once a certain PV value is exceeded. This signal also has a large impact on the final result decision. These signals (e.g., the timeliness signals and the PV cutoff signal described above) can be classified and collected from the data returned by the interface, and the distribution of signal types in the sample set can be counted. FIGS. 4A to 4D show examples of trend charts for various types of signal dependence indexes, in which the ordinate indicates the number of searches at the corresponding strength and the abscissa covers the last three days of the statistical interval 2017-12-22 to 2017-12-29, i.e., 2017-12-27, 2017-12-28, and 2017-12-29. FIG. 4A shows a timeliness signal classification statistics chart, giving the counts for timeliness strengths of 0, 1, 2, 3, 4 and for no signal (none) on December 27, 28, and 29, 2017. FIG. 4B shows a TS signal classification statistics chart, giving the counts for strengths of 0, 1, 2, 3 and for no signal (none) on the same three days. FIG. 4C shows a TP signal classification statistics chart, giving the counts for strengths of 0, 1, 2, 3, 4, 5 and for no signal (none) on the same three days. FIG. 4D shows a cutoff_pv signal classification statistics chart, giving the counts for strengths of 0, 1, 2, 3 and for no signal (none) on the same three days.
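The classification and counting step can be sketched as follows. This is a hedged illustration only: it assumes that each interface response is a dictionary whose keys `timeliness`, `TS`, `TP`, and `cutoff_pv` hold integer strengths, and that an absent key falls into the "none" class. The actual interface format is not given in the text.

```python
from collections import Counter

# Assumed field names for the dependent signals returned by the interface.
SIGNALS = ("timeliness", "TS", "TP", "cutoff_pv")


def count_signal_classes(responses):
    """Count, per signal, how many responses fall into each strength class."""
    counts = {name: Counter() for name in SIGNALS}
    for response in responses:
        for name in SIGNALS:
            value = response.get(name)
            counts[name]["none" if value is None else value] += 1
    return counts


# Hypothetical responses for the same batch of trigger words on one day.
sample = [
    {"timeliness": 3, "TS": 1},
    {"timeliness": 3, "TP": 0, "cutoff_pv": 1},
    {},
]
daily_counts = count_signal_classes(sample)
```

Running the same count for each day over a fixed batch of trigger words would give per-day class totals of the kind plotted in the classification statistics charts.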
In one embodiment, the search ranking monitoring method of the present invention may further comprise calculating a model data index based on the search algorithm model. Accordingly, step S140 may include: and generating a monitoring result based on the change trend of the model data index.
The search algorithm model quantifies data into indexes and makes the final decision on which results to return according to those indexes. By displaying the intermediate data of the search model intuitively and in real time, the invention can further monitor the operation of the search system from the data model dimension.
For example, in the case of ranking search results using an LR (logistic regression) model, a calculated score of the LR model, a text score for data evaluation, a final ranking score after manual rules, and the like may be used as model data indexes. The distribution of these quantized data is relatively stable and regular. If a large change occurs, it indicates that the system is currently at a high risk. In some embodiments, concepts of quantile statistics, histogram distribution, relative entropy may be introduced to more intuitively reflect the problem of the search system based on model data metrics.
Figs. 5A to 5D show examples of quantile statistical trend charts for various types of model data indexes, with the abscissa being the last four days of the statistical interval 2017-11-10 to 2017-11-17, i.e., 2017-11-14, 2017-11-15, 2017-11-16, and 2017-11-17. In the quantile statistics shown in the figures, the 1%, 10%, 36%, 50%, 64%, 90%, and 99% quantiles are taken as marker points (maximum (max), minimum (min), and no-signal (none) markers are also added) to represent the absolute distribution trend of the model data. LR_S, CLK_S_M, score, and CLK_S in the figures are basic scores in the ranking model, and the ordinate represents the value reached by the corresponding marker point each day. LR_S (Learn to Rank Score) is the ranking score; CLK_S (click score) is the click score; CLK_S_M (click score mapped) is the mapped click score, i.e., the result of a piecewise mapping and amplification of CLK_S; and score is the composite ranking score, which may be a weighted sum of the preceding scores. It will be appreciated that other models may use other model scores, and that the trends of those scores can likewise reflect the operation of the search system. The curves in the figures are the trend graphs of the scores; when the search model has not changed significantly, the model scores remain stable.
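As a rough sketch of how the marker points above could be computed for one day's scores, the following uses the nearest-rank percentile definition. The choice of nearest-rank (rather than an interpolating definition) and the names `MARKERS`, `percentile`, and `quantile_markers` are assumptions for illustration only.

```python
import math

# Marker points named in the description (percent).
MARKERS = (1, 10, 36, 50, 64, 90, 99)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list (pct in 0..100)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def quantile_markers(scores):
    """Return the marker-point values plus min and max for one day's scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    stats = {f"p{m}": percentile(ordered, m) for m in MARKERS}
    stats["min"] = ordered[0]
    stats["max"] = ordered[-1]
    return stats


# Hypothetical scores 1..100 so each marker is easy to check by eye.
STATS = quantile_markers(list(range(1, 101)))
```

Computing these values once per day and plotting each marker as its own curve would yield trend lines comparable to those in Figs. 5A to 5D.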
In a preferred embodiment, a histogram may also be used to represent the relative distribution trend of the data more intuitively. Figs. 6A-6D illustrate examples of quantile histograms for various types of model data indexes. In the histogram distribution shown in the figures (for November 17, 2017), the total data is divided into 100 parts, and the proportion shown by each bar represents the relative distribution trend of the model data.
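One plausible reading of the 100-part histogram is an equal-width binning of the scores, with each bar showing the share of samples in that bin. The sketch below follows that reading; the score range `[0, 1]` and the function name `histogram_proportions` are assumptions, since the source does not state how the 100 parts are formed.

```python
def histogram_proportions(values, bins=100, lo=0.0, hi=1.0):
    """Share of samples falling into each of `bins` equal-width bins on [lo, hi]."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values and v == hi
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


# Hypothetical scores: two near the bottom of the range, two at the top.
PROPORTIONS = histogram_proportions([0.005, 0.005, 0.995, 1.0])
```

Because the output is a proportion per bin rather than a raw count, histograms from days with different traffic volumes remain directly comparable.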
To evaluate the difference between two models during model iteration, the distance between the histogram distributions of the two models can be calculated. Relative entropy can be introduced to measure the stability of the algorithm model or to judge the change trend between the old model and the new model. Figs. 7A-7D illustrate an example of computing the relative entropy based on the histograms, where the relative entropy is marked with black boxes and characterizes the change on November 17, 2017 relative to the previous day. Relative entropy measures the difference between two value distributions, e.g., the value distributions A_list and B_list of two days A and B. The relative entropy can be found by:
$$D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$
where p and q represent the probability distributions of the same variable x in the two series, here A_list and B_list, respectively. The value D thus obtained represents the variation of B_list relative to A_list, i.e., the relative entropy of p with respect to q. The greater the relative entropy, the greater the model variation. When the model has not undergone major changes, the smaller the relative entropy the better; in the figures shown, the relative entropy is less than 0.001.
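A minimal sketch of the calculation, using the standard definition of relative entropy (Kullback-Leibler divergence) over two discrete distributions such as the per-bin proportions of two days. The natural logarithm and the small floor `eps` applied to empty bins of q are assumptions of this example; the source does not specify a log base or a smoothing scheme.

```python
import math


def relative_entropy(p, q, eps=1e-12):
    """D(p || q) = sum over x of p(x) * log(p(x) / q(x)).

    p and q are discrete distributions over the same bins. Bins where
    p(x) == 0 contribute nothing; q(x) is floored at eps (an assumption)
    to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue
        total += px * math.log(px / max(qx, eps))
    return total


# Hypothetical two-bin distributions for two consecutive days.
SAME = relative_entropy([0.5, 0.5], [0.5, 0.5])
SHIFTED = relative_entropy([0.5, 0.5], [0.25, 0.75])
```

Identical distributions give a value of zero, and the value grows as the two distributions diverge, which is why a small day-over-day value is read as a stable model.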
Accordingly, in one embodiment, generating the monitoring result based on the change trend of the model data index may include: calculating the relative entropy between the model data indexes in the current unit time and the previous unit time; and generating a monitoring result based on the relative entropy. Alternatively or additionally, generating the monitoring result based on the change trend of the model data index may include generating a monitoring result based on the quantile change trend of the model data index.
In one embodiment, the search ranking monitoring method of the present invention may further include extracting and counting service log indexes based on the service log. Accordingly, step S140 may include generating a monitoring result based on the service log indexes obtained by statistics.
In addition to the above three dimensions based on the real-time search data of the search system, error logs and core business indexes can be extracted from the service log to characterize the health of the basic functions of the current service. Fig. 8 shows an example of the distribution of various types of errors counted from the log. The figure shows the time-sharing indexes for December 30, 2017, which are logic error fields of the service. The health of the system can be reflected intuitively through the extracted values of the error fields and their change trends. In the figure, the errors with larger counts, "preCalc Failed!" (pre-computation failed) and "get illegal http code" (an illegal HTTP code was received), are marked separately on the right side, and the other, less frequent error classes are shown collectively at the bottom. In one embodiment, an alarm can be set based on the data so that business problems are discovered in time.
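The extraction and alarm step could look roughly like the sketch below. The two error strings are taken from the description; the log line layout, the alarm threshold, and the names `LOG_LINES`, `count_errors`, and `errors_over_threshold` are hypothetical and chosen only for this example.

```python
from collections import Counter

# Error fields named in the description of Fig. 8.
KNOWN_ERRORS = ("preCalc Failed!", "get illegal http code")

# Hypothetical log lines; the real log format is not given in the source.
LOG_LINES = [
    "2017-12-30 10:00:01 ERROR preCalc Failed!",
    "2017-12-30 10:00:02 ERROR get illegal http code",
    "2017-12-30 10:00:03 ERROR preCalc Failed!",
    "2017-12-30 10:00:04 INFO request served",
]


def count_errors(lines, known_errors=KNOWN_ERRORS):
    """Count how many log lines contain each known error field."""
    counts = Counter()
    for line in lines:
        for err in known_errors:
            if err in line:
                counts[err] += 1
                break
    return counts


def errors_over_threshold(counts, threshold):
    """Return the error fields whose count reaches the alarm threshold."""
    return sorted(err for err, n in counts.items() if n >= threshold)


COUNTS = count_errors(LOG_LINES)
ALARMS = errors_over_threshold(COUNTS, threshold=2)
```

In practice the threshold would more likely be tied to the trend of each field over time than to a fixed count, in line with the description's emphasis on change trends.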
In different embodiments of the present invention, the ranking effect of the system can be monitored and evaluated comprehensively based on the search evaluation index, the signal dependence index, the model data index, and the service log index, individually or in any combination. An abnormality in the signals on which the search algorithm depends also has a linked influence on the model data index and the search evaluation index. Thus, step S140 may include: an abnormality in the signal dependence index causes the current search evaluation index to decline; and an alert is generated based at least on the decline in the current search evaluation index. Alternatively or additionally, step S140 may include: an abnormality in the statistically obtained signal dependence index causes a significant change in the model data index; and an alert is generated based at least on the significant change in the model data index.
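The linked alert described above can be illustrated with a small rule that fires only when a signal dependence abnormality coincides with a decline in the search evaluation index. The tolerances `signal_tol` and `eval_tol`, the use of relative change as the abnormality test, and the function names are all assumptions for this sketch; the source does not define how an abnormality or a decline is detected.

```python
def relative_change(current, reference):
    """Signed relative change of a metric against a non-zero reference value."""
    return (current - reference) / reference


def linked_alert(signal_today, signal_ref, eval_today, eval_ref,
                 signal_tol=0.2, eval_tol=0.05):
    """Return an alert message when both conditions hold, otherwise None.

    signal_*: share of searches carrying a given dependent signal.
    eval_*:   a search evaluation index such as NDCG.
    Both tolerances are hypothetical.
    """
    signal_abnormal = abs(relative_change(signal_today, signal_ref)) > signal_tol
    eval_declined = relative_change(eval_today, eval_ref) < -eval_tol
    if signal_abnormal and eval_declined:
        return "ALERT: signal dependence abnormality with search evaluation decline"
    return None


# Hypothetical values: the signal share collapses and the evaluation index drops.
FIRED = linked_alert(0.10, 0.30, 0.70, 0.80)
QUIET = linked_alert(0.29, 0.30, 0.70, 0.80)
```

The same shape applies to the second variant in the text by substituting a model data measure, such as the day-over-day relative entropy, for the evaluation index.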
In one embodiment, the search ranking monitoring method of the invention can comprehensively evaluate the system operation condition at least from three dimensions of search evaluation, signal dependence and model data according to the acquired real-time search data of the search system.
FIG. 9 shows a schematic flow diagram of a search ranking monitoring method according to one embodiment of the invention. In step S910, current search data of the search system is obtained, where the search data includes user input data, search ranking data, and/or user click data. Likewise, "current" search data may represent "real-time" search data, or search data that has been continuously acquired by the system over a recent period of time, or a portion of data that has been acquired or generated by the system.
In step S920, a search system monitoring index is calculated based on the current search data, where the monitoring index includes a search evaluation index, a signal dependency index, and a model data index, and the search evaluation index is obtained based on a comparison between the current search evaluation index and a previously obtained reference search evaluation index. Here, the generation of the current search evaluation index requires at least search ranking data generated based on the system. The generation of the signal-dependent index requires at least input data from a user of the system, while the model data index requires at least click data from the user.
In step S930, the search system monitoring metrics are aggregated. In step S940, a monitoring result is generated based on the variation trend of the summary index. In one embodiment, the monitoring results may include a summary of trend graphs, histograms, etc. for each monitoring dimension as shown in FIGS. 3-8. Preferably, the monitoring result may further include an alarm generated when the fluctuation amplitude or the value of the relevant index reaches a specific value. For example, a signal-dependent index abnormality may cause a search evaluation index and/or a model data index to deteriorate, whereby an alarm may be generated based on the abnormality and deterioration at step S940.
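Steps S930 and S940 can be sketched as a merge of per-dimension metrics followed by a comparison against the previous period. The flat `dimension.metric` key scheme, the fluctuation threshold `max_fluctuation`, and the function names are assumptions for illustration; the source only states that an alarm may be generated when the fluctuation amplitude or the value of an index reaches a specific value.

```python
def summarize(**dimension_metrics):
    """Step S930 (sketch): merge per-dimension metrics into one flat summary."""
    summary = {}
    for dimension, metrics in dimension_metrics.items():
        for name, value in metrics.items():
            summary[f"{dimension}.{name}"] = value
    return summary


def generate_result(current, previous, max_fluctuation=0.1):
    """Step S940 (sketch): flag metrics whose relative change exceeds the threshold."""
    alarms = []
    for key, value in current.items():
        before = previous.get(key)
        if before is None or before == 0:
            continue  # no usable reference value for this metric
        change = (value - before) / abs(before)
        if abs(change) > max_fluctuation:
            alarms.append((key, round(change, 4)))
    return {"summary": current, "alarms": sorted(alarms)}


# Hypothetical metric values for two consecutive days.
PREVIOUS = summarize(search_eval={"ndcg": 0.80},
                     model_data={"relative_entropy": 0.0008})
CURRENT = summarize(search_eval={"ndcg": 0.60},
                    model_data={"relative_entropy": 0.0008})
RESULT = generate_result(CURRENT, PREVIOUS)
```

Keeping the dimension name in each key makes it straightforward to see which monitoring dimension an alarm came from.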
Preferably, the search ranking monitoring method shown in Fig. 9 may further include: acquiring service log data of the search system; and obtaining a service log index based on statistics of the service log data, wherein the search system monitoring index comprises the service log index. Fig. 10 illustrates a functional block diagram of multi-dimensional search system monitoring according to one embodiment of the present invention. In the preferred embodiment shown in Fig. 10, the search ranking system can be monitored simultaneously from four dimensions: online effect, signal dependence, model data, and service log. An anomaly in any dimension can trigger a monitoring alert. Combining these signals makes it convenient to grasp all aspects of system operation as a whole and to quickly locate where and why a problem occurred when the system is abnormal. In addition, when the system is adjusted or upgraded locally or as a whole, the preferred multi-dimensional monitoring scheme can reflect and evaluate the effect of the adjustment or upgrade quickly and accurately.
In one embodiment of the invention, a search ranking monitoring system is also disclosed. FIG. 11 shows a block diagram of a search ranking monitoring system according to one embodiment of the invention.
As shown in fig. 11, the search ranking monitoring system 1100 of the present invention may include a real-time data acquisition means 1110, a monitoring index calculation means 1120, and a monitoring result generation means 1130.
The real-time data acquisition device 1110 can be used to acquire the current online search ranking results of the search system.
The monitoring index calculation means 1120 can calculate a current search evaluation index based on the current online search ranking result and compare the calculated current search evaluation index with a previously acquired reference search evaluation index to acquire a search evaluation index, the monitoring index including the search evaluation index.
The search evaluation index may include at least one of: normalized discounted cumulative gain (NDCG); positive-to-inverse order ratio (PARI); expected reciprocal rank (ERR); and the harmonic mean of precision and recall (F-score). The previously acquired reference search evaluation index may include at least one of: manually annotated index data; and data calculated based on the real click data of online users.
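To make the comparison against a reference index concrete, the sketch below computes NDCG for a ranked list of relevance labels and then takes the difference from a reference value. The linear gain form (relevance divided by the log2 discount) is one common NDCG variant and is an assumption here, as are the relevance labels, the reference value, and the function names; the source does not specify which variant or comparison is used.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels of the current online ranking, in ranked order.
CURRENT_NDCG = ndcg([1, 2, 3])
# Hypothetical reference value, e.g. from manual annotation.
REFERENCE_NDCG = 1.0
# A negative difference means the current ranking scores below the reference.
NDCG_DELTA = CURRENT_NDCG - REFERENCE_NDCG
```

A perfectly ordered list scores 1.0, so the delta against a well-ordered reference directly expresses how far the online ranking has drifted.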
The monitoring result generation means 1130 can generate a monitoring result based on the monitoring index.
Optionally, as an embodiment of the present invention, the real-time data acquisition device 1110 may further be configured to acquire user input data and user click data of the search system. The monitoring index calculation means 1120 may include a search evaluation index calculation unit, a signal dependence index calculation unit, and a model data index calculation unit, and the monitoring index further includes a signal dependence index and a model data index. The search evaluation index calculation unit is configured to calculate the search evaluation index, the signal dependence index calculation unit is configured to calculate the signal dependence index based on at least the user input data, and the model data index calculation unit is configured to calculate the model data index based on at least the user click data. The search ranking monitoring system 1100 may further include a monitoring index summarizing device (not shown in the figure) configured to summarize the monitoring indexes, and the monitoring result generation means 1130 may generate the monitoring result based on the summarized monitoring indexes.
Optionally, as an example of the present invention, the search ranking monitoring system may further include a service log obtaining device (not shown in the figure) for obtaining service log data of the search system. The monitoring index may further include a service log index, and the monitoring index calculation apparatus further includes a service log index statistical unit, where the service log index statistical unit is configured to obtain the service log index based on service log data statistics.
Optionally, an abnormality in the signal dependence index causes the search evaluation index and/or the model data index to deteriorate, and the monitoring result generation means 1130 can generate an alarm based on the abnormality and the deterioration.
Optionally, the signal dependence index calculation unit may be configured to acquire and classify, via an interface, dependent signals including a timeliness signal and/or a PV (page view) cutoff signal, and to count the classified dependent signals to obtain the signal dependence index.
Optionally, the model data index calculation unit may calculate the relative entropy between the model data indexes in the current unit time and the previous unit time and/or the quantile change trend of the model data index, and the monitoring result generation means generates the monitoring result based on the relative entropy and/or the quantile change trend of the model data index. For the functions implemented by the search ranking monitoring system, reference may be made to the related description above; they are not repeated here.
The search ranking monitoring method and system according to the present invention, which can effectively monitor the ranking effect of a search system based on a comparison of the current ranking result with a previous reference result, have been described in detail above with reference to the accompanying drawings. The monitoring scheme may further involve signal dependence, model data, and/or service log indexes to monitor the search ranking system comprehensively from multiple dimensions. In this context, the signal dependence index may include timeliness information in order to cope with the highly time-sensitive search scenarios that are increasingly common in internet search services. The signal dependence index has a linked influence on the other indexes, so that local or overall abnormalities of the search system can be reflected more sensitively and accurately. In addition, combining the indexes helps system maintenance personnel locate an abnormality and accurately determine its cause.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
While embodiments of the present invention have been described above, the above description is illustrative, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (17)

1. A search ranking monitoring method, comprising:
obtaining a current online search ranking result;
calculating a current search evaluation index based on the current online search ranking result;
comparing the calculated current search evaluation index with a previously acquired reference search evaluation index to obtain a search evaluation index; and
generating a monitoring result based on at least the search evaluation index,
wherein the method further comprises:
counting and obtaining a signal dependence index, wherein the signal dependence index is a statistical classification index of a signal on which a search algorithm depends;
calculating model intermediate data indexes of the search model based on the search algorithm model,
and generating a monitoring result based on at least the search evaluation index comprises:
an abnormality in the statistically obtained signal dependence index causing a significant change in the model intermediate data index; and
generating an alert based at least on the significant change in the model intermediate data index.
2. The method of claim 1, wherein the search evaluation index comprises at least one of:
normalized discounted cumulative gain (NDCG);
positive-to-inverse order ratio (PARI);
expected reciprocal rank (ERR); and
a harmonic mean of precision and recall (F-score).
3. The method of claim 1, wherein the previously acquired reference search evaluation index comprises at least one of:
manually annotated index data; and
data calculated based on real click data of online users.
4. The method of claim 1, wherein counting and obtaining signal dependence indicators comprises:
acquiring and classifying dependent signals via an interface, the dependent signals comprising a timeliness signal and/or a PV (page view) cutoff signal; and
counting the classified dependent signals to obtain the signal dependence index.
5. The method of claim 1, wherein generating monitoring results based at least on the search evaluation index further comprises:
an abnormality in the signal dependence index causing the current search evaluation index to decline; and
generating an alert based at least on the decline in the current search evaluation index.
6. The method of claim 1, wherein generating a monitoring result based on a change trend of the model intermediate data index comprises:
calculating the relative entropy between the model intermediate data indexes in the current unit time and the previous unit time; and
generating a monitoring result based on the relative entropy, and/or
generating a monitoring result based on the quantile change trend of the model intermediate data index.
7. The method of claim 1, further comprising:
extracting and counting service log indexes based on a service log, and
generating the monitoring result further comprises:
generating a monitoring result based on the service log indexes obtained by statistics.
8. A search ranking monitoring method, comprising:
acquiring current search data of a search system, wherein the search data comprises user input data, search ranking data and/or user click data;
calculating a search system monitoring index based on the current search data, wherein the monitoring index comprises a search evaluation index, a signal dependence index and a model intermediate data index, the search evaluation index is obtained by comparing the current search evaluation index with a reference search evaluation index obtained in the past, the signal dependence index is a statistical classification index of a signal on which a search algorithm depends, and the model intermediate data index is an index of model intermediate data of a search model calculated based on a search algorithm model;
summarizing the search system monitoring indexes; and
generating a monitoring result based on the variation trend of the summary index,
wherein generating a monitoring result based on the variation trend of the summary index comprises:
an abnormality in the statistically obtained signal dependence index causing a significant change in the model intermediate data index; and
generating an alert based at least on the significant change in the model intermediate data index.
9. The method of claim 8, further comprising:
acquiring service log data of the search system;
obtaining a service log index based on the service log data statistics, wherein,
the search system monitoring index comprises the service log index.
10. The method of claim 8, wherein an abnormality in the signal dependence index causes the search evaluation index to deteriorate, and
generating the monitoring result based on the variation trend of the summary index comprises:
generating an alert based on the abnormality and the deterioration.
11. A search ranking monitoring system comprising:
current data acquisition means for acquiring a current online search ranking result of a search system;
monitoring index calculation means for calculating a current search evaluation index based on the current online search ranking result and comparing the calculated current search evaluation index with a previously acquired reference search evaluation index to acquire a search evaluation index, the monitoring index including the search evaluation index; and
monitoring result generating means for generating a monitoring result based on the monitoring index,
wherein the monitoring index further includes a signal dependence index and a model intermediate data index, the signal dependence index being obtained by statistics and being a statistical classification index of a signal on which the search algorithm depends, and the model intermediate data index being an index of model intermediate data of the search model calculated based on the search algorithm model,
and the monitoring result generating means is configured such that:
an abnormality in the statistically obtained signal dependence index causes a significant change in the model intermediate data index; and
an alert is generated based at least on the significant change in the model intermediate data index.
12. The system of claim 11, wherein the previously acquired reference search evaluation index comprises at least one of:
manually annotated index data; and
data calculated based on real click data of online users.
13. The system of claim 11, wherein,
the current data acquisition means is further adapted to acquire user input data and user click data of the search system,
the monitoring index calculation means includes a search evaluation index calculation unit for calculating the search evaluation index, a signal dependency index calculation unit for calculating the signal dependency index based on at least the user input data, and a model intermediate data index calculation unit for calculating the model intermediate data index based on at least the user click data,
the search ranking monitoring system further comprises monitoring index summarizing means for summarizing the monitoring indexes, and
the monitoring result generating means generates the monitoring result based on the summarized monitoring indexes.
14. The system of claim 13, further comprising:
service log acquiring means for acquiring service log data of the search system,
the monitoring index further comprises a service log index, and the monitoring index calculation device further comprises a service log index statistical unit, wherein the service log index statistical unit is used for obtaining the service log index based on service log data statistics.
15. The system of claim 13, wherein an abnormality in the signal dependence index causes the search evaluation index to deteriorate, and
the monitoring result generating means generates an alarm based on the abnormality and the deterioration.
16. The system of claim 13, wherein the signal dependence index calculation unit is configured for:
acquiring and classifying dependent signals via an interface, the dependent signals comprising a timeliness signal and/or a PV (page view) cutoff signal; and
counting the classified dependent signals to obtain the signal dependence index.
17. The system according to claim 13, wherein the model intermediate data index calculation unit calculates a relative entropy between the model intermediate data indexes in the current unit time and the previous unit time and/or a quantile change trend of the model intermediate data index, and the monitoring result generating means generates the monitoring result based on the relative entropy and/or the quantile change trend of the model intermediate data index.
CN201811210424.5A 2018-10-17 2018-10-17 Search ranking monitoring method and system Active CN111061942B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811210424.5A CN111061942B (en) 2018-10-17 2018-10-17 Search ranking monitoring method and system

Publications (2)

Publication Number Publication Date
CN111061942A CN111061942A (en) 2020-04-24
CN111061942B true CN111061942B (en) 2023-04-18

Family

ID=70297318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811210424.5A Active CN111061942B (en) 2018-10-17 2018-10-17 Search ranking monitoring method and system

Country Status (1)

Country Link
CN (1) CN111061942B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343046B (en) * 2021-05-20 2023-08-25 成都美尔贝科技股份有限公司 Intelligent search ordering system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989152A (en) * 2015-03-02 2016-10-05 深圳市腾讯计算机系统有限公司 Search engine service quality monitoring methods, apparatus and system
CN107122467A (en) * 2017-04-26 2017-09-01 努比亚技术有限公司 The retrieval result evaluation method and device of a kind of search engine, computer-readable medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120130814A1 (en) * 2007-11-14 2012-05-24 Paul Vincent Hayes System and method for search engine result ranking
US9251185B2 (en) * 2010-12-15 2016-02-02 Girish Kumar Classifying results of search queries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张智超 ; 胡轶宁 ; 秦永林 ; 罗立民 ; .基于有序子窗搜索的非局部约束稀疏角度锥束CT重建算法.东南大学学报(自然科学版).2017,47(05),906-912. *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant