CN109190004B - Method for reducing search complexity based on specific strategy - Google Patents

Method for reducing search complexity based on specific strategy

Publication number
CN109190004B
Authority
CN
China
Prior art keywords
search
processing
strategy
requests
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810999884.4A
Other languages
Chinese (zh)
Other versions
CN109190004A (en
Inventor
姜平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Focus Technology Co Ltd
Original Assignee
Focus Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Focus Technology Co Ltd filed Critical Focus Technology Co Ltd
Priority to CN201810999884.4A priority Critical patent/CN109190004B/en
Publication of CN109190004A publication Critical patent/CN109190004A/en
Application granted granted Critical
Publication of CN109190004B publication Critical patent/CN109190004B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for reducing search complexity based on a specific strategy in order to cope with overload search requests, comprising the following steps: step 1: identifying the search request; step 2: distinguishing the request and attaching a processing strategy mark; step 3: processing the request according to the strategy; step 4: returning the result; step 5: periodically monitoring the load and dynamically adjusting the strategy; step 6: recording the process to evaluate the impact. The method dynamically adjusts the degree of search complexity reduction according to the concurrent load of search requests, while keeping the scope affected by the lossy service to a minimum.

Description

Method for reducing search complexity based on specific strategy
Technical Field
The invention relates to the technical field of data search, and in particular to a method for reducing search complexity based on a specific strategy.
Background
A search system is a high-load computing system that occupies a large amount of CPU time and memory. An online search system must also remain highly available, with response times at the millisecond level. The system resources occupied by a search system are therefore roughly proportional to the maximum number of search requests it can process.
A company must consider economic benefit in its operations, so hardware cost is limited. A search platform is a back-end service provider within the overall business system; it is open to all connected business systems, and its total processing capacity is relatively limited. When an overload of search requests occurs, that is, when the number of concurrent search requests exceeds the maximum load of the system, the search platform is prone to instance memory overflow or excessive load, causing the application to crash and the service to become unavailable. To cope with such overload under limited hardware cost, the search platform must be able to reduce search complexity according to a specific strategy and respond to the overload through the idea of lossy service.
Lossy service is generally understood as meeting basic service requirements to the greatest extent possible on limited resources by sacrificing some unimportant or non-core user experience. The service provided by a search platform has a certain ambiguity that naturally distinguishes it from the consistency requirements of a database system: different result sets may be returned for the same search term, provided they meet a certain level of accuracy and relevance. The search service of an e-commerce search platform is highly complex. The search computation includes not only recall and ranking based on the Lucene score but also a number of complex business-specific rules, for example the information quality of the products in the search domain, the membership level of the merchant, historical transactions, and user clicks, as well as an appropriate attenuation of products concentrated under the same member merchant. This processing logic consumes considerable computing and memory resources, especially once personalized search is connected, which must also consider the search behavior preferences, interest networks, and so on in the searching user's profile. By reducing search complexity, the search platform can handle more concurrent search requests on the same search hardware, which makes this an efficient and practical application scenario.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a method for reducing search complexity based on a specific strategy. Different strategies are applied to all received search requests to divide them into user groups, and search complexity is reduced for those requests that have little influence on business value. The complexity is reduced by removing sorting from the search query requests and reducing the recall quantity, which lowers the load on the search servers and raises the throughput of the search service cluster, so that a sudden flood of overload search requests does not leave the whole cluster unable to serve. An overload search request means that the search platform receives concurrent search requests exceeding its maximum processing load, causing the system load average to rise.
In order to solve the above technical problem, the present invention provides a method for reducing search complexity based on a specific strategy, comprising the following steps:
step 1: identifying search requests, namely, every search request sent to the search service cluster must carry specific user identifiers, including the user's source IP, the Cookie information of the logged-in user, and the flag bit of a Spider (crawler engine); all search requests are submitted to the search cluster repeater (Dispatcher) for processing;
step 2: distinguishing requests and attaching processing strategy marks, namely, the search cluster repeater (Dispatcher) assigns a user priority to each search request according to the policy rules configured by an administrator, judges whether the current number of search requests and the server load meet the criteria for triggering the surge response, distinguishes the search requests of different users, attaches a specific search processing strategy mark, and forwards the marked request to a search service instance (Searcher) in the cluster for processing;
step 3: processing requests by strategy, namely, a search service instance (Searcher) in the search cluster receives the search request carrying the search processing strategy flag bit and processes it according to the value of the flag;
step 4: returning the result, namely, the search service instance (Searcher) returns the processed search result to the search cluster repeater (Dispatcher), which returns it to the search client that initiated the request;
step 5: periodically monitoring and dynamically adjusting the strategy, namely, when the search cluster repeater (Dispatcher) confirms through periodic scanning that the overload is easing or worsening (that is, the load and traffic indexes fluctuate around their limit values), it shifts the strategy up or down according to the specific processing logic until the overload ends and all indexes return to normal;
step 6: recording the process to evaluate the impact, namely, the search system records all processing flows and the related search request information, and after the surge ends the back end automatically notifies the relevant business stakeholders by email so that they can evaluate the scope affected by the reduction in search complexity.
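The identification in step 1 can be illustrated with a minimal Python sketch. The `SearchRequest` fields, the `classify` function, the group names, and the `PRIORITY_SPIDERS` set are all hypothetical names chosen for illustration; the patent only states that a request carries a source IP, login Cookie information, and a Spider flag bit.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of high-priority crawlers (illustrative values only).
PRIORITY_SPIDERS = {"Google", "Bing"}


@dataclass
class SearchRequest:
    query: str
    source_ip: str
    cookie_user: Optional[str] = None  # login name taken from the Cookie, if any
    spider_name: Optional[str] = None  # crawler flag, set when the client is a spider


def classify(request: SearchRequest) -> str:
    """Return the user group the Dispatcher could use when matching policy rules."""
    if request.spider_name is not None:
        if request.spider_name in PRIORITY_SPIDERS:
            return "priority_spider"
        return "other_spider"
    if request.cookie_user:
        return "login_user"
    return "guest_user"
```

Under these assumptions, a request with `spider_name="SomeBot"` falls into `other_spider`, the first group to be degraded, while a request whose Cookie carries a login name falls into `login_user`, the last.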
In the step 2, the method further comprises:
step 2-1: setting policy rules, including setting policy level indexes and setting user distinguishing processing rules;
the policy level index is set as follows: the administrator can configure the strategy in four levels, from level 1 to level 4; the level invoked depends on the current number of search requests and the server load, and the administrator can manually configure the data thresholds of each level through the policy level index; the policy level index includes the number of concurrent search requests per second and the 5-minute average server load of the servers in the current search cluster (using the Load Average value of the Linux operating system as reference);
the user distinguishing processing rules are set as follows: the administrator can set different level rules for different users, dividing requests by crawler identifier Spidername and user identifier Username and assigning them to different strategy levels;
step 2-2: level 1 policy processing, namely, after periodically scanning the search server load, if the number of search requests exceeds the system's preset maximum capacity or the 5-minute Load Average of the search server is greater than the total number of server CPU cores, the level 1 strategy is started; the strategy identifies the search requests carrying a Spidername among all user identifiers received by the Dispatcher, selects those carrying the Other spider flag, appends the specific search processing strategy flag bit A to these requests, and hands them to the corresponding search cluster service instance for processing;
step 2-3: level 2 policy processing, namely, after periodic scanning, if the number of search requests still exceeds the system's preset maximum capacity or the 5-minute Load Average of the search server is still greater than the total number of server CPU cores, the level 2 strategy is started; the strategy identifies all search requests carrying a crawler identifier among all user identifiers received by the Dispatcher, without distinguishing Spidername priority, appends flag bit A to these requests, and hands them to the corresponding search cluster service instance for processing; at the same time, the flag bits added in step 2-2 are changed to B;
step 2-4: level 3 policy processing, namely, after periodic scanning, if the Dispatcher confirms that the number of search requests still exceeds the system's preset maximum capacity or the 5-minute Load Average of the search server is still greater than the total number of server CPU cores, the level 3 strategy is started; the strategy judges from the user Cookie information received by the Dispatcher whether the user is a registered user of the corresponding business platform; if the user is not a registered user of the platform, the Username is marked as Other, flag bit A is appended to the request, and the request is handed to the corresponding search cluster service instance for processing; at the same time, as part of the step-by-step escalation, the flag bits added in step 2-3 are changed to B and the flag bits added in step 2-2 are changed to C;
step 2-5: level 4 policy processing, namely, after periodic scanning, if the Dispatcher confirms that the number of search requests still exceeds the system's preset maximum capacity or the 5-minute Load Average of the search server is still greater than the total number of server CPU cores, the level 4 strategy is started; among all user requests received by the Dispatcher, the strategy appends flag bit A to the requests of users with Username=Login and hands them to the corresponding search cluster service instance for processing; at the same time, the flag bits added in step 2-4 are changed to B, those added in step 2-3 are changed to C, and those added in step 2-2 are changed to D;
step 2-6: step-by-step strategy shifting, namely, after periodic scanning, if the Dispatcher confirms that the number of search requests still exceeds the system's preset maximum capacity or the 5-minute Load Average of the search server is still greater than the total number of server CPU cores, it advances the search strategy flag bits of the requests covered by steps 2-2, 2-3, 2-4 and 2-5 one step at a time from A toward D, until the flag bits of all user requests other than those of registered users of the business system (that is, Username=Login) are D; for requests that the Cookie identifies as coming from registered users, the flag bit is raised to at most C.
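The escalation in steps 2-2 to 2-6 can be summarized as a small lookup, sketched below in Python. The group names, `ENTRY_LEVEL`, and `flag_for` are hypothetical, and the sketch assumes that each scan that still finds the system overloaded advances the escalation by exactly one step. Here `priority_spider` stands for the crawlers first covered by the level 2 rule, which are those not already marked at level 1.

```python
from typing import Optional

FLAGS = "ABCD"

# Escalation step at which each (hypothetical) user group first receives flag A.
ENTRY_LEVEL = {
    "other_spider": 1,     # Spidername=Other
    "priority_spider": 2,  # remaining crawlers, covered by Spidername=All
    "guest_user": 3,       # Username=Other
    "login_user": 4,       # Username=Login
}


def flag_for(group: str, step: int) -> Optional[str]:
    """Flag carried by `group` after `step` consecutive overloaded scans."""
    entry = ENTRY_LEVEL[group]
    if step < entry:
        return None  # group is not yet degraded, so it gets normal search
    # Registered users are capped at C; every other group may reach D.
    cap = 2 if group == "login_user" else 3
    return FLAGS[min(step - entry, cap)]
```

For example, at step 4 this yields D for `other_spider`, C for `priority_spider`, B for `guest_user`, and A for `login_user`, matching step 2-5. Further steps push every group except `login_user` to D.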
In step 3, the method further comprises:
step 3-1: loading from cache, namely, when the Searcher receives a search request with flag bit A, the result is first loaded from the current Searcher's cache; if the cache holds no result, the search is executed again; search computation is thus reduced through cache hits;
step 3-2: single-point lossy service, namely, when the Searcher receives a search request with flag bit B, it applies single-point lossy service; because the search cluster uses a distributed search mechanism, one search normally requires requesting two or more Searcher instances to execute the query and then merging their results; under single-point lossy service, the Searcher does not request the other Searcher instances and returns directly after the single Searcher finishes executing;
step 3-3: dimension-reduction simplification, namely, when the Searcher receives a search request with flag bit C, it applies dimension-reduction lossy service, which reduces and simplifies the computation of the user's search request by skipping the more time-consuming business logic, for example omitting custom scoring and sorting and reducing the recall quantity;
step 3-4: returning an empty result, namely, when the Searcher receives a search request with flag bit D, it rejects the request directly and returns an empty search result.
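As an illustration of steps 3-1 to 3-4, the Searcher's handling can be sketched as a dispatch on the flag. The function name and its three search callables are hypothetical placeholders for the full distributed search, the single-instance search, and the basic query without custom scoring. Storing the result in the cache after a miss is also an assumption, since the patent only says the search is executed again.

```python
from typing import Callable, Dict, List, Optional

SearchFn = Callable[[str], List[str]]


def handle_request(
    query: str,
    flag: Optional[str],
    cache: Dict[str, List[str]],
    full_search: SearchFn,
    local_search: SearchFn,
    basic_search: SearchFn,
) -> List[str]:
    """Process one search request according to its processing strategy flag."""
    if flag == "D":
        return []                   # step 3-4: reject and return an empty result
    if flag == "C":
        return basic_search(query)  # step 3-3: skip custom scoring and sorting
    if flag == "B":
        return local_search(query)  # step 3-2: do not fan out to other Searchers
    if flag == "A":
        if query in cache:          # step 3-1: try the local cache first
            return cache[query]
        results = full_search(query)
        cache[query] = results      # assumption: populate the cache on a miss
        return results
    return full_search(query)       # no flag: normal search processing
```

Passing the search functions in as arguments keeps the sketch independent of any particular search engine API.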
In the step 5, the method further comprises:
step 5-1: after periodic scanning, if the Dispatcher confirms that the number of search requests is less than or equal to the system's preset maximum capacity or the 5-minute Load Average of the search server is less than or equal to the total number of server CPU cores, the specific search processing strategy flag bits of all search requests at the different processing levels are shifted down; the downshift rule is that D is lowered to C, C is lowered to B, and B is lowered to A;
step 5-2: after periodic scanning, if the Dispatcher confirms that the number of search requests is still less than or equal to the system's preset maximum capacity or the 5-minute Load Average of the search server is still less than or equal to the total number of server CPU cores, the processing marks of all search requests marked A are removed and those requests return to normal search processing;
step 5-3: steps 5-1 and 5-2 are repeated until all processing marks are cleared, that is, until all search requests are processed normally; if in a later scanning period the Dispatcher confirms that the number of search requests is greater than the system's preset maximum capacity or the 5-minute Load Average of the search server is greater than the total number of server CPU cores, the flag bits of all search requests at the different processing levels are shifted up; the upshift rule is that C is raised to D, B is raised to C, and A is raised to B.
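The periodic adjustment in steps 5-1 to 5-3 can be sketched as a pair of lookup tables applied on every scan. The names below are hypothetical. As a simplification, the sketch folds steps 5-1 and 5-2 into a single per-scan rule in which A is cleared in the same scan that lowers the other flags, and it omits the cap at C for registered users.

```python
from typing import Dict, Optional

UPSHIFT = {"A": "B", "B": "C", "C": "D", "D": "D"}
DOWNSHIFT = {"D": "C", "C": "B", "B": "A", "A": None}


def is_overloaded(qps: float, load_avg_5min: float, max_qps: float, cpu_cores: int) -> bool:
    """Overload test used in the steps: too many requests or too much load."""
    return qps > max_qps or load_avg_5min > cpu_cores


def rescan(flags: Dict[str, Optional[str]], overloaded: bool) -> Dict[str, Optional[str]]:
    """Shift every group's flag one step up or down after a scan."""
    table = UPSHIFT if overloaded else DOWNSHIFT
    return {
        group: (table[flag] if flag is not None else None)
        for group, flag in flags.items()
    }
```

Calling `rescan` repeatedly with `overloaded=False` eventually clears every flag, which corresponds to all requests returning to normal processing.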
In step 6, the method further comprises:
step 6-1: the search system records the email contacts of the stakeholders of each search service; all processing details are collated, summarized, and reported to the relevant stakeholders so that they can evaluate the scope of impact; a dedicated search request processing report is generated each time;
step 6-2: the search system records the server CPU load and the number of concurrent search requests at each strategy level, draws a bar chart comparing them, and adds the chart to the processing report;
step 6-3: the search system also records the changes in the processing marks of the search requests at each strategy level, for example the history of a change from A to B, and adds them to the processing report;
step 6-4: the processing report is sent by email to all stakeholders of the corresponding service.
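The reporting in steps 6-1 to 6-4 might be sketched as follows. The `SurgeReport` class, its fields, and `build_email` are hypothetical. The sketch renders a plain-text table in place of the bar chart and only builds the message with Python's standard `email.message.EmailMessage`, leaving the actual sending out.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage
from typing import List, Tuple


@dataclass
class SurgeReport:
    samples: List[Tuple[int, float, int]] = field(default_factory=list)    # (level, CPU load, concurrency)
    transitions: List[Tuple[str, str, str]] = field(default_factory=list)  # (group, old flag, new flag)

    def record_sample(self, level: int, cpu_load: float, concurrent: int) -> None:
        self.samples.append((level, cpu_load, concurrent))

    def record_transition(self, group: str, old: str, new: str) -> None:
        self.transitions.append((group, old, new))

    def render(self) -> str:
        lines = ["Search request processing report", "", "level  cpu_load  concurrent"]
        lines += [f"{lvl:<5}  {load:<8}  {n}" for lvl, load, n in self.samples]
        lines += ["", "flag changes:"]
        lines += [f"{group}: {old} -> {new}" for group, old, new in self.transitions]
        return "\n".join(lines)


def build_email(report: SurgeReport, sender: str, recipients: List[str]) -> EmailMessage:
    """Wrap the rendered report in an email addressed to all stakeholders."""
    msg = EmailMessage()
    msg["Subject"] = "Search complexity reduction report"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(report.render())
    return msg
```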
In step 2-1, the policy level index is set as follows:
1:[200,24];2:[250,24];3:[300,24];4:[400,32];
Taking the first entry as an example, the data threshold of the level 1 policy is 200 or more requests per second and a 5-minute average server load above 24; the other entries follow the same pattern. This parameter supports dynamic configuration.
The user distinguishing processing rules are set as follows:
1:[Spidername=Other:*A*];
2:[Spidername=All:*A*];
3:[Username=Other:*A*];
4:[Username=Login:*A*].
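A short Python sketch shows how the two example settings above could be parsed. The function names and regular expressions are hypothetical, and the asterisks around the flag are treated as literal characters. Combining the two thresholds with "or" follows the wording of steps 2-2 to 2-5 and is an assumption about how the index is evaluated.

```python
import re
from typing import Dict, Tuple


def parse_levels(text: str) -> Dict[int, Tuple[int, int]]:
    """Parse '1:[200,24];2:[250,24];...' into {level: (requests_per_second, load)}."""
    return {
        int(level): (int(qps), int(load))
        for level, qps, load in re.findall(r"(\d+):\[(\d+),(\d+)\]", text)
    }


def parse_rules(text: str) -> Dict[int, Tuple[str, str, str]]:
    """Parse '1:[Spidername=Other:*A*];...' into {level: (field, value, flag)}."""
    return {
        int(level): (name, value, flag)
        for level, name, value, flag in re.findall(r"(\d+):\[(\w+)=(\w+):\*(\w)\*\]", text)
    }


def triggered_level(qps: float, load_avg_5min: float, levels: Dict[int, Tuple[int, int]]) -> int:
    """Highest level whose thresholds are met, or 0 when none is met."""
    hit = [
        level
        for level, (min_qps, max_load) in levels.items()
        if qps >= min_qps or load_avg_5min > max_load  # assumption: either index triggers
    ]
    return max(hit, default=0)
```

With the example thresholds, 260 requests per second at low load selects level 2, and fewer than 200 requests per second at low load selects no level.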
The invention achieves the following beneficial effects:
(1) The user identification information carried by the search client serves as the basis for rating users. All user classification and rating is completed in the Dispatcher inside the search platform, and the client needs no further configuration. When high-load search requests occur, the client does not need to intervene; all countermeasures are carried out automatically by the Dispatcher and Searcher instances in the search platform.
(2) The invention applies the concept of lossy service by weighting users according to their influence on the business and applying lossy service modes of different scope and degree. Rather than a simple cut-off circuit breaker, it uses a stepwise, progressively escalating lossy service mode to cope with high-load search requests, so that the scope of business affected by the lossy service is as small as possible.
(3) The invention exploits the characteristics of a distributed search platform and reduces search complexity by reducing concurrent node recall and returning the search results recalled from a single node. Under high load, limiting the recall range reduces computational complexity while still preserving search accuracy, which is friendlier to the client than directly rejecting excess requests.
(4) Because the computation of a search request consists of two parts, the basic Lucene query and the custom business scoring and sorting, the invention proposes a dimension-reduction service under high load: the Searcher completes only the basic Lucene query and abandons the custom business scoring and sorting. This dimension-reduction service reduces search complexity and improves the ability to cope with high-load search requests.
(5) After a high-load search request has been handled, the invention sends the detailed record of all automatic complexity-reduction processing to each preconfigured business stakeholder in the form of a processing report, so that the stakeholders can understand the scope affected by the complexity reduction and evaluate its impact on the business. With this feedback, the administrator can improve the configuration of the specific strategy, so that the whole search platform handles high-load search requests better without additional spending on hardware servers.
Drawings
FIG. 1 is a simplified process flow diagram of an exemplary embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and the exemplary embodiments:
As shown in FIG. 1, the present invention comprises the following steps:
step 1: identifying search requests
Every search request sent to the search service cluster must carry specific user identifiers, including the user's source IP, the Cookie information of the logged-in user, and the flag bit of a Spider (crawler engine); all search requests are submitted to the search cluster repeater (Dispatcher) for processing;
step 2: distinguishing requests and attaching processing strategy marks
The search cluster repeater (Dispatcher) assigns a user priority to each search request according to the policy rules configured by an administrator, judges whether the current number of search requests and the server load meet the criteria for triggering the surge response, distinguishes the search requests of different users, attaches a specific search processing strategy mark, and forwards the request to a search service instance (Searcher) in the cluster for processing;
step 3: processing requests by strategy
A search service instance (Searcher) in the search cluster receives the search request carrying the search processing strategy flag bit and processes it according to the value of the flag;
step 4: returning results
The search service instance (Searcher) returns the processed search result to the search cluster repeater (Dispatcher), which returns it to the search client that initiated the request;
step 5: periodically monitoring and dynamically adjusting the strategy
When the search cluster repeater (Dispatcher) confirms through periodic scanning that the overload is easing or worsening (that is, the load and traffic indexes fluctuate around their limit values), it shifts the strategy up or down according to the specific processing logic until the overload ends and all indexes return to normal;
step 6: recording the process to evaluate the impact
The search system records all processing flows and the related search request information; after the surge ends, the back end automatically notifies the relevant business stakeholders by email so that they can evaluate the scope affected by the reduction in search complexity.
In the step 2, the method further comprises:
step 2-1: setting policy rules including setting policy level indexes and setting user-distinguishing processing rules
Setting policy level index
The administrator can configure the strategy in four levels, from level 1 to level 4. The level invoked depends on the current number of search requests and the server load, and the administrator can manually configure the data thresholds of each level through the policy level index. The policy level index includes the number of concurrent search requests per second and the 5-minute average server load of the servers in the current search cluster (using the Load Average value of the Linux operating system as reference). For example, the settings are as follows: 1:[200,24]; 2:[250,24]; 3:[300,24]; 4:[400,32]. Taking the first entry as an example, the data threshold of the level 1 policy is 200 or more requests per second and a 5-minute average server load above 24; the other entries follow the same pattern. This parameter supports dynamic configuration.
Setting differentiated user processing rules
The administrator can set different level rules for different users, dividing requests by crawler identifier Spidername and user identifier Username. For example, the spiders of Google and Bing have high priority and other spiders have low priority, so they are assigned to different policy levels. Similarly, the user identifier Username is divided by user level into registered users (Login) and other, non-registered users (Other).
The example settings are as follows: 1:[Spidername=Other:*A*];
2:[Spidername=All:*A*];
3:[Username=Other:*A*];
4:[Username=Login:*A*]
step 2-2: level 1 policy handling
After periodically scanning the search server load, if the number of search requests exceeds the system's preset maximum capacity or the 5-minute Load Average of the search server is greater than the total number of server CPU cores, the level 1 strategy is started. The policy rule is predefined in step 2-1. Taking the level 1 user processing rule illustrated in step 2-1, "1:[Spidername=Other:*A*]", as an example, the strategy identifies the search requests carrying a Spidername among all user identifiers received by the Dispatcher, selects those carrying the Other spider flag, appends the specific search processing strategy flag bit A to these requests, and hands them to the corresponding search cluster service instance for processing;
step 2-3: level 2 policy handling
After periodic scanning, if the number of search requests still exceeds the system's preset maximum capacity or the 5-minute Load Average of the search server is still greater than the total number of server CPU cores, the level 2 strategy is started. Taking the level 2 user processing rule illustrated in step 2-1, "2:[Spidername=All:*A*]", as an example, the strategy identifies all search requests carrying a crawler identifier among all user identifiers received by the Dispatcher, without distinguishing Spidername; it then appends flag bit A to these requests and hands them to the corresponding search cluster service instance for processing. At the same time, the flag bits added in step 2-2 are changed to B, which implements the step-by-step escalation shown in FIG. 1;
step 2-4: level 3 policy processing
After periodic scanning, if the Dispatcher confirms that the number of search requests still exceeds the system's preset maximum capacity or the 5-minute Load Average of the search server is still greater than the total number of server CPU cores, the level 3 strategy is started. Taking the level 3 user processing rule illustrated in step 2-1, "3:[Username=Other:*A*]", as an example, the strategy judges from the user Cookie information received by the Dispatcher whether the user is a registered user of the corresponding business platform. If the user is not a registered user of the platform, the Username is marked as Other, flag bit A is appended to the request, and the request is handed to the corresponding search cluster service instance for processing. At the same time, as part of the step-by-step escalation, the flag bits added in step 2-3 are changed to B and the flag bits added in step 2-2 are changed to C;
step 2-5: level 4 policy handling
After periodic scanning, if the Dispatcher confirms that the number of search requests still exceeds the system's preset maximum capacity or the 5-minute Load Average of the search server is still greater than the total number of server CPU cores, the level 4 strategy is started. Taking the level 4 user processing rule illustrated in step 2-1, "4:[Username=Login:*A*]", as an example, among all user requests received by the Dispatcher the strategy appends flag bit A to the requests carrying a registered user identifier, that is, Username=Login, and hands them to the corresponding search cluster service instance for processing. At the same time, the flag bits added in step 2-4 are changed to B, those added in step 2-3 are changed to C, and those added in step 2-2 are changed to D;
step 2-6: step-by-step strategic gear shifting
After a periodic scan, if the Dispatcher confirms that the number of search requests exceeds the system's preset maximum capacity value, or that the search server's 5-minute Load Average is still greater than the server's total number of CPU cores, the Dispatcher advances the search strategy flag bits of the search requests covered by steps 2-2, 2-3, 2-4 and 2-5 one step at a time from A toward D. This continues until the flag bits of all user requests other than those of registered users of the service system (i.e. users with Username = Login) are D; for requests whose Cookie identifies a registered user of the system, the flag bit is raised no higher than C.
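The step-by-step escalation in steps 2-2 to 2-6 can be sketched as follows. This is a minimal illustration in Python, not the patented implementation: the group names (`listed_crawler`, `any_crawler`, `other`, `login`), the function names and the dictionary representation are assumptions introduced here. Only the flags A to D, the overload condition and the cap at C for registered users come from the text.

```python
from typing import Dict

FLAGS = ["A", "B", "C", "D"]
# Hypothetical group names, in the order the four policy levels bring them in.
GROUPS = ["listed_crawler", "any_crawler", "other", "login"]
# Registered users (Username = Login) are never raised above C (step 2-6).
FLAG_CAP = {"login": "C"}


def is_overloaded(request_count: int, max_capacity: int,
                  load_avg_5min: float, cpu_cores: int) -> bool:
    """Trigger condition evaluated on every periodic scan."""
    return request_count > max_capacity or load_avg_5min > cpu_cores


def escalate(flags: Dict[str, str]) -> Dict[str, str]:
    """Apply one overloaded scan to the per-group flag table."""
    bumped: Dict[str, str] = {}
    # Groups that already carry a flag move one step toward D (or their cap).
    for group, flag in flags.items():
        cap = FLAG_CAP.get(group, "D")
        nxt = min(FLAGS.index(flag) + 1, FLAGS.index(cap))
        bumped[group] = FLAGS[nxt]
    # The next policy level, if any remains, admits its group with flag A.
    for group in GROUPS:
        if group not in bumped:
            bumped[group] = "A"
            break
    return bumped
```

Starting from an empty table, four consecutive overloaded scans reproduce the assignment described in step 2-5 (D, C, B, A from the level 1 group to the level 4 group), and further overloaded scans converge on D for every group except registered users, which stay at C. On a Linux host the inputs for `is_overloaded` could be taken from `os.getloadavg()[1]` and `os.cpu_count()`, although the source does not say how the Dispatcher reads them.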
In step 3, the method further comprises:
step 3-1: load cache
The Searcher receives a search request carrying specific search processing strategy flag bit A. According to the flag bit, the result is first loaded from the cache of the current Searcher; if the cache holds no result for the request, the search operation is carried out again. Serving results from the cache reduces the search computation complexity;
step 3-2: single point lossy service
The Searcher receives a search request carrying specific search processing strategy flag bit B and, according to the flag bit, performs single-point lossy service. Because the search cluster uses a distributed search mechanism, a normal search sends one request to each of 2 or more Searcher instances and then merges their results. When single-point lossy service is enabled, the Searcher does not request the other Searcher instances; it returns directly after the single Searcher has finished executing, which reduces the search computation complexity. Fewer results are returned than with normal search processing, but they are still sufficient to satisfy most search requests;
step 3-3: dimension reduction and simplification
The Searcher receives a search request carrying specific search processing strategy flag bit C and, according to the flag bit, performs dimension-reduction lossy service. Dimension-reduction lossy service means that the computation for the user's search request is reduced and simplified by skipping the more time-consuming parts of the service logic, for example omitting custom scoring and sorting and reducing the recall quantity. The accuracy of the returned results deviates from that of normal search processing, but a certain degree of relevance is still ensured, and the search computation complexity is reduced;
step 3-4: returning null results
The Searcher receives a search request carrying specific search processing strategy flag bit D and, according to the flag bit, directly rejects the request and returns an empty search result, so as to reduce the system load.
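The four flag-dependent behaviours of steps 3-1 to 3-4 can be sketched as a single handler. This is a hedged illustration only: `handle`, `Shard`, `Rescorer` and `reduced_recall` are hypothetical names, the Searcher instances are modelled as plain callables, and two points the source leaves open are settled by assumption here (a cache miss does not populate the cache, and flag C keeps the fan-out to other instances while skipping custom scoring and truncating the recall).

```python
from typing import Callable, Dict, List, Optional

Shard = Callable[[str], List[str]]            # recall from one Searcher instance
Rescorer = Callable[[List[str]], List[str]]   # custom business scoring and sorting


def handle(query: str, flag: Optional[str], cache: Dict[str, List[str]],
           local: Shard, peers: List[Shard], rescore: Rescorer,
           reduced_recall: int = 2) -> List[str]:
    """Process one search request according to its strategy flag (None = normal)."""
    if flag == "D":
        # Step 3-4: reject the request and return an empty result.
        return []
    if flag == "A" and query in cache:
        # Step 3-1: serve from the current Searcher's cache when possible.
        return cache[query]
    hits = list(local(query))
    if flag != "B":
        # Normal distributed search: query the other instances and merge.
        # Step 3-2 (flag B) skips this fan-out and uses the local hits only.
        for peer in peers:
            hits.extend(peer(query))
    if flag == "C":
        # Step 3-3: skip custom scoring and reduce the recall quantity.
        return hits[:reduced_recall]
    return rescore(hits)
```

A request with flag A that misses the cache falls through to the normal path, which matches the "search again" behaviour in step 3-1.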
In the step 5, the method further comprises:
step 5-1: after a periodic scan, if the Dispatcher confirms that the number of search requests is less than or equal to the system's preset maximum capacity value, or that the search server's 5-minute Load Average is less than or equal to the server's total number of CPU cores, the specific search processing strategy flag bits of the search requests at all processing levels are shifted down. The downshift rule is: D is lowered to C, C is lowered to B, and B is lowered to A;
step 5-2: after a periodic scan, if the Dispatcher confirms that the number of search requests is still less than or equal to the system's preset maximum capacity value, or that the search server's 5-minute Load Average is still less than or equal to the server's total number of CPU cores, the processing marks of all search requests marked A are removed, and those requests revert to normal search processing;
step 5-3: the rules of steps 5-1 and 5-2 are executed repeatedly until all processing marks are cleared, that is, until all search requests are processed normally. If, in the next scheduled scan period, the number of search requests is found to be greater than the system's preset maximum capacity value, or the search server's 5-minute Load Average is greater than the server's total number of CPU cores, the specific search processing strategy marks of the search requests at all processing levels are shifted up. The upshift rule is: C is raised to D, B is raised to C, and A is raised to B.
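The recovery rules of steps 5-1 to 5-3 can be sketched as three small table transformations. This is an illustrative reading, not the authors' code: the function names and the per-group dictionary are assumptions, and `recover` assumes that steps 5-1 and 5-2 alternate on consecutive calm scans, which is one plausible interpretation of "executed repeatedly". The cap at C for registered users from step 2-6 is not modelled here.

```python
from typing import Dict

DOWN = {"D": "C", "C": "B", "B": "A", "A": "A"}
UP = {"A": "B", "B": "C", "C": "D", "D": "D"}


def shift_down(flags: Dict[str, str]) -> Dict[str, str]:
    """Step 5-1: lower every flag one step; A stays A until step 5-2."""
    return {group: DOWN[flag] for group, flag in flags.items()}


def clear_a(flags: Dict[str, str]) -> Dict[str, str]:
    """Step 5-2: requests flagged A revert to normal processing."""
    return {group: flag for group, flag in flags.items() if flag != "A"}


def shift_up(flags: Dict[str, str]) -> Dict[str, str]:
    """Step 5-3: overload returned, so raise every flag one step."""
    return {group: UP[flag] for group, flag in flags.items()}


def recover(flags: Dict[str, str]) -> int:
    """Alternate steps 5-1 and 5-2 over calm scans; return the scans needed."""
    scans = 0
    while flags:
        flags = shift_down(flags) if scans % 2 == 0 else clear_a(flags)
        scans += 1
    return scans
```

Because each pair of calm scans lowers the highest remaining flag by one step and removes any group that has reached A, the loop always terminates with an empty table, i.e. with all requests processed normally.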
In step 6, the method further comprises:
step 6-1: the search system records the e-mail contact details of the affiliates of the corresponding search service. Because reducing search complexity to cope with overloaded search requests causes some loss of search accuracy and may affect the business, all processing conditions are collated, summarized and reported to the relevant affiliates so that they can evaluate the scope of the impact; a specific search request processing report is generated each time a surge is handled;
step 6-2: the search system records the server CPU load and the number of concurrent search requests at each strategy level at that time, draws a bar chart comparing them, and adds the chart to the processing report;
step 6-3: the search system also records how the processing marks of the search requests changed at each strategy level, for example when a mark changed from A to B, and adds this change history to the processing report;
step 6-4: the processing report is sent by e-mail to all affiliates of the corresponding service.
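The report assembly in steps 6-1 to 6-4 can be sketched as follows. This is a rough illustration under stated assumptions: the `Sample` record, the `build_report` function, the subject line and the text-only bar (one `#` per unit of load) are all hypothetical stand-ins for the bar chart and report layout, which the source does not specify. The message is only built, not sent.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import List


@dataclass
class Sample:
    level: int         # policy level active at this scan
    load_avg: float    # 5-minute load average of the search server
    concurrency: int   # concurrent search requests


def build_report(samples: List[Sample], changes: List[str],
                 recipients: List[str]) -> EmailMessage:
    """Collect load samples and flag-change history into one e-mail message."""
    lines = ["Search surge processing report", "",
             "level  load   requests  load bar"]
    for s in samples:
        bar = "#" * round(s.load_avg)   # crude text stand-in for the bar chart
        lines.append(f"{s.level:<6} {s.load_avg:<6.1f} {s.concurrency:<9} {bar}")
    lines += ["", "Flag changes:"] + [f"- {c}" for c in changes]

    msg = EmailMessage()
    msg["Subject"] = "Search surge processing report"
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(lines))
    return msg
```

Delivery could then go through the standard library's `smtplib.SMTP.send_message`; the mail server and sender address would be deployment-specific and are not described in the source.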
In other words, the invention identifies and distinguishes users through specific strategies suited to practical application scenarios. Every search request received by the search platform carries the user's Cookie, IP and crawler flag bit; this information is used to divide users into different levels, and different coping strategy levels for overloaded search requests are set according to those user levels. Different coping processing marks are added according to the coping strategy level; once these marks are attached to the corresponding search requests, the search platform can treat the requests differently and reduce their complexity to different degrees.
The search complexity can be reduced by serving from cache, by reducing the number of nodes queried concurrently, and by cutting the custom sorting computation of the search request. Because different user groups are treated differently, rather than requests being cut off across the board, the scope of the lossy service is kept to the minimum degree of impact. In addition, through the specific strategy settings, the degree of complexity reduction is adjusted dynamically with the concurrent load of search requests: the degree and scope of the reduction increase as the overload persists and decrease as the overload subsides, until the intervention is withdrawn completely, and the details of the intervention are sent to the relevant business affiliates as a processing report.
The invention mainly provides a method for reducing search complexity based on a specific strategy, and has the following advantages:
(1) The user identification information carried by the search client is used to grade users, and all classification and grading of users is completed in the Dispatcher within the search platform. The client needs no additional configuration: when high-load search requests occur, no client intervention is required, and all countermeasures are carried out automatically by the Dispatcher and Searcher instances in the search platform.
(2) The invention applies the concept of lossy service, combining the different weights that users carry for the business with lossy service modes of different scopes and degrees. Rather than a simple cut-off (circuit-breaking) approach, a stepped, progressively escalating lossy service mode is used to cope with high-load search requests, so that the scope of business affected by lossy service is kept as small as possible.
(3) The invention exploits the characteristics of the distributed search platform, reducing search complexity by reducing concurrent multi-node recall and returning single-node recall results instead. Under high load, limiting the recall range reduces the computation complexity while search accuracy is still maintained, which is friendlier to clients than directly rejecting excess requests.
(4) The invention exploits the fact that the computation for a search request comprises two parts, the basic Lucene query and the custom business scoring and sorting, and proposes a dimension-reduction service under high load: the Searcher in the search platform performs only the basic Lucene query and abandons the custom business scoring and sorting. This dimension-reduction service reduces search complexity and improves the ability to cope with high-load search requests.
(5) After high-load search requests have been handled, the invention sends the detailed process of all automatic complexity-reduction handling performed by the system, in the form of a processing report, to each preset business affiliate, so that the affiliates can understand the scope of the complexity reduction and evaluate its impact on the business. On this basis, the administrator can improve the configuration of the specific strategies, so that the whole search platform copes better with high-load search requests without additional expenditure on hardware servers.
The above embodiments do not limit the present invention in any way; all other modifications and applications made to the above embodiments in equivalent ways fall within the scope of protection of the present invention.

Claims (4)

1. A method for reducing search complexity based on a specific strategy is characterized by comprising the following steps:
step 1: identifying search requests, namely, all search requests sent to the search service cluster must carry specific user identifiers, the user identifiers comprising the user's source IP, the user's login Cookie information and a crawler engine flag bit, and all search requests are submitted to the search cluster repeater for processing;
step 2: distinguishing the requests and adding processing strategy marks, namely, the search cluster repeater distinguishes the user priorities of the search requests according to strategy rules configured by an administrator, judges whether the current number of search requests and the server load meet the standard for triggering surge coping, distinguishes the search requests of different users and adds specific search processing strategy marks to them, and then forwards the requests to the search service instances in the cluster for processing;
step 3: strategy processing of requests, namely, after a search service instance in the search cluster receives a search request carrying a specific search processing strategy mark, it performs the corresponding processing according to that mark;
in step 3, the method further comprises: loading a cache, namely, a search service instance receives a search request with a specific search processing strategy mark A, the search request with the specific search processing strategy mark A is loaded from the cache of the current search service instance firstly, and if the cache cannot load the result of the search request, the re-search operation is carried out, and the search calculation complexity is reduced through cache calling;
the single-point lossy service is characterized in that a search service instance receives a search request of a specific search processing strategy mark B, and the search request with the specific search processing strategy mark B carries out the single-point lossy service; the single-point lossy service is characterized in that a distributed search mechanism is adopted for searching services in a cluster, namely, more than 2 search service instances are required to be respectively requested for one-time search, the search service instances perform merging processing after executing a search request, and when the single-point lossy service is started, the search service instances do not respectively request other search service instances, and the single search service instance is directly returned after being executed;
the dimensionality reduction simplification is to reduce the dimensionality of the calculation complexity of the search request of a user;
returning an empty result, namely the search service instance receives a search request with a specific search processing strategy mark D, the search service instance directly rejects the search request with the specific search processing strategy mark D, and the empty search result is returned;
step 4: returning results, namely, the search service instance returns the processed search results to the search cluster repeater, and the search cluster repeater returns the results to the search client that initiated the request;
step 5: monitoring regularly and dynamically adjusting the strategy, namely, when the search cluster repeater scans and confirms that the overloaded search requests are easing or increasing and that the indexes related to load and access amount fluctuate within a threshold value, the adjustment is carried out according to specific processing logic until the overload of search requests has ended and all indexes have returned to normal;
step 6: recording the process and evaluating the influence, namely recording all processing flows and related search request information by the search system, and automatically notifying related business affiliates by a background in an email mode after the surge is finished so as to evaluate the influence range for reducing the search complexity.
2. A method for reducing search complexity based on a particular strategy according to claim 1, characterized by: in the step 2, the method further comprises:
step 2-1: setting policy rules, including setting policy level indexes and setting user distinguishing processing rules;
the setting of the strategy level index is as follows: the administrator configures the strategy in four levels, from level 1 to level 4, the different levels are invoked according to the current number of search requests and the server load, and the data threshold value of each level is manually configured by the administrator according to the strategy level indexes, the strategy level indexes comprising: the number of concurrent search requests per second for the current corresponding index, and the 5-minute average server load of the servers in the current search cluster, the latter referring to the average server load parameter value of the Linux operating system;
the setting of the user distinguishing processing rule refers to: the administrator distinguishes users to set different level rules, divides the users into a crawler identifier Spidername and a user identifier Username, and corresponds to different strategy levels;
step 2-2: 1-level strategy processing, namely after periodically scanning the load of a search server, a search cluster repeater confirms that the number of search requests exceeds a system preset maximum capacity value or the 5-minute average server load value of the load of the search server is greater than the total core number of a server CPU (central processing unit), namely, a level 1-level coping strategy is started, the level 1 coping strategy identifies the search requests with crawler identifications in all user identifications received by the search cluster repeater, the search requests with the crawler identifications are filtered out, then the search requests with the crawler identifications are added into specific search processing strategy marks and then are delivered to corresponding search cluster service instances for processing, and the added specific search processing strategy marks are A;
step 2-3: 2-level strategy processing, namely after periodic scanning, a search cluster repeater confirms that when the number of search requests exceeds a system preset maximum capacity value or an average server load value of 5 minutes of search server load is still greater than the total core number of a server CPU, a level 2-level coping strategy is started, the level 2 coping strategy identifies all search requests with crawler identifications in all user identifications received by the search cluster repeater, and the priorities of the crawler identifications are not distinguished; then adding a specific search processing strategy mark into the search request with the crawler mark and then handing the search request to a corresponding search cluster service instance for processing, wherein the added specific search processing strategy mark is A, and the original correspondingly added search mark bit in the step 2-2 is changed into B;
step 2-4: after regular scanning, the search cluster repeater confirms that when the number of search requests exceeds the maximum capacity value preset by the system or when the 5-minute average server load value of the search server load is still greater than the total core number of the server CPU, the level 3 coping strategy is started, the level 3 coping strategy judges whether the user belongs to a registered user of a corresponding service platform or not in the Cookie information of the user received by the search cluster repeater, if the user in the request is judged to be not a platform registered user, the user identification of the user is Other, the requests are added into a specific search processing strategy mark and then are handed to a corresponding search cluster service instance for processing, the added specific search processing strategy mark is A, and meanwhile, the step-by-step upgrading changes the originally correspondingly added search mark bit in the step 2-3 into B, simultaneously changing the original correspondingly added search mark bits in the step 2-2 into C;
step 2-5: after regular scanning, the search cluster repeater determines that when the number of search requests exceeds the system preset maximum capacity value or when the 5-minute average server load value of the search server load is still greater than the total core number of the server CPUs, the 4-level coping strategy is started, after the search cluster repeater receives all user requests, a registered user identifier, namely a user request of Username = Login, is added with a specific search processing strategy mark and then is delivered to a corresponding search cluster service instance for processing, the added specific search processing strategy mark is A, meanwhile, the original correspondingly added search mark bit in the step 2-4 is changed to B, and meanwhile, the original correspondingly added search mark bit in the step 2-3 is changed to C, simultaneously changing the original correspondingly added search mark bits in the step 2-2 into D;
step 2-6: step-by-step strategy shifting, namely, after periodic scanning, when the search cluster repeater confirms that the number of search requests exceeds the system preset maximum capacity value or that the 5-minute average server load value of the search server load is still greater than the total core number of the server CPUs, the search cluster repeater advances the specific search processing strategy marks of the corresponding search requests in steps 2-2, 2-3, 2-4 and 2-5 step by step from A to D, until the specific search processing strategy marks of all user requests except those of service system registered users, namely users with Username = Login, are all D, while the specific search processing strategy mark of search requests judged from the Cookie to belong to system registered users is raised at most to C.
3. A method for reducing search complexity based on a particular strategy according to claim 2, characterized in that:
in the step 5, the method further comprises: step 5-1: after periodic scanning, the search cluster repeater confirms that when the number of search requests is less than or equal to the system preset maximum capacity value or when the 5-minute average server load value of the search server load is less than or equal to the total core number of the server CPUs, all the specific search processing strategy marks of the search requests of the different corresponding processing levels are downshifted, the downshift rule being that D is lowered to C, C is lowered to B, and B is lowered to A;
step 5-2: after periodic scanning, the search cluster repeater confirms that when the number of search requests is still less than or equal to the system preset maximum capacity value or when the 5-minute average server load value of the search server load is still less than or equal to the total core number of the server CPUs, the specific search processing strategy marks of all the search requests whose mark is A are removed, and those requests are switched to normal search processing;
step 5-3: and repeating the execution of the rules according to the steps 5-1 and 5-2 until all processing marks are cleared, namely all search requests are normally processed, and if the number of the search requests is confirmed to be larger than the preset maximum capacity value of the system in the next timing scanning period or when the 5-minute average server load value of the search server load is larger than the total core number of the server CPUs, the specific search processing strategy marks of all the search requests of different corresponding processing levels are subjected to the upshifting, wherein the upshifting rule is from C to D, from B to C, and from A to B.
4. A method for reducing search complexity based on a particular strategy according to claim 3, characterized by: in step 6, the method further comprises:
step 6-1: the search system records the e-mail contact ways of the affiliates of the corresponding search service, and reports all processing conditions to relevant affiliates through sorting and summarizing so as to evaluate the influence range; generating a specific search request processing report every time of surge processing;
step 6-2: the search system records the concurrent number of the server CPU load and the search request corresponding to the strategy level at that time, draws a columnar comparison graph and adds the columnar comparison graph into a processing report;
step 6-3: the search system also records the change condition of the processing mark of the search request corresponding to the strategy grade at the moment, including when the history condition changes from A to B, and adds the history condition into the processing report;
step 6-4: the processing report is sent to all the affiliates of the corresponding service in a mail mode.
CN201810999884.4A 2018-08-30 2018-08-30 Method for reducing search complexity based on specific strategy Active CN109190004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810999884.4A CN109190004B (en) 2018-08-30 2018-08-30 Method for reducing search complexity based on specific strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810999884.4A CN109190004B (en) 2018-08-30 2018-08-30 Method for reducing search complexity based on specific strategy

Publications (2)

Publication Number Publication Date
CN109190004A CN109190004A (en) 2019-01-11
CN109190004B true CN109190004B (en) 2020-07-07

Family

ID=64917232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810999884.4A Active CN109190004B (en) 2018-08-30 2018-08-30 Method for reducing search complexity based on specific strategy

Country Status (1)

Country Link
CN (1) CN109190004B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113869976A (en) * 2021-09-26 2021-12-31 中国联合网络通信集团有限公司 Cargo list generation method and device, server and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1627688A (en) * 2003-12-10 2005-06-15 联想(北京)有限公司 Method for searching sharing files under wireless network grid
CN103346971A (en) * 2013-06-19 2013-10-09 华为技术有限公司 Data forwarding method, controller, forwarding device and system
CN103428101A (en) * 2013-08-01 2013-12-04 华为技术有限公司 Load sharing method and device
US9038079B2 (en) * 2009-12-30 2015-05-19 International Business Machines Corporation Reducing cross queue synchronization on systems with low memory latency across distributed processing nodes
CN106407011A (en) * 2016-09-20 2017-02-15 焦点科技股份有限公司 A routing table-based search system cluster service management method and system
CN106453564A (en) * 2016-10-18 2017-02-22 北京京东尚科信息技术有限公司 Elastic cloud distributed massive request processing method, device and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101379757B (en) * 2006-02-07 2011-12-07 思科技术公司 Methods and systems for providing telephony services and enforcing policies in a communication network
US20150117217A1 (en) * 2011-07-15 2015-04-30 Telefonaktiebolaget L M Ericsson (Publ) Policy Tokens in Communication Networks

Also Published As

Publication number Publication date
CN109190004A (en) 2019-01-11

Similar Documents

Publication Publication Date Title
CN106375472B (en) Access request processing method, device and server
CN109240830B (en) Application intelligent request management based on server health and client information
US8918792B2 (en) Workflow monitoring and control system, monitoring and control method, and monitoring and control program
US7734676B2 (en) Method for controlling the number of servers in a hierarchical resource environment
CN108173698B (en) Network service management method, device, server and storage medium
CN110417903B (en) Information processing method and system based on cloud computing
US10171620B2 (en) Non-transitory computer-readable recording medium having stored therein control program, control apparatus and control method
EP1564638A1 (en) A method of reassigning objects to processing units
US9405834B1 (en) System and method for identifying search results satisfying a search query
US20050060497A1 (en) Selectively accepting cache content
CN106230997A (en) A kind of resource regulating method and device
CA2830360C (en) Information monitoring apparatus and information monitoring method
CN109190004B (en) Method for reducing search complexity based on specific strategy
WO2022062981A1 (en) Resource scheduling method and system, electronic device, and computer-readable storage medium
US20050089063A1 (en) Computer system and control method thereof
CN117707763A (en) Hierarchical calculation scheduling method, system, equipment and storage medium
CN110752941A (en) QOS control method and device of cloud storage system, storage medium and server
US20050060496A1 (en) Selectively caching cache-miss content
CN111444183B (en) Distributed self-adaptive user request scheduling method in key value storage system
JP2018025944A (en) Resource control program, resource control method and resource controller
CN111382196B (en) Distributed accounting processing method and system
CN113535038A (en) Front-end menu tree generation method and device, computer equipment and storage medium
CN108243348A (en) A kind of stream process asks distribution server
CN112783673A (en) Method and device for determining call chain, computer equipment and storage medium
Fu et al. Task assignment strategy for overloaded systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant