CN110309390B - Index reduction method and device suitable for search and server - Google Patents

Index reduction method and device suitable for search and server

Info

Publication number
CN110309390B
CN110309390B CN201810214501.8A CN201810214501A CN110309390B CN 110309390 B CN110309390 B CN 110309390B CN 201810214501 A CN201810214501 A CN 201810214501A CN 110309390 B CN110309390 B CN 110309390B
Authority
CN
China
Prior art keywords
search
host
search system
hosts
returned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810214501.8A
Other languages
Chinese (zh)
Other versions
CN110309390A (en)
Inventor
王俊杰
胡健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN201810214501.8A
Publication of CN110309390A
Application granted
Publication of CN110309390B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides an index column reduction method, an index column reduction device and a server suitable for searching. The method comprises the following steps: receiving a search request of a user; sending the search request to a first search system and a second search system; receiving at least one first search result returned by the first search system and at least one second search result returned by the second search system; and determining the number of second search results to be returned by each second host in the second search system by comparing the at least one first search result with the at least one second search result. According to the embodiment of the invention, two search systems are provided, one of which serves as a reference search system, and the number of search results to be returned by each higher-performance host in the other search system is adjusted by comparing the search results returned by the two search systems, so that the same amount of index data can be distributed to fewer hosts, which reduces the number of hosts and improves resource utilization.

Description

Index reduction method and device suitable for search and server
Technical Field
The embodiment of the invention relates to the technical field of computers, and in particular to an index column reduction method, an index column reduction device and a server suitable for searching.
Background
In the prior art, the main working processes of a search engine include crawling, storing, page analysis, indexing and retrieval. To improve the coverage of search results, a search system may crawl a large number of documents from the internet and construct index data from them. Because the number of crawled documents is large, one host may not be able to store all of them, so the documents are distributed across a plurality of different hosts.
As host performance improves, the number of documents that one host can store keeps increasing. When index data is constructed, if the number of hosts is set too large, each host may actually store few documents and host resource utilization is low; if the number of hosts is set too small, each host may store many documents, the number of search results returned by each host may also increase, the performance of each host may be reduced, and the relevance of each host's search results to the user's needs may be low. The prior art therefore lacks an effective method that returns more search results without seriously affecting host performance, on the premise that the search effect is not reduced.
Disclosure of Invention
The embodiment of the invention provides an index column reduction method, an index column reduction device and a server suitable for searching, so as to improve resource utilization.
In a first aspect, an embodiment of the present invention provides an index column reduction method suitable for search, including:
receiving a search request of a user;
sending the search request to a first search system and a second search system, wherein the first search system comprises at least one first host, the second search system comprises at least one second host, the number of the first hosts in the first search system is greater than that of the second hosts in the second search system, and the performance of the first hosts is lower than that of the second hosts;
receiving at least one first search result returned by the first search system and at least one second search result returned by the second search system;
and determining the number of second search results required to be returned by each second host in the second search system by comparing the at least one first search result with the at least one second search result.
In a second aspect, an embodiment of the present invention provides an index reduction apparatus suitable for search, including:
the receiving module is used for receiving a search request of a user;
a sending module, configured to send the search request to a first search system and a second search system, where the first search system includes at least one first host, the second search system includes at least one second host, the number of the first hosts in the first search system is greater than the number of the second hosts in the second search system, and the performance of the first hosts is lower than that of the second hosts;
the receiving module is further used for receiving at least one first search result returned by the first search system and at least one second search result returned by the second search system;
a comparison module for comparing the at least one first search result and the at least one second search result;
and a determining module, configured to determine, according to the comparison of the at least one first search result with the at least one second search result performed by the comparison module, the number of second search results required to be returned by each second host in the second search system.
In a third aspect, an embodiment of the present invention provides a server, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method of the first aspect.
According to the index column reduction method, the index column reduction device and the server suitable for searching provided by the embodiment of the invention, two search systems are provided, one of which serves as a reference search system, and the number of search results to be returned by each higher-performance host in the other search system is adjusted by comparing the search results returned by the two search systems, so that the same amount of index data can be distributed to fewer hosts, which reduces the number of hosts and improves resource utilization.
Drawings
Fig. 1 is a schematic diagram of a communication system provided by an embodiment of the present invention;
FIG. 2 is a flowchart of an index reduction method suitable for searching according to an embodiment of the present invention;
FIG. 3 is a flowchart of the operation of a search engine provided by an embodiment of the present invention;
FIG. 4 is a diagram illustrating index data distribution provided by an embodiment of the present invention;
FIG. 5 is a diagram illustrating index data distribution provided by an embodiment of the present invention;
FIG. 6 is a diagram illustrating index data distribution provided by an embodiment of the present invention;
FIG. 7 is a flowchart of an index reduction method for search according to another embodiment of the present invention;
FIG. 8 is a diagram illustrating index data distribution provided by an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an index reduction apparatus suitable for search according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a server according to an embodiment of the present invention.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The index column reduction method suitable for searching provided by the invention can be applied to the communication system shown in FIG. 1. As shown in FIG. 1, the communication system includes an access network device 11, a terminal device 12 and a server 13. It should be noted that the communication system shown in FIG. 1 may be applicable to different network standards, for example, Global System for Mobile Communications (GSM), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), and future 5G networks. Optionally, the communication system may be a system in an Ultra-Reliable and Low-Latency Communications (URLLC) transmission scenario in a 5G communication system.
Therefore, optionally, the access network device 11 may be a Base Transceiver Station (BTS) and/or a Base Station Controller (BSC) in GSM or CDMA, a NodeB (NB) and/or a Radio Network Controller (RNC) in WCDMA, an evolved NodeB (eNB or eNodeB) in LTE, a relay station or an access point, or a base station (gNB) in a future 5G network; the present invention is not limited thereto.
The terminal device 12 may be a wireless terminal or a wired terminal. A wireless terminal may refer to a device that provides voice and/or other service data connectivity to a user, a handheld device having wireless connection capability, or another processing device connected to a wireless modem. A wireless terminal may communicate with one or more core network devices via a Radio Access Network (RAN). It may be a mobile terminal such as a mobile telephone (or "cellular" telephone) or a computer having a mobile terminal, for example a portable, pocket-sized, hand-held, computer-built-in or vehicle-mounted mobile device that exchanges voice and/or data with the RAN. For another example, the wireless terminal may also be a Personal Communication Service (PCS) phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA), or a similar device. A wireless terminal may also be referred to as a system, a subscriber unit, a subscriber station, a mobile station, a remote station, a remote terminal, an access terminal, a user terminal, a user agent, a user device or user equipment, which is not limited herein. Optionally, the terminal device 12 may also be a smart watch, a tablet computer, or the like.
The invention provides an index column reduction method suitable for searching, which aims to solve the above technical problems in the prior art.
The following describes the technical solutions of the present invention and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
FIG. 2 is a flowchart of an index column reduction method suitable for search according to an embodiment of the present invention. To address the above technical problems in the prior art, the embodiment of the invention provides an index column reduction method suitable for search, with the following specific steps:
Step 201, receiving a search request of a user.
The main working processes of a search engine include crawling, storing, page analysis, indexing and retrieval. As shown in FIG. 3, a search system crawls web page data, which may be documents, from the internet, stores the crawled web page data in a distributed file system, and constructs index data based on it. A search system normally crawls a large amount of web page data from the internet, so the index data may not fit entirely on one host and the constructed index data needs to be distributed, for example, to different hosts 1, 2, …, n. After receiving the search request of the user, the online service system sends the search request to host 1, host 2, …, and host n, and host 1, host 2, …, and host n return their search results to the online service system.
As shown in FIG. 1, the server 13 may specifically be an online service system in a search system, and the terminal device 12 may log in to the server 13 through the access network device 11. For example, a browser is installed in the terminal device 12, and a user may browse a web page of the server 13 through the browser. After the user enters a search keyword in the browser and clicks the search button, the terminal device 12 sends a search request, which includes the search keyword, to the server 13 through the access network device 11.
Step 202, sending the search request to a first search system and a second search system, where the first search system includes at least one first host, the second search system includes at least one second host, the number of the first hosts in the first search system is greater than the number of the second hosts in the second search system, and the performance of the first hosts is lower than that of the second hosts.
In this embodiment, if the search system crawls a large amount of web page data from the internet, a large amount of index data is constructed from it, and this index data cannot be stored entirely on one host. The index data is therefore stored on a plurality of different hosts. For example, as shown in FIG. 4, assume that 1200 pieces of index data are constructed and distributed to 4 hosts, namely host 1, host 2, host 3 and host 4, each of which stores 300 pieces of index data.
As host performance increases, fewer hosts are required to store the same amount of index data, for example 1200 pieces, so the 1200 pieces of index data can be distributed to a smaller number of better-performing hosts, for example host 5 and host 6 shown in FIG. 4; optionally, host 5 and host 6 each store 600 pieces of index data. In this embodiment, host 1, host 2, host 3 and host 4 are denoted as a first search system 41, and host 5 and host 6 are denoted as a second search system 42; each of host 1, host 2, host 3 and host 4 is denoted as a first host, and each of host 5 and host 6 is denoted as a second host. Hosts 1 to 4 have the same performance as one another, hosts 5 and 6 have the same performance as each other, and the performance of hosts 1 to 4 is lower than that of hosts 5 and 6.
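The distribution in FIG. 4 can be illustrated with a minimal sketch. The function name `distribute` and the contiguous, near-equal split are assumptions made for illustration; the description only states how many pieces of index data each host stores, not how they are assigned.

```python
def distribute(entries, num_hosts):
    """Split entries into num_hosts contiguous shards of near-equal size.

    Hypothetical helper: the source does not specify the assignment scheme.
    """
    if num_hosts <= 0:
        raise ValueError("num_hosts must be positive")
    base, extra = divmod(len(entries), num_hosts)
    shards, start = [], 0
    for i in range(num_hosts):
        # The first `extra` shards take one additional entry each.
        size = base + (1 if i < extra else 0)
        shards.append(entries[start:start + size])
        start += size
    return shards


index_data = list(range(1200))              # 1200 pieces of index data
first_system = distribute(index_data, 4)    # hosts 1 to 4
second_system = distribute(index_data, 2)   # hosts 5 and 6
print([len(s) for s in first_system])       # [300, 300, 300, 300]
print([len(s) for s in second_system])      # [600, 600]
```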
When the online service system receives a search request of a user, the online service system may transmit the search request to the first search system 41 and the second search system 42, respectively, that is, the online service system may transmit the search request to the hosts 1, 2, 3, 4, 5, and 6, respectively.
Step 203, receiving at least one first search result returned by the first search system and at least one second search result returned by the second search system.
After the first search system 41 receives the search request, the host 1, the host 2, the host 3, and the host 4 respectively perform search, and return search results, for example, the host 1, the host 2, the host 3, and the host 4 respectively return 10 search results, the first search system 41 searches for 40 search results in total, and the first search system 41 returns the 40 search results to the online service system.
After the second search system 42 receives the search request, the host 5 and the host 6 respectively perform a search and return search results, for example, the host 5 and the host 6 respectively return 15 search results, the second search system 42 searches 30 search results altogether, and the second search system 42 returns the 30 search results to the online service system.
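The request fan-out and result collection of steps 202 and 203 might look like the sketch below. The substring match, the toy documents, and the names `host_search` and `system_search` are all assumptions for illustration; a real host would query its index rather than scan documents.

```python
def host_search(shard, keyword, limit):
    """Return up to `limit` documents on one host that contain the keyword."""
    hits = [doc for doc in shard if keyword in doc]
    return hits[:limit]


def system_search(shards, keyword, per_host_limit):
    """Send the request to every host of one search system and merge the results."""
    results = []
    for shard in shards:
        results.extend(host_search(shard, keyword, per_host_limit))
    return results


# Toy corpus (assumed): every document matches the keyword.
docs = [f"phone review {i}" for i in range(1200)]
first_shards = [docs[i * 300:(i + 1) * 300] for i in range(4)]    # hosts 1 to 4
second_shards = [docs[i * 600:(i + 1) * 600] for i in range(2)]   # hosts 5 and 6

first_results = system_search(first_shards, "phone", per_host_limit=10)
second_results = system_search(second_shards, "phone", per_host_limit=15)
print(len(first_results), len(second_results))  # 40 30
```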
Step 204, determining the number of second search results to be returned by each second host in the second search system by comparing the at least one first search result with the at least one second search result.
Since the performance of the hosts 5 and 6 is higher than that of the hosts 1, 2, 3 and 4, the hosts 5 and 6 can store more index data, and although the number of the search results returned by each of the hosts 5 and 6 is greater than that of the search results returned by each of the hosts 1, 2, 3 and 4, the search results searched by the second search system 42 together are less than the search results searched by the first search system 41 together. In this embodiment, the first search system 41 may serve as a reference search system, the number of hosts in the first search system 41 may be fixed, the number of hosts in the second search system 42 may be variable, and the number of search results returned by each host in the second search system 42 may also be adjustable. The present embodiment determines the number of search results to be returned by each host in the second search system 42 by comparing the search results of the first search system 41 and the search results of the second search system 42.
Optionally, the determining, by comparing the at least one first search result with the at least one second search result, the number of second search results that each second host in the second search system needs to return includes: and determining the number of second search results required to be returned by each second host in the second search system by comparing the number of the first search results with the number of the second search results.
For example, the first search system 41 searches for 40 pieces of search results in total, and the first search system 41 returns the 40 pieces of search results to the online service system; the second search system 42 searches for a total of 30 search results, and the second search system 42 returns the 30 search results to the online service system. With the first search system 41 as a reference search system, the number of search results that the second search system 42 needs to return is not less than the number of search results that the first search system 41 returns, that is, the number of search results that the second search system 42 should return is not less than 40, and since the second search system 42 includes two hosts, it can be determined that the number of search results that each host in the second search system 42 needs to return is 20.
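The count-based determination above can be sketched as follows. The function name `required_per_host` is hypothetical, and rounding up when the reference total is not evenly divisible is an assumption; the description only covers the evenly divisible case of 40 results over two hosts.

```python
import math


def required_per_host(first_total, second_total, second_host_count, current_per_host):
    """Per-host result count the second system needs to match the reference total."""
    if second_host_count <= 0:
        raise ValueError("second_host_count must be positive")
    if second_total >= first_total:
        # The second system already returns at least as many results.
        return current_per_host
    # Assumption: round up so the system-wide total is not below the reference.
    return math.ceil(first_total / second_host_count)


print(required_per_host(40, 30, 2, 15))  # 20
```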
In other embodiments, the determining the number of second search results to be returned by each second host in the second search system by comparing the at least one first search result with the at least one second search result includes: determining a first degree of match of the at least one first search result with the search request and a second degree of match of the at least one second search result with the search request; and determining the number of second search results required to be returned by each second host in the second search system by comparing the first matching degree with the second matching degree and comparing the number of the first search results with the number of the second search results.
For example, the first search system 41 searches for 40 pieces of search results in total, and the first search system 41 returns the 40 pieces of search results to the online service system; the second search system 42 searches for a total of 30 search results, and the second search system 42 returns the 30 search results to the online service system. The online service system may score the 40 search results returned by the first search system 41 through machine learning, or may score the 40 search results returned by the first search system 41 through manual evaluation, for example, the online service system may score the top 10 search results of the 40 search results returned by the first search system 41 through machine learning. Similarly, the online service system may score the 30 search results returned by the second search system 42 through machine learning, and may also score the 30 search results returned by the second search system 42 through manual evaluation, for example, the online service system may score the top 10 search results of the 30 search results returned by the second search system 42 through machine learning. Whether machine learning or manual evaluation is adopted, the matching degree between the search result and the search request is mainly detected, and if the score of the search result is higher, the higher the matching degree between the search result and the search request is, the more the search result meets the requirements of the user.
The score obtained by the online service system after scoring the top 10 search results in the 40 search results returned by the first search system 41 through machine learning can be used for representing the matching degree between the 40 search results returned by the first search system 41 and the search request; similarly, the score obtained by the online service system scoring the top 10 search results in the 30 search results returned by the second search system 42 through machine learning can be used to indicate the matching degree between the 30 search results returned by the second search system 42 and the search request. On the premise that the matching degree between the 40 search results returned by the first search system 41 and the search request is less than or equal to the matching degree between the 30 search results returned by the second search system 42 and the search request, the number of search results that the second search system 42 should return may be adjusted to be not less than 40, and since the second search system 42 includes two hosts, the number of search results that each host in the second search system 42 needs to return may be adjusted to be 20.
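A minimal sketch of the matching-degree variant follows. It assumes that each result already carries a numeric score (from machine learning or manual evaluation), that the lists are ordered by rank, and that the matching degree is the mean score of the top 10 results; the description does not say how the top-10 scores are aggregated, so the mean is an illustrative choice. The names `match_degree` and `adjust_per_host` are hypothetical.

```python
import math


def match_degree(scores, top_n=10):
    """Mean score of the top_n results (aggregation by mean is an assumption)."""
    top = scores[:top_n]
    return sum(top) / len(top) if top else 0.0


def adjust_per_host(first_scores, second_scores, second_host_count, current_per_host):
    """Raise the per-host count only when the second system matches at least as well."""
    if match_degree(first_scores) > match_degree(second_scores):
        # Precondition not met; the source does not describe this case,
        # so this sketch leaves the count unchanged.
        return current_per_host
    if len(second_scores) >= len(first_scores):
        return current_per_host
    return math.ceil(len(first_scores) / second_host_count)


first_scores = [4] * 40    # 40 results from the first search system (assumed scores)
second_scores = [5] * 30   # 30 results from the second search system (assumed scores)
print(adjust_per_host(first_scores, second_scores, 2, 15))  # 20
```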
In general, when a certain amount of index data is distributed to a plurality of different hosts, the number of hosts occupied by that index data is recorded as the number of index columns. As shown in FIG. 4, the number of index columns is 4 in the first search system 41 and 2 in the second search system 42. Compared with the first search system 41, the number of index columns of the second search system 42 is reduced.
According to the embodiment of the invention, two search systems are arranged, one search system is used as a reference search system, and the number of the search results which are required to be returned by each host with higher performance in the other search system is adjusted by comparing the search results respectively returned by the two search systems, so that the same number of index data can be distributed to fewer hosts, the number of hosts is saved, and the resource utilization rate is improved.
On the basis of the above embodiment, the method further includes: adjusting the number of second search results required to be returned by each second host in the second search system so that the resource utilization rate of each second host is the highest while the second matching degree remains greater than or equal to the first matching degree.
As shown in fig. 4, since the performance of the hosts 5 and 6 is higher than that of the hosts 1, 2, 3, and 4, the hosts 5 and 6 can store more index data, and accordingly, the hosts 5 and 6 can return more search results, as described in the above embodiment, the number of the search results returned by the hosts 5 and 6 respectively can be adjusted to 20, so that the number of the search results returned by the hosts 5 and 6 respectively is not lower than that returned by the hosts 1, 2, 3, and 4 respectively.
In this embodiment, the number of search results returned by host 5 and host 6 may be further adjusted; for example, it may be increased further. However, when the number of search results returned by host 5 and host 6 increases, the CPU occupancy rates of host 5 and host 6 may rise, that is, the performance of host 5 and host 6 may decrease. Therefore, in this embodiment, the number of search results to be returned by each host of the second search system 42 and the performance of that host may be balanced, provided that the matching degree between the search results returned by the second search system 42 and the search request is greater than or equal to the matching degree between the search results returned by the first search system 41 and the search request. Taking host 5 in the second search system 42 as an example: at the current time t1, the number of search results to be returned by host 5 is adjusted to 20, and the performance index of host 5 during searching is detected; at the next time t2, the number is adjusted to 21 and the performance index of host 5 during searching is detected; at the next time t3, the number is adjusted to 22 and the performance index of host 5 during searching is detected; and so on. It can be understood that, as the number of search results returned by host 5 keeps increasing, the performance index of host 5 during searching keeps decreasing. In this embodiment, the number of search results returned by host 5 can be increased as far as possible on the premise that the performance index of host 5 is not lower than a preset index, so that the resource utilization rate of host 5 is the highest.
For example, weighted value 1 may be obtained by performing a weighted calculation on the number of search results to be returned by host 5 at time t1 and the performance index of host 5 during searching; weighted value 2 may be obtained in the same way from the number of search results and the performance index at time t2; weighted value 3 may be obtained from the number of search results and the performance index at time t3; and so on. The maximum is then selected from weighted values 1, 2, 3, and so on; if, for example, weighted value 3 is the maximum, the number of search results to be returned by host 5 is further adjusted to 22 on the basis of the above embodiment.
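The selection among the trial settings at t1, t2, t3 can be sketched as below. The linear weighting, the weights, the measured performance values and the name `pick_result_count` are all assumptions for illustration; the description says only that a weighted calculation is performed and that the performance index should not fall below a preset index.

```python
def pick_result_count(measurements, w_count=1.0, w_perf=1.0, min_perf=None):
    """Pick the per-host result count with the largest weighted value.

    measurements: list of (result_count, performance_index) pairs, one per trial.
    The linear weighting below is an assumed form of the weighted calculation.
    """
    best_count, best_value = None, float("-inf")
    for count, perf in measurements:
        if min_perf is not None and perf < min_perf:
            continue  # performance index below the preset index: skip this trial
        value = w_count * count + w_perf * perf
        if value > best_value:
            best_count, best_value = count, value
    return best_count


# Hypothetical trials for host 5 at times t1, t2, t3.
trials = [(20, 90.0), (21, 89.5), (22, 89.2)]
print(pick_result_count(trials))                 # 22
print(pick_result_count(trials, min_perf=89.4))  # 21
```

How the two weights are set decides the trade-off: a larger `w_perf` favors settings that keep the host lightly loaded, while a larger `w_count` favors returning more results.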
In this embodiment, the number of the second search results that need to be returned by each second host in the second search system is adjusted, so that the resource utilization rate of each second host is the highest, and the second matching degree is greater than or equal to the first matching degree, so that the resource utilization rate of each second host can be further improved under the condition of saving the number of the second hosts.
In some other embodiments, the constructed index data may be further classified and then distributed, for example, as shown in fig. 5, the originally constructed index data is 2400 pieces of index data, and according to the quality of the 2400 pieces of index data, the 2400 pieces of index data are classified, for example, the 2400 pieces of index data are divided into 1200 pieces of index data with low quality and 1200 pieces of index data with high quality, where the 1200 pieces of index data with low quality may be denoted as cluster0, the 1200 pieces of index data with high quality may be denoted as cluster1, further, the cluster0 is distributed to four second hosts with lower performance, such as the host 12, the host 22, the host 32, and the host 42, and the cluster1 is distributed to four second hosts with lower performance, such as the host 11, the host 21, the host 31, and the host 41, and the specific distribution principle and process are consistent with the above embodiments, optionally, the host 12, the host 22, the host 32, and the host 32, The host 42, the host 11, the host 21, the host 31, and the host 41 constitute the first search system 41 as described in the above embodiment.
For example, as shown in fig. 6, the originally constructed index data again consists of 2400 pieces of index data, which are classified according to their quality, for example into 1200 pieces of low-quality index data, denoted cluster0, and 1200 pieces of high-quality index data, denoted cluster1. Further, cluster0 is distributed to two second hosts with higher performance, such as host 62 and host 52, and cluster1 is distributed to two second hosts with higher performance, such as host 61 and host 51. The specific distribution principle and process are consistent with the above embodiments. Optionally, host 62, host 52, host 61, and host 51 form the second search system 42 described in the above embodiment. The method and principle of adjusting the number of search results to be returned by each host in the second search system 42 shown in fig. 6, using the first search system 41 shown in fig. 5 as the reference search system, are the same as in the above embodiments and are not repeated here.
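The classify-then-distribute layouts of fig. 5 and fig. 6 can be sketched as below. This is a minimal illustration under stated assumptions: the quality threshold, the round-robin assignment, and the function names `split_by_quality` and `distribute` are hypothetical, and the description does not say how quality is measured or how entries are assigned within a cluster.

```python
def split_by_quality(items, threshold):
    """Split (doc_id, quality) pairs into low-quality cluster0 and high-quality cluster1."""
    cluster0 = [doc for doc, quality in items if quality < threshold]
    cluster1 = [doc for doc, quality in items if quality >= threshold]
    return cluster0, cluster1


def distribute(docs, hosts):
    """Assign documents to hosts round-robin (an assumed policy) so shares are equal."""
    shards = {host: [] for host in hosts}
    for i, doc in enumerate(docs):
        shards[hosts[i % len(hosts)]].append(doc)
    return shards


# 2400 made-up index entries: the first half has quality 0.2, the second half 0.8.
items = [(i, 0.2 if i < 1200 else 0.8) for i in range(2400)]
cluster0, cluster1 = split_by_quality(items, threshold=0.5)

# Fig. 5 layout: four lower-performance hosts per cluster.
fig5_cluster0 = distribute(cluster0, ["host12", "host22", "host32", "host42"])
# Fig. 6 layout: two higher-performance hosts per cluster.
fig6_cluster0 = distribute(cluster0, ["host62", "host52"])
```

In this sketch each host in the fig. 5 layout holds 300 entries of its cluster, while each host in the fig. 6 layout holds 600.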
FIG. 7 is a flowchart of an index reduction method suitable for searching according to another embodiment of the present invention. On the basis of the foregoing embodiment, the index reduction method suitable for search provided in this embodiment specifically includes the following steps:
Step 701, sending the search request to a third search system, where the third search system includes at least one second host, the number of the second hosts in the third search system is not equal to the number of the second hosts in the second search system, and the number of the first hosts in the first search system is greater than the number of the second hosts in the third search system.
As shown in fig. 4, the second search system 42 includes two hosts, e.g., host 5 and host 6, i.e., when the performance of hosts 5, 6 is higher than that of hosts 1, 2, 3, 4, the same amount of index data may be distributed over the two higher performing hosts. In this embodiment, the same amount of index data is distributed over three higher performing hosts. As shown in fig. 8, the third search system 43 includes a host 5, a host 6, and a host 7, wherein the performance of the host 5, the host 6, and the host 7 is the same, and the performance of the host 5, the host 6, and the host 7 is higher than the performance of the host 1, the host 2, the host 3, and the host 4. For example, the total number of index data is 1200, and in fig. 4, the host 5 and the host 6 store 600 pieces of index data, respectively; in fig. 8, the host 5, the host 6, and the host 7 store 400 pieces of index data, respectively.
When the online service system receives a search request sent by a user, the online service system may also send the search request to the third search system 43, i.e., to host 5, host 6, and host 7, respectively.
Step 702, receiving at least one third search result returned by the third search system.
When host 5, host 6, and host 7 receive the search request, respectively, host 5, host 6, and host 7 perform a search, respectively, and return the search results to the online service system. For example, host 5, host 6, and host 7 each return 10 search results, and third search system 43 returns 30 search results to the online service system.
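The fan-out to host 5, host 6, and host 7 and the merging of their results can be sketched as follows. This is a minimal illustration, not the patented implementation: `make_host` and `fan_out` are hypothetical helpers, the document identifiers are made up, and a substring test stands in for real index lookup.

```python
def make_host(shard):
    """Build a toy host that searches only its own shard of the index."""
    def search(query, limit):
        hits = [doc for doc in shard if query in doc]
        return hits[:limit]  # each host returns at most `limit` results
    return search


def fan_out(query, hosts, per_host_limit):
    """Send the query to every host and concatenate the per-host results."""
    merged = []
    for search in hosts:
        merged.extend(search(query, per_host_limit))
    return merged


# 1200 made-up index entries split evenly over three hosts (400 each), as in fig. 8.
docs = [f"phone-{i}" for i in range(1200)]
third_system = [make_host(docs[i * 400:(i + 1) * 400]) for i in range(3)]

results_10 = fan_out("phone", third_system, per_host_limit=10)
results_15 = fan_out("phone", third_system, per_host_limit=15)
```

With a per-host limit of 10 the three hosts together return 30 results; raising the limit to 15 yields 45.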
In this embodiment, the first search system 41 may serve as a reference search system, the number of search results returned by each host in the first search system 41 may be fixed, and the number of search results returned by each host in the third search system 43 may be adjustable. In this embodiment, the number of search results that need to be returned by each host in the third search system 43 is determined by comparing the search result of the first search system 41 with the search result of the third search system 43, and specifically, the method and the principle for determining the number of search results that need to be returned by each host in the third search system 43 are consistent with the method and the principle for determining the number of search results that need to be returned by each host in the second search system 42 described in the foregoing embodiment, and are not described here again. For example, the number of search results to be returned by each host in the third search system 43 is 15, that is, the third search system 43 needs to return 45 search results to the online service system.
Step 703, determining the number of second hosts that the second search system needs to include according to the third matching degree between the at least one third search result and the search request, the performance index of each second host in the third search system, the number of second hosts in the third search system, the second matching degree between the at least one second search result and the search request, the performance index of each second host in the second search system, and the number of second hosts in the second search system.
The online service system may also score 45 search results returned by the third search system 43 through machine learning, or may score 45 search results returned by the third search system 43 through manual evaluation, for example, the online service system may score the top 10 search results of the 45 search results returned by the third search system 43 through machine learning. The score obtained by the online service system scoring the top 10 search results in the 45 search results returned by the third search system 43 through machine learning can be used to represent the matching degree between the 45 search results returned by the third search system 43 and the search request.
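One way to turn the scores of the top 10 results into a single matching degree is sketched below. The use of the mean is an assumption, as are the function name `matching_degree` and the `score_fn` callback, which stands in for the machine-learning or manual scoring mentioned above.

```python
from statistics import mean


def matching_degree(results, score_fn, top_k=10):
    """Average the scores of the first `top_k` results (averaging is an assumption)."""
    top = results[:top_k]
    if not top:
        return 0.0
    return mean(score_fn(r) for r in top)


# 45 made-up results as (doc_id, relevance) pairs; a real system would obtain
# the relevance from a trained model or from human raters.
results = [(f"doc{i}", 1.0 if i < 5 else 0.5) for i in range(45)]
degree = matching_degree(results, score_fn=lambda r: r[1])
```

Only the first 10 of the 45 results contribute to the value, matching the example in which the top 10 results are scored.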
As shown in fig. 4, the second search system 42 includes two hosts, and as shown in fig. 8, the third search system 43 includes three hosts. For the same amount of index data, for example 1200 pieces of index data, each host in the second search system 42 needs to store 600 pieces of index data and each host in the third search system 43 needs to store 400 pieces of index data. The number of search results to be returned by each host in the second search system 42 is also greater than the number to be returned by each host in the third search system 43, so during a search the performance of each host in the second search system 42 will be lower than that of each host in the third search system 43. To determine whether the 1200 pieces of index data should be distributed over 2 hosts or 3 hosts, the second search system 42 and the third search system 43 need to be weighed against each other. Optionally, the online service system performs a weighted summation of the matching degree between the 45 search results returned by the third search system 43 and the search request, the number of hosts in the third search system 43, and the performance index of each host in the third search system 43 during the search, to obtain a weighted value A. In addition, the online service system performs a weighted summation of the matching degree between the search results returned by the second search system 42 and the search request, the number of hosts in the second search system 42, and the performance index of each host in the second search system 42 during the search, to obtain a weighted value B. Weighted value A is then compared with weighted value B. If weighted value A is greater than weighted value B, indicating that the resource utilization rate of the second search system 42 is greater than that of the third search system 43, the 1200 pieces of index data are distributed to host 5 and host 6 in the second search system 42 in the manner shown in fig. 4. If weighted value A is less than weighted value B, indicating that the resource utilization rate of the second search system 42 is less than that of the third search system 43, the 1200 pieces of index data are distributed to host 5, host 6, and host 7 in the third search system 43 in the manner shown in fig. 8.
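The comparison of weighted value A and weighted value B can be sketched as follows. This is a minimal illustration under stated assumptions: the weights in `WEIGHTS`, the `SystemObservation` type, the use of a single performance index per system, and the sample numbers are all hypothetical. The decision rule itself follows the text as written (A greater than B selects the fig. 4 layout); the text does not say what happens when the two values are equal.

```python
from dataclasses import dataclass


@dataclass
class SystemObservation:
    matching_degree: float  # match between returned results and the search request
    host_count: int         # number of hosts in the search system
    perf_index: float       # performance index during the search (one value per system here)


# Hypothetical weights; the description does not give concrete values.
WEIGHTS = (0.5, 0.2, 0.3)


def weighted_sum(obs):
    """Weighted summation of matching degree, host count and performance index."""
    w_match, w_hosts, w_perf = WEIGHTS
    return w_match * obs.matching_degree + w_hosts * obs.host_count + w_perf * obs.perf_index


def choose_layout(second, third):
    """Apply the rule from the text: A is computed for the third system, B for the second."""
    a = weighted_sum(third)
    b = weighted_sum(second)
    if a > b:
        return "fig4_two_hosts"
    if a < b:
        return "fig8_three_hosts"
    return "undecided"  # the tie case is not covered by the description


second = SystemObservation(matching_degree=0.8, host_count=2, perf_index=1.0)
third = SystemObservation(matching_degree=0.9, host_count=3, perf_index=1.0)
layout = choose_layout(second, third)
```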
Optionally, the first search system, the second search system, and the third search system store the same index data. For example, the first search system 41, the second search system 42, and the third search system 43 each store 1200 pieces of index data.
In addition, this embodiment does not limit the specific structures of the first search system, the second search system, and the third search system, nor the number of hosts that each of them includes. The description here only illustrates schematically how to determine over how many hosts the same amount of index data should be distributed, and how many search results each host should return, so that the resource utilization rate is highest.
By comparing the resource utilization rates of the second search system and the third search system, the embodiment of the invention determines whether the index data is stored in the second search system or in the third search system, that is, over how many hosts the index data should be distributed, so that the resource utilization rate is highest.
Fig. 9 is a schematic structural diagram of an index reduction apparatus suitable for search according to an embodiment of the present invention. The index reduction device suitable for search provided in the embodiment of the present invention may execute the processing flow provided in the embodiment of the index reduction method suitable for search, and as shown in fig. 9, the index reduction device 90 suitable for search includes: a receiving module 91, a sending module 92, a comparing module 93 and a determining module 94; the receiving module 91 is configured to receive a search request of a user; the sending module 92 is configured to send the search request to a first search system and a second search system, where the first search system includes at least one first host, the second search system includes at least one second host, the number of the first hosts in the first search system is greater than the number of the second hosts in the second search system, and the performance of the first hosts is lower than that of the second hosts; the receiving module 91 is further configured to receive at least one first search result returned by the first search system and at least one second search result returned by the second search system; the comparison module 93 is configured to compare the at least one first search result with the at least one second search result; the determining module 94 is configured to compare the at least one first search result with the at least one second search result through the comparing module, and determine the number of second search results that need to be returned by each second host in the second search system.
Optionally, the comparing module 93 is specifically configured to compare the number of the first search results with the number of the second search results; the determining module 94 is specifically configured to compare the number of the first search results with the number of the second search results through the comparing module, and determine the number of the second search results that need to be returned by each second host in the second search system.
Optionally, the determining module 94 is further configured to: determining a first degree of match of the at least one first search result with the search request and a second degree of match of the at least one second search result with the search request; the comparing module 93 is further configured to: comparing the first and second degrees of match, and comparing the number of first search results to the number of second search results; the determining module 94 is specifically configured to compare the first matching degree and the second matching degree, compare the number of the first search results and the number of the second search results, and determine the number of the second search results that need to be returned by each second host in the second search system.
Optionally, the index reduction device 90 adapted for searching further includes: an adjustment module 95; the adjusting module 95 is configured to adjust the number of second search results that need to be returned by each second host in the second search system, so that the resource utilization rate of each second host is the highest, and the second matching degree is greater than or equal to the first matching degree.
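The behaviour of the adjusting module can be sketched as a constrained search over candidate per-host result counts. This is only an illustration under stated assumptions: `tune_result_count`, `match_fn`, and `utilization_fn` are hypothetical names, and the toy functions below stand in for measurements that a real system would take from live searches.

```python
def tune_result_count(candidates, first_match, match_fn, utilization_fn):
    """Pick the per-host result count with the highest resource utilization
    among those whose second matching degree is at least the first matching degree."""
    feasible = [n for n in candidates if match_fn(n) >= first_match]
    if not feasible:
        return None  # no candidate keeps result quality at the reference level
    return max(feasible, key=utilization_fn)


def toy_match(n):
    """Made-up model: matching degree grows with the number of returned results."""
    return min(1.0, n / 25)


def toy_utilization(n):
    """Made-up model: utilization peaks when each host returns 22 results."""
    return -abs(n - 22)


chosen = tune_result_count(range(10, 31), first_match=0.8,
                           match_fn=toy_match, utilization_fn=toy_utilization)
```

Under these made-up models the counts 20 to 30 satisfy the matching-degree constraint, and 22 gives the highest utilization among them.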
Optionally, the sending module 92 is further configured to: sending the search request to a third search system, wherein the third search system comprises at least one second host, the number of the second hosts in the third search system is different from the number of the second hosts in the second search system, and the number of the first hosts in the first search system is larger than the number of the second hosts in the third search system; the receiving module 91 is further configured to: receiving at least one third search result returned by the third search system; the determination module 94 is further configured to: and determining the number of the second hosts required to be included by the second search system according to the third matching degree of the at least one third search result and the search request, the performance index of each second host in the third search system, the number of the second hosts in the third search system, the second matching degree of the at least one second search result and the search request, the performance index of each second host in the second search system and the number of the second hosts in the second search system.
Optionally, the first search system, the second search system, and the third search system store the same index data.
The index reduction apparatus suitable for search in the embodiment shown in fig. 9 can be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, and are not described herein again.
Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present invention. The server provided by the embodiment of the present invention may execute the processing flow provided by the embodiment of the index reduction method suitable for search. As shown in fig. 10, the server 100 includes a memory 101, a processor 102, a computer program, and a communication interface 103, wherein the computer program is stored in the memory 101 and configured to be executed by the processor 102 to implement the index reduction method suitable for searching described in the above embodiments.
In addition, the present embodiment also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the index reduction method suitable for search described in the above embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. An index reduction method suitable for searching, comprising:
receiving a search request of a user, wherein the search request comprises a search keyword;
sending the search request to a first search system and a second search system, wherein the first search system comprises at least one first host, the second search system comprises at least one second host, the number of the first hosts in the first search system is greater than that of the second hosts in the second search system, and the performance of the first hosts is lower than that of the second hosts;
receiving at least one first search result returned by the first search system and at least one second search result returned by the second search system;
and determining the number of second search results required to be returned by each second host in the second search system by comparing the at least one first search result with the at least one second search result.
2. The method of claim 1, wherein determining the number of second search results to be returned by each second host in the second search system by comparing the at least one first search result with the at least one second search result comprises:
and determining the number of second search results required to be returned by each second host in the second search system by comparing the number of the first search results with the number of the second search results.
3. The method of claim 1, wherein determining the number of second search results to be returned by each second host in the second search system by comparing the at least one first search result with the at least one second search result comprises:
determining a first degree of match of the at least one first search result with the search request and a second degree of match of the at least one second search result with the search request;
and determining the number of second search results required to be returned by each second host in the second search system by comparing the first matching degree with the second matching degree and comparing the number of the first search results with the number of the second search results.
4. The method of claim 3, further comprising:
and adjusting the number of second search results required to be returned by each second host in the second search system so as to enable the resource utilization rate of each second host to be highest, wherein the second matching degree is greater than or equal to the first matching degree.
5. The method according to any one of claims 1-4, further comprising:
sending the search request to a third search system, wherein the third search system comprises at least one second host, the number of the second hosts in the third search system is different from the number of the second hosts in the second search system, and the number of the first hosts in the first search system is larger than the number of the second hosts in the third search system;
receiving at least one third search result returned by the third search system;
and determining the number of the second hosts required to be included by the second search system according to the third matching degree of the at least one third search result and the search request, the performance index of each second host in the third search system, the number of the second hosts in the third search system, the second matching degree of the at least one second search result and the search request, the performance index of each second host in the second search system and the number of the second hosts in the second search system.
6. The method of claim 5, wherein the first search system, the second search system, and the third search system store the same index data.
7. An index reduction apparatus adapted for searching, comprising:
the receiving module is used for receiving a search request of a user, wherein the search request comprises search keywords;
a sending module, configured to send the search request to a first search system and a second search system, where the first search system includes at least one first host, the second search system includes at least one second host, the number of the first hosts in the first search system is greater than the number of the second hosts in the second search system, and the performance of the first hosts is lower than that of the second hosts;
the receiving module is further used for receiving at least one first search result returned by the first search system and at least one second search result returned by the second search system;
a comparison module for comparing the at least one first search result and the at least one second search result;
and the determining module is used for comparing the at least one first search result with the at least one second search result through the comparing module, and determining the number of second search results required to be returned by each second host in the second search system.
8. The apparatus according to claim 7, wherein the comparing module is specifically configured to compare the number of the first search results with the number of the second search results;
the determining module is specifically configured to compare, by the comparing module, the number of the first search results with the number of the second search results, and determine the number of the second search results that need to be returned by each second host in the second search system.
9. The apparatus for index reduction adapted to search of claim 7, wherein the determining module is further configured to: determining a first degree of match of the at least one first search result with the search request and a second degree of match of the at least one second search result with the search request;
the comparison module is further configured to: comparing the first and second degrees of match, and comparing the number of first search results to the number of second search results;
the determining module is specifically configured to determine, through the comparing module, the number of second search results that each second host in the second search system needs to return by comparing the first matching degree with the second matching degree, and comparing the number of the first search results with the number of the second search results.
10. The apparatus for index reduction suitable for search of claim 9, further comprising:
and the adjusting module is used for adjusting the number of second search results required to be returned by each second host in the second search system so as to enable the resource utilization rate of each second host to be highest, and the second matching degree is greater than or equal to the first matching degree.
11. The apparatus for index reduction adapted to search according to any one of claims 7-10, wherein the sending module is further configured to: sending the search request to a third search system, wherein the third search system comprises at least one second host, the number of the second hosts in the third search system is different from the number of the second hosts in the second search system, and the number of the first hosts in the first search system is larger than the number of the second hosts in the third search system;
the receiving module is further configured to: receiving at least one third search result returned by the third search system;
the determination module is further to: and determining the number of the second hosts required to be included by the second search system according to the third matching degree of the at least one third search result and the search request, the performance index of each second host in the third search system, the number of the second hosts in the third search system, the second matching degree of the at least one second search result and the search request, the performance index of each second host in the second search system and the number of the second hosts in the second search system.
12. The apparatus of claim 11, wherein the first search system, the second search system and the third search system store the same index data.
13. A server, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-6.
14. A computer-readable storage medium, having stored thereon a computer program for execution by a processor to perform the method of any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810214501.8A CN110309390B (en) 2018-03-15 2018-03-15 Index reduction method and device suitable for search and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810214501.8A CN110309390B (en) 2018-03-15 2018-03-15 Index reduction method and device suitable for search and server

Publications (2)

Publication Number Publication Date
CN110309390A CN110309390A (en) 2019-10-08
CN110309390B true CN110309390B (en) 2021-10-08

Family

ID=68073324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810214501.8A Active CN110309390B (en) 2018-03-15 2018-03-15 Index reduction method and device suitable for search and server

Country Status (1)

Country Link
CN (1) CN110309390B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104123329A (en) * 2013-04-25 2014-10-29 北京千橡网景科技发展有限公司 Search method and device
CN106649870A (en) * 2017-01-03 2017-05-10 山东浪潮商用系统有限公司 Distributed implementation method for search engine
CN106776299A (en) * 2016-11-30 2017-05-31 努比亚技术有限公司 Search engine test device and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10127314B2 (en) * 2012-03-21 2018-11-13 Apple Inc. Systems and methods for optimizing search engine performance
CN103488687A (en) * 2013-09-02 2014-01-01 用友软件股份有限公司 Searching system and searching method of big data
CN104699825B (en) * 2015-03-30 2016-10-05 北京奇虎科技有限公司 The balancing method of Performance of Search Engine and device
US20170091326A1 (en) * 2015-09-30 2017-03-30 Linkedln Corporation Managing search engines using dynamic similarity
CN105447187B (en) * 2015-12-15 2017-09-22 广州神马移动信息科技有限公司 Web search method and system

Also Published As

Publication number Publication date
CN110309390A (en) 2019-10-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20200420
Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province
Applicant after: Alibaba (China) Co.,Ltd.
Address before: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping square B radio tower 13 layer self unit 01
Applicant before: GUANGZHOU SHENMA MOBILE INFORMATION TECHNOLOGY Co.,Ltd.
GR01 Patent grant