WO2020052067A1 - Information search method and device - Google Patents

Information search method and device

Info

Publication number
WO2020052067A1
Authority
WO
WIPO (PCT)
Prior art keywords
search result
supplementary
search
target
subset
Prior art date
Application number
PCT/CN2018/116342
Other languages
French (fr)
Chinese (zh)
Inventor
何轶
李磊
宗显子
汤颢
郑光果
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2020052067A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation

Definitions

  • Embodiments of the present application relate to the field of computer technology, and in particular, to a method and an apparatus for searching information.
  • the search results corresponding to the input search information are generally directly selected according to the search algorithm.
  • the information obtained after adjusting the search information multiple times may not be directly searchable, so the search results obtained may not be comprehensive.
  • the embodiments of the present application provide a method and device for searching information.
  • an embodiment of the present application provides a method for searching information.
  • the method includes: searching a target information database using the obtained search information to obtain a search result set; selecting a search result subset from the search result set, and determining the search results in the search result subset that belong to the target search results as supplementary search information; and performing the following search step: determining the union of the search result set and the search results found in the target information database using the supplementary search information as a supplementary search result set; determining whether the supplementary search result set satisfies a preset convergence condition; and, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, selecting a supplementary search result subset from the supplementary search result set, determining the supplementary search results in the supplementary search result subset that belong to the target search results as the supplementary search information, determining the supplementary search result set as the search result set, and continuing to perform the search step.
  • the above search step further includes: in response to determining that the supplementary search result set satisfies the above-mentioned convergence condition, determining a target search result set from the supplementary search result set.
  • determining the search results in the search result subset that belong to the target search results as supplementary search information includes: obtaining annotation information of the search results in the search result subset, where the annotation information indicates whether a search result is a target search result; and determining, according to the annotation information, the search results in the search result subset that belong to the target search results.
  • the method further includes: determining the relevance of the search results in the search result set, where the relevance represents how relevant a search result is to the search information.
  • the relevance of a search result is determined by the following steps: determining the similarity between the search result and the search information; and determining the relevance of the search result according to the similarity, where the similarity is directly proportional to the relevance.
  • the method further includes: for each supplementary search result in the supplementary search result set, in response to the supplementary search result existing in the search result set, updating the relevance of the supplementary search result according to a preset relevance improvement algorithm and using the updated relevance as the relevance of the supplementary search result; and in response to the supplementary search result not existing in the search result set, determining the relevance of the supplementary search result.
  • determining the relevance of the supplementary search result includes: determining the similarity between the supplementary search result and the search information; and determining the relevance of the supplementary search result according to the similarity, where the similarity is directly proportional to the relevance.
  • determining the target search result set from the supplementary search result set includes: selecting a target number of supplementary search results from the supplementary search result set, in descending order of relevance, as the target search results to obtain the target search result set.
  • the method further includes: dividing the search result set into a target number of search result subsets according to a preset target number of relevance intervals; and selecting the search result subset from the search result set includes: selecting search results from each of the target number of search result subsets to obtain the search result subset.
  • the method further comprises: determining an accuracy of the search result subset of the target number of search result subsets, wherein the accuracy is used to represent a proportion of the target search results in the search result subset.
  • the method further includes: dividing the supplementary search result set into a corresponding target number of supplementary search result subsets according to the relevance intervals; and selecting the supplementary search result subset from the supplementary search result set includes: selecting supplementary search results from each of the target number of supplementary search result subsets to obtain the supplementary search result subset.
  • the method further comprises: determining an accuracy of the supplementary search result subset of the target number of supplementary search result sets, where the accuracy is used to represent a proportion of target search results in the supplementary search result subset.
  • selecting supplementary search results from each of the target number of supplementary search result subsets includes: for each supplementary search result subset among the target number of supplementary search result subsets, determining the absolute value of the difference between a preset accuracy threshold and the accuracy of the search result subset corresponding to the relevance interval of that supplementary search result subset; and determining, based on the absolute value and the number of supplementary search results contained in the supplementary search result subset, the number of supplementary search results to select from that supplementary search result subset.
  • the absolute value is inversely proportional to the number of supplementary search results selected.
  • the number of supplementary search results included is directly proportional to the number of supplementary search results selected.
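  • As a hedged illustration of the two proportionality rules above, the sketch below computes an accuracy and a per-subset selection count. The function names, the `scale` factor, and the `eps` term (added only to avoid division by zero when the accuracy equals the threshold) are assumptions made for this example; the application does not specify a concrete formula.

```python
def accuracy(subset, is_target):
    """Proportion of target search results in a subset (0.0 for an empty subset)."""
    if not subset:
        return 0.0
    return sum(1 for r in subset if is_target(r)) / len(subset)


def selection_count(subset_size, subset_accuracy, accuracy_threshold,
                    scale=0.05, eps=0.05):
    """Number of results to sample from one supplementary subset.

    Grows with the subset size and shrinks as the gap between the accuracy
    and the threshold grows. `scale` and `eps` are illustrative assumptions.
    """
    gap = abs(subset_accuracy - accuracy_threshold)
    count = round(scale * subset_size / (gap + eps))
    # Never select fewer than zero or more than the subset contains.
    return max(0, min(subset_size, count))
```

  • Under these assumed parameters, a subset whose accuracy sits close to the threshold is sampled more heavily than one of the same size whose accuracy is far from it.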
  • the number of search results in the search result set and the number of supplementary search results in the supplementary search result subset are the same.
  • the convergence condition includes: a difference between the accuracy of the search result subset and the supplementary search result subset corresponding to the target relevance degree interval is less than a preset difference threshold.
  • the convergence condition includes: the number of search results included in the difference between the search result set and the supplementary search result set is less than a preset number difference threshold.
  • an embodiment of the present application provides an apparatus for searching for information.
  • the apparatus includes: a searching unit configured to search a target information database using the obtained search information to obtain a search result set; a selecting unit configured to select a search result subset from the search result set and determine the search results in the search result subset that belong to the target search results as supplementary search information; a supplementary search unit configured to perform the following search steps: determining the union of the search result set and the search results found in the target information database using the supplementary search information as a supplementary search result set, and determining whether the supplementary search result set satisfies a preset convergence condition; and a determining unit configured to, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that belong to the target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the above search steps.
  • the above-mentioned supplementary search unit is further configured to determine a target search result set from the supplementary search result set in response to determining that the supplementary search result set satisfies the above-mentioned convergence condition.
  • the above-mentioned selecting unit is further configured to: obtain annotation information of the search results in the search result subset, wherein the annotation information is used to indicate whether the search result is a target search result; and determine the search result subset according to the annotation information Search results that belong to the target search result.
  • the apparatus further includes a relevance degree determining unit configured to determine a relevance degree of the search results in the search result set, wherein the relevance degree is used to indicate a relevance degree of the search results and the search information.
  • the above-mentioned relevance determination unit is further configured to: determine a similarity between the search result and the search information; and determine a relevance of the search result according to the similarity, wherein the similarity is directly proportional to the relevance.
  • the above-mentioned relevance determining unit is further configured to: for each supplementary search result in the supplementary search result set, in response to the supplementary search result existing in the search result set, update the relevance of the supplementary search result according to a preset relevance improvement algorithm and use the updated relevance as the relevance of the supplementary search result; and in response to the supplementary search result not existing in the search result set, determine the relevance of the supplementary search result.
  • the above-mentioned relevance determining unit is further configured to: determine the similarity between the supplementary search result and the search information; and determine the relevance of the supplementary search result according to the similarity, where the similarity is directly proportional to the relevance.
  • the above-mentioned supplementary search unit is further configured to select the target number of supplementary search results from the supplementary search result set as the target search results in order of the relevance degree from large to small to obtain the target search result set.
  • the apparatus further includes a dividing unit configured to divide the search result set into a target number of search result subsets according to a preset target number of relevance intervals; and the above-mentioned selecting unit is further configured to select search results from each of the target number of search result subsets to obtain the search result subset.
  • the apparatus further includes an accuracy determining unit configured to determine the accuracy of each search result subset among the target number of search result subsets, where the accuracy represents the proportion of target search results in the search result subset.
  • the above-mentioned dividing unit is further configured to divide the supplementary search result set into a corresponding target number of supplementary search result subsets according to the above-mentioned relevance intervals; and the above-mentioned determining unit is further configured to select supplementary search results from each of the target number of supplementary search result subsets to obtain the supplementary search result subset.
  • the accuracy determining unit is further configured to determine the accuracy of each supplementary search result subset among the target number of supplementary search result subsets, where the accuracy represents the proportion of target search results in the supplementary search result subset.
  • the above-mentioned determining unit is further configured to: for each supplementary search result subset among the target number of supplementary search result subsets, determine the absolute value of the difference between a preset accuracy threshold and the accuracy of the search result subset corresponding to the relevance interval of that supplementary search result subset; and determine the number of supplementary search results to select from the supplementary search result subset according to the absolute value and the number of supplementary search results included in the supplementary search result subset, where the absolute value is inversely proportional to the number of supplementary search results selected, and the number of supplementary search results included is directly proportional to the number of supplementary search results selected.
  • the number of search results in the search result set and the number of supplementary search results in the supplementary search result subset are the same.
  • the convergence condition includes: a difference between the accuracy of the search result subset and the supplementary search result subset corresponding to the target relevance degree interval is less than a preset difference threshold.
  • the convergence condition includes: the number of search results included in the difference between the search result set and the supplementary search result set is less than a preset number difference threshold.
  • an embodiment of the present application provides an electronic device.
  • the electronic device includes: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation of the first aspect.
  • an embodiment of the present application provides a computer-readable medium having stored thereon a computer program that, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
  • the method and device for searching information provided in the embodiments of the present application sample a part of the search results, select the sampled search results that belong to the target search results, and use them to search again. Whether to continue sampling and searching is then determined according to the results of the repeated search, which increases the number of search results and improves the coverage of the search results.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for searching information according to the present application
  • FIG. 3 is a flowchart of still another embodiment of a method for searching information according to the present application.
  • FIG. 4 is a flowchart of another embodiment of a method for searching information according to the present application.
  • FIG. 5 is a schematic diagram of an application scenario of a method for searching information according to an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of an embodiment of an apparatus for searching information according to the present application.
  • FIG. 7 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 illustrates an exemplary architecture 100 of an embodiment of a method for searching information or a device for searching information to which the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • the network 104 is a medium for providing a communication link between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the terminal devices 101, 102, 103 interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various client applications can be installed on the terminal devices 101, 102, and 103, such as a web browser application, a shopping application, a search application, and an instant communication tool.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • when the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices that support data storage and data exchange, including but not limited to smartphones, tablets, e-book readers, laptop computers, and desktop computers.
  • when the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. This is not specifically limited here.
  • the server 105 may be a server that provides various services, for example, a search server that performs a search in the target information base according to the search information sent by the terminal devices 101, 102, and 103, and returns search results to the terminal devices 101, 102, and 103.
  • the search information and the target information base may also be stored locally on the server 105, and the server 105 may directly extract the search information and search the locally stored target information base. In this case, the terminal devices 101, 102, 103 and the network 104 may be absent.
  • the method for searching information provided by the embodiment of the present application is generally executed by the server 105, and accordingly, the apparatus for searching information is generally set in the server 105.
  • information search applications can also be installed in the terminal devices 101, 102, and 103.
  • in this case, the method for searching information can also be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for searching information may be provided in the terminal devices 101, 102, and 103.
  • in this case, the exemplary system architecture 100 may not include the server 105 and the network 104.
  • the server may be hardware or software.
  • when the server is hardware, it can be implemented as a distributed server cluster consisting of multiple servers or as a single server.
  • when the server is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. This is not specifically limited here.
  • terminal devices, networks, and servers in FIG. 1 are merely exemplary. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the method for searching information includes the following steps:
  • Step 201 Use the obtained search information to search in a target information database to obtain a search result set.
  • the execution subject of the method for searching information may first obtain the search information from a local or other storage device, and then perform a search in the target information base to obtain a search result set.
  • the target information base may be an information base specified by a user in advance, or a search base matching the search information determined according to the search information, or an information base set according to actual application requirements.
  • the search algorithm used in the search process may be preset.
  • Step 202 Select a search result subset from the search result set, and determine search results belonging to the target search result in the search result subset as supplementary search information.
  • the search result set may be sampled first, and at least one search result may be selected to obtain a search result subset.
  • the sampling ratio can be set randomly, preset by the user or a technician, or determined based on the size of the search result set (for example, ten percent of the number of search results in the search result set is used as the size of the search result subset).
  • the target search result may refer to a search result that meets a user's search needs. For example, it is possible to verify whether each search result is a target search result by calculating the similarity between each search result in the search result subset and the search information.
  • the annotation information of the search results in the search result subset may also be obtained first, where the annotation information is used to indicate whether the search result is a target search result. Then, according to the label information, search results in the search result subset that belong to the target search result are determined.
  • each search result in the search result subset can be manually verified, and the verification result is input as labeled information, and then sent to the above-mentioned execution subject.
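  • A minimal sketch of step 202 under stated assumptions: search results are represented as hashable strings, the annotation information is a dictionary mapping each result to a boolean, and the function names `sample_subset` and `select_targets` are hypothetical. The default ratio mirrors the ten-percent example above.

```python
import random


def sample_subset(results, ratio=0.1, seed=None):
    """Sample a search result subset; at least one result if any exist."""
    pool = sorted(results)                      # fixed order for reproducibility
    if not pool:
        return []
    size = min(len(pool), max(1, int(len(pool) * ratio)))
    return random.Random(seed).sample(pool, size)


def select_targets(subset, annotations):
    """Keep the results that the annotation information marks as target results."""
    return [r for r in subset if annotations.get(r, False)]
```

  • The results returned by `select_targets` would then serve as the supplementary search information for the next search round.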
  • Step 203 Perform the following search steps:
  • Step 2031 Determine the union of the search result set and the search results found in the target information database using the supplementary search information as the supplementary search result set.
  • a search may be performed in the target information database by using the supplementary search information to obtain a search result. Because supplementary search information is used, compared with the above search result set, new search results may be included in this search result. Therefore, the union of the current search result and the search result set can be determined as a supplementary search result set.
  • Step 2032 Determine whether the supplementary search result set satisfies a preset convergence condition.
  • the convergence condition may be specifically set by a user or a technician according to an actual search requirement.
  • the convergence condition may be that the number of supplementary search results in the supplementary search result set is greater than a preset threshold.
  • the convergence condition may also be that the number of search results included in the difference between the supplementary search result set and the search result set is less than a preset threshold.
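  • The two example convergence conditions can be sketched as simple predicates over Python sets. The function and parameter names are illustrative assumptions; the thresholds would be chosen according to the actual search requirement.

```python
def converged_by_size(supplementary_set, size_threshold):
    """True when the supplementary search result set is larger than the threshold."""
    return len(supplementary_set) > size_threshold


def converged_by_new_results(search_set, supplementary_set, diff_threshold):
    """True when the latest round added fewer new results than the threshold."""
    new_results = supplementary_set - search_set
    return len(new_results) < diff_threshold
```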
  • Step 204 In response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that belong to the target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the above search steps.
  • when the supplementary search result set does not satisfy the preset convergence condition, a method similar to step 202 above may be followed: the supplementary search result set is sampled first to obtain a supplementary search result subset, the supplementary search results in that subset that belong to the target search results are determined as the supplementary search information, the supplementary search result set is determined as the search result set, and step 203 above is performed again, so that the search proceeds iteratively.
  • in response to determining that the supplementary search result set satisfies the convergence condition, a target search result set is determined from the supplementary search result set.
  • the supplementary search result set can be directly used as the target search result set.
  • the method for searching information samples the search result set obtained by searching the target information database with the search information, and uses the search results in the sampled search result subset that belong to the target search results as supplementary search information to search the target information base again, thereby supplementing the result set with new search results.
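  • The flow of steps 201 to 204 can be sketched as a loop. This is a toy illustration, not the applicant's implementation: the `search` and `is_target` callables, the word-overlap index, the fixed `sample_size`, and the `max_rounds` safety cap are all assumptions introduced here, and convergence uses the "few new results" condition.

```python
import random


def iterative_search(search, is_target, search_info,
                     sample_size=2, min_new=1, max_rounds=10, seed=0):
    """Repeat sample -> keep targets -> search again until few new results appear."""
    rng = random.Random(seed)
    result_set = set(search(search_info))                 # step 201
    for _ in range(max_rounds):                           # safety cap (assumption)
        pool = sorted(result_set)
        subset = rng.sample(pool, min(sample_size, len(pool)))  # steps 202 / 204
        supplementary_info = [r for r in subset if is_target(r)]
        supplementary_set = set(result_set)               # step 2031: union
        for info in supplementary_info:
            supplementary_set |= set(search(info))
        new_count = len(supplementary_set - result_set)
        result_set = supplementary_set                    # becomes the search result set
        if new_count < min_new:                           # step 2032: convergence check
            break
    return result_set


# Toy information base: an entry matches a query when they share a word.
INDEX = ["cat dog", "dog bird", "bird fish", "car"]


def toy_search(text):
    words = set(text.split())
    return [entry for entry in INDEX if words & set(entry.split())]
```

  • In this toy setup, calling `iterative_search(toy_search, lambda r: True, "cat")` reaches entries that the original query alone does not match, because each round reuses confirmed results as new search information.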
  • a flowchart 300 of yet another embodiment of a method for searching information is shown.
  • the process 300 of the method for searching information includes the following steps:
  • Step 301 Use the obtained search information to search in a target information database to obtain a search result set.
  • For the specific execution process of this step, reference may be made to the related description of step 201 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 302 Determine the relevance of the search results in the search result set.
  • the degree of association may be used to indicate the degree of association between the search results and the search information.
  • a correlation degree between each search result and the search information may be calculated according to a preset correlation degree calculation method, or a correlation degree between each search result and the search information may be manually marked and returned to the above-mentioned execution subject.
  • the relevance of a search result may be determined by the following steps: first, an existing similarity calculation method may be used to determine the similarity between the search result and the search information; then, based on the directly proportional relationship between similarity and relevance, the relevance of the search result is determined from the similarity.
  • the similarity may be directly determined as the relevance of the search result.
  • Step 303 Select a search result subset from the search result set, and determine search results belonging to the target search result in the search result subset as supplementary search information.
  • For the specific execution process of this step, reference may be made to the description of step 202 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 304 Perform the following search steps:
  • Step 3041 Determine the union of the search result set and the search results found in the target information base using the supplementary search information as the supplementary search result set.
  • For the specific execution process of step 3041, reference may be made to the description of step 2031 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 3042 For the supplementary search results in the supplementary search result set, determine the relevance of the supplementary search results in the following manner:
  • Step 30421 In response to the supplementary search result existing in the search result set, update the correlation degree of the supplementary search result according to a preset correlation degree improvement algorithm, and use the updated correlation degree as the correlation degree of the supplementary search result.
  • if a supplementary search result already exists in the search result set, that is, it has been found again, its relevance can be increased.
  • a technician can set a relevance improvement algorithm in advance to increase the relevance, or can set a relevance improvement value in advance. In the latter case, the preset improvement value is added to the relevance of the search result, and the resulting new relevance is used as the relevance of this search result.
  • Step 30422 in response to the supplementary search result not existing in the search result set, determining the relevance of the supplementary search result.
  • an existing similarity calculation method may be used to first determine the similarity between the supplementary search result and the search information, and the relevance of the supplementary search result is then determined from the similarity according to the relationship between similarity and relevance. For example, the similarity may be directly used as the relevance of the supplementary search result.
  • the specific calculation process of the degree of association may also refer to the related description of step 302, which is not repeated here.
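  • Steps 30421 and 30422 can be sketched as follows. The fixed `boost` value stands in for the preset relevance improvement value, and the word-level Jaccard similarity is only an assumed placeholder for "an existing similarity calculation method"; neither is specified by the application.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings (illustrative stand-in)."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def update_relevance(relevance, supplementary_set, search_info,
                     similarity=jaccard, boost=0.1):
    """Return a new relevance mapping covering the supplementary result set."""
    updated = dict(relevance)
    for result in supplementary_set:
        if result in relevance:
            # Step 30421: found again, so raise the existing relevance.
            updated[result] = relevance[result] + boost
        else:
            # Step 30422: new result, so use its similarity as the relevance.
            updated[result] = similarity(result, search_info)
    return updated
```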
  • Step 3043 Determine whether the supplementary search result set satisfies a preset convergence condition.
  • For the specific execution process of step 3043, refer to the related description of step 2032 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 305 In response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that belong to the target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the above search steps.
  • For the specific execution process of this step, reference may be made to the related description of step 204 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 306 In response to determining that the supplementary search result set satisfies the above-mentioned convergence condition, select a target number of supplementary search results from the supplementary search result set, in descending order of relevance, as the target search results to obtain a target search result set.
  • the number of targets may be set in advance, or may be determined according to a number determination rule (for example, the number of targets is half of the number of supplementary search results included in the supplementary search result set).
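  • As a minimal sketch of the selection in step 306, the following Python ranks results by relevance and keeps the top entries. The function name, the dictionary representation of relevance, and the integer rounding of "half" are assumptions made for illustration and are not taken from the application.

```python
from typing import Dict, List, Optional


def select_target_results(relevance: Dict[str, float],
                          target_number: Optional[int] = None) -> List[str]:
    """Return result ids in descending order of relevance, truncated to target_number."""
    if target_number is None:
        # Example rule from the text: half of the supplementary result set
        # (rounded down here, which is an assumption).
        target_number = len(relevance) // 2
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    return ranked[:target_number]


scores = {"a": 0.91, "b": 0.35, "c": 0.77, "d": 0.62}
top = select_target_results(scores)
```

  • Passing an explicit `target_number` corresponds to setting the number in advance; leaving it unset applies the example "half of the set" rule.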
  • The process for searching information in this embodiment also determines the relevance of each search result after every search, and dynamically updates the relevance of search results that are found again. The target search results can therefore be selected according to the relevance of each search result and supplementary search result, which further improves the accuracy of the search results.
  • the process 400 of the method for searching information includes the following steps:
  • Step 401 Use the obtained search information to search in a target information database to obtain a search result set.
  • For the specific execution process of this step, reference may be made to the related description of step 201 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 402 Determine the relevance of the search results in the search result set.
  • For the specific execution process of this step, reference may be made to the description of step 302 in the embodiment corresponding to FIG. 3, and details are not described herein again.
  • Step 403 Divide the search result set into a corresponding number of target search result subsets according to a preset number of target relevance intervals.
  • The relevance intervals may be divided uniformly or non-uniformly.
  • For example, if the relevance lies between 0 and 1 (including 0 and 1), the range can be divided uniformly into the following five relevance intervals: 0 to 0.2 (including 0, excluding 0.2), 0.2 to 0.4 (including 0.2, excluding 0.4), 0.4 to 0.6 (including 0.4, excluding 0.6), 0.6 to 0.8 (including 0.6, excluding 0.8), and 0.8 to 1 (including 0.8 and 1). It can also be divided non-uniformly into the following five relevance intervals: 0 to 0.5 (including 0, excluding 0.5), 0.5 to 0.7 (including 0.5, excluding 0.7), 0.7 to 0.8 (including 0.7, excluding 0.8), 0.8 to 0.9 (including 0.8, excluding 0.9), and 0.9 to 1 (including 0.9 and 1).
  • With the uniform division above, the search results with a relevance between 0 and 0.2 form one search result subset, those with a relevance between 0.2 and 0.4 form a second subset, those between 0.4 and 0.6 a third, those between 0.6 and 0.8 a fourth, and those between 0.8 and 1 a fifth.
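  • The division in step 403 can be sketched as follows. This is an illustrative Python sketch only: the function name and the use of a sorted list of interior boundaries are assumptions. Each interval includes its lower bound and excludes its upper bound, except the last interval, which also includes 1, matching the intervals described above.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def split_by_relevance(relevance: Dict[str, float],
                       boundaries: Sequence[float]) -> List[List[str]]:
    """Group result ids into len(boundaries) + 1 relevance intervals."""
    subsets: List[List[str]] = [[] for _ in range(len(boundaries) + 1)]
    for result_id, score in relevance.items():
        # bisect_right places a score equal to a boundary in the upper interval,
        # so every interval is closed on the left and open on the right.
        subsets[bisect_right(boundaries, score)].append(result_id)
    return subsets


scores = {"a": 0.0, "b": 0.2, "c": 0.59, "d": 0.8, "e": 1.0}
uniform = split_by_relevance(scores, [0.2, 0.4, 0.6, 0.8])
non_uniform = split_by_relevance(scores, [0.5, 0.7, 0.8, 0.9])
```

  • The same function covers both the uniform and the non-uniform division; only the boundary list changes.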
  • Step 404 Select search results from each of the target number of search result subsets to obtain a search result subset, and determine the search results in that search result subset that belong to the target search results as supplementary search information.
  • A portion of the search results may be sampled from each search result subset, and the sampled search results may be combined to obtain the search result subset.
  • The number of search results sampled from each search result subset may be arbitrarily specified, or may be determined according to a sampling number determination rule.
  • For example, the sampling number determination rule may set the number of search results sampled from each search result subset to one tenth of the number of search results that subset contains.
  • The number of samples can also be determined according to the relevance. For example, the closer the corresponding relevance is to 0.5, the larger the corresponding number of samples.
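  • A minimal sketch of the "one tenth per subset" sampling rule is shown below. The function name, the fixed random seed, and the choice to take at least one sample from every non-empty subset are assumptions for illustration; the application does not prescribe them.

```python
import random
from typing import List, Sequence


def sample_from_subsets(subsets: Sequence[Sequence[str]],
                        divisor: int = 10,
                        seed: int = 0) -> List[str]:
    """Sample len(subset) // divisor results from each subset and combine them."""
    rng = random.Random(seed)
    sampled: List[str] = []
    for subset in subsets:
        if not subset:
            continue
        # Assumption: a non-empty subset always contributes at least one sample,
        # so that every relevance interval is represented.
        count = max(1, len(subset) // divisor)
        sampled.extend(rng.sample(list(subset), count))
    return sampled


subsets = [[f"a{i}" for i in range(30)],
           [f"b{i}" for i in range(50)],
           [f"c{i}" for i in range(20)]]
picked = sample_from_subsets(subsets)
```

  • With subsets of 30, 50, and 20 results, this rule draws 3, 5, and 2 samples respectively.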
  • the accuracy of the search result subset in the target number of search result subsets may be further determined.
  • accuracy can be used to represent the proportion of target search results in the search result subset. For example, for a search result subset containing 30 search results, 20 of which belong to the target search result, the accuracy of the search result subset is two thirds.
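  • The accuracy defined above is a simple proportion, which the following sketch computes exactly with rational arithmetic. The function and variable names are illustrative assumptions.

```python
from fractions import Fraction
from typing import Collection, Set


def subset_accuracy(labelled: Collection[str], targets: Set[str]) -> Fraction:
    """Proportion of results in `labelled` that belong to the target search results."""
    if not labelled:
        raise ValueError("accuracy is undefined for an empty subset")
    hits = sum(1 for result_id in labelled if result_id in targets)
    return Fraction(hits, len(labelled))


# The example from the text: 30 search results, 20 of which are target results.
subset = [f"r{i}" for i in range(30)]
targets = {f"r{i}" for i in range(20)}
accuracy = subset_accuracy(subset, targets)
```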
  • Step 405 Perform the following search steps:
  • Step 4051 Determine the union of the search result set and the search results obtained by searching the target information base with the supplementary search information as the supplementary search result set.
  • For the specific execution process of step 4051, reference may be made to the description of step 2031 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 4052 For the supplementary search results in the supplementary search result set, determine the relevance of the supplementary search results in the following manner:
  • Step 40521 In response to the supplementary search result existing in the search result set, update the relevance of the supplementary search result according to a preset relevance improvement algorithm, and use the updated relevance as the relevance of the supplementary search result.
  • Step 40522 In response to the supplementary search result not existing in the search result set, determine the relevance of the supplementary search result.
  • For the specific implementation of steps 40521 and 40522, reference may be made to the related description of steps 30421 and 30422 in the embodiment corresponding to FIG. 3, and details are not described herein again.
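  • The two branches of step 4052 can be sketched as below. The application does not specify the relevance improvement algorithm, so the additive boost capped at 1 is purely a hypothetical stand-in. The sketch also assumes that "existing in the search result set" refers to results returned again by the latest search, and that earlier results not returned again keep their relevance; all names are illustrative.

```python
from typing import Callable, Dict, Iterable


def update_relevance(previous: Dict[str, float],
                     new_hits: Iterable[str],
                     similarity: Callable[[str], float],
                     boost: float = 0.1) -> Dict[str, float]:
    """Return the relevance of every result in the supplementary result set."""
    # Assumption: results that were not returned again keep their old relevance.
    updated = dict(previous)
    for result_id in new_hits:
        if result_id in previous:
            # Step 40521: the result was found again, so raise its relevance
            # (hypothetical additive rule, capped at the maximum relevance 1).
            updated[result_id] = min(1.0, previous[result_id] + boost)
        else:
            # Step 40522: a new result takes its similarity to the search
            # information as its relevance.
            updated[result_id] = similarity(result_id)
    return updated


previous = {"a": 0.95, "b": 0.4}
similarities = {"c": 0.7}
updated = update_relevance(previous, ["a", "c"], similarities.__getitem__)
```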
  • Step 4053 Divide the supplementary search result set into the target number of supplementary search result subsets according to the above-mentioned relevance intervals.
  • After the supplementary search result set is obtained, it may include new search results, and the relevance of the search results that were found again has been raised, so the supplementary search results may be divided again according to relevance.
  • For the specific division process, refer to the related description of step 403 above.
  • Step 4054 Determine whether the supplementary search result set satisfies a preset convergence condition.
  • For the specific execution process of step 4054, refer to the related description of step 2032 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 406 In response to determining that the supplementary search result set does not satisfy the preset convergence condition, select supplementary search results from each of the target number of supplementary search result subsets to obtain a supplementary search result subset, determine the supplementary search results in the supplementary search result subset that belong to the target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the foregoing search steps.
  • A portion of the supplementary search results can be sampled from each supplementary search result subset, and the sampled supplementary search results can be combined to obtain the supplementary search result subset.
  • The number of supplementary search results sampled from each supplementary search result subset may be arbitrarily specified, or may be determined according to a sampling number determination rule.
  • For example, the sampling number determination rule may set the number of supplementary search results sampled from each supplementary search result subset to one tenth of the number of supplementary search results that subset contains.
  • the number of samples can also be determined according to the degree of correlation. For example, the closer the corresponding degree of correlation is to 0.5, the larger the number of corresponding samples is.
  • For each supplementary search result subset, the number of supplementary search results to select may be determined according to the absolute value of the difference between a preset accuracy threshold and the accuracy of the search result subset corresponding to the relevance interval in which the supplementary search result subset lies, and according to the number of supplementary search results the subset contains. The absolute value is inversely proportional to the number of selected supplementary search results, and the number of included supplementary search results is directly proportional to the number of selected supplementary search results.
  • The preset accuracy threshold may be set in advance by a technician.
  • The minimum and maximum values of accuracy can be estimated in advance, and the accuracy threshold can be set accordingly. For example, if the accuracy lies between 0 and 1, the accuracy threshold can be set to 0.5. In that case, the closer the accuracy of a supplementary search result subset is to 0.5, the more samples are taken from it; conversely, the closer its accuracy is to 0 or 1, the fewer samples are taken from it.
  • For example, if two supplementary search result subsets contain 200 and 400 supplementary search results respectively, the number of samples taken from the subset containing 400 supplementary search results may be greater than the number taken from the subset containing 200 supplementary search results.
  • the number of search results in the search result subset and the number of search results in the supplementary search result subset may be the same. That is, after each search, the total number of samples can be fixed.
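  • One way to combine the two proportionality rules with a fixed total sample size is sketched below. The weighting formula `size / (|accuracy - threshold| + eps)`, the smoothing term `eps`, and the largest-remainder rounding are assumptions chosen for illustration; the application states only the direction of each dependency.

```python
from typing import List, Sequence


def allocate_samples(accuracies: Sequence[float],
                     sizes: Sequence[int],
                     total: int,
                     threshold: float = 0.5,
                     eps: float = 0.1) -> List[int]:
    """Split `total` samples across subsets by size and closeness to the threshold."""
    # Larger subsets and accuracies closer to the threshold get larger weights.
    weights = [size / (abs(acc - threshold) + eps)
               for acc, size in zip(accuracies, sizes)]
    weight_sum = sum(weights)
    if weight_sum == 0:
        return [0] * len(sizes)
    shares = [total * w / weight_sum for w in weights]
    counts = [int(share) for share in shares]
    # Hand the samples lost to rounding down to the largest fractional parts.
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[:total - sum(counts)]:
        counts[i] += 1
    # A subset cannot supply more samples than it contains; if this cap
    # applies, the realized total is smaller than `total`.
    return [min(count, size) for count, size in zip(counts, sizes)]


# Accuracies 1/2, 4/5, 1 for subsets of 45, 80 and 25 results, 23 samples in total.
counts = allocate_samples([0.5, 0.8, 1.0], [45, 80, 25], total=23)
```

  • With equal accuracies, the allocation reduces to sampling in proportion to subset size, which matches the 200 versus 400 example above.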
  • The accuracy of each supplementary search result subset among the target number of supplementary search result subsets may be further determined, where the accuracy represents the proportion of target search results in the supplementary search result subset.
  • the convergence condition may be that the difference between the accuracy of the search result subset and the supplementary search result subset corresponding to the target relevance degree interval is less than a preset difference threshold.
  • the convergence condition may also be that the number of search results included in the difference between the search result set and the supplementary search result set is less than a preset number difference threshold.
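  • The two convergence conditions can be expressed as simple predicates, sketched below. The function and parameter names are illustrative assumptions, and the set difference is taken as the supplementary search result set minus the search result set, since the former is a union that contains the latter.

```python
from typing import Set


def accuracy_converged(previous_accuracy: float,
                       current_accuracy: float,
                       difference_threshold: float) -> bool:
    """True if the accuracy for a target relevance interval has stopped changing much."""
    return abs(current_accuracy - previous_accuracy) < difference_threshold


def growth_converged(search_results: Set[str],
                     supplementary_results: Set[str],
                     number_threshold: int) -> bool:
    """True if the latest search added fewer than number_threshold new results."""
    newly_found = supplementary_results - search_results
    return len(newly_found) < number_threshold
```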
  • Step 407 In response to determining that the supplementary search result set satisfies the above-mentioned convergence condition, select a target number of supplementary search results from the supplementary search result set as target search results, in descending order of relevance, to obtain a target search result set.
  • For the specific execution process of this step, reference may be made to the description of step 306 in the embodiment corresponding to FIG. 3, and details are not described herein again.
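  • The overall loop of searching, sampling, labelling, and searching again can be sketched as follows. This is a simplified illustration rather than the claimed method: it omits the relevance intervals and the accuracy-based sampling, uses only the "few new results" convergence condition, and adds a `max_rounds` safety limit. The `search` and `is_target` callables, the parameter names, and the toy index are all hypothetical.

```python
import random
from typing import Callable, Iterable, Set


def iterative_search(query: str,
                     search: Callable[[str], Iterable[str]],
                     is_target: Callable[[str], bool],
                     sample_size: int = 3,
                     min_new_results: int = 1,
                     max_rounds: int = 10,
                     seed: int = 0) -> Set[str]:
    """Repeat supplementary searches until few new results are found."""
    rng = random.Random(seed)
    results: Set[str] = set(search(query))
    for _ in range(max_rounds):
        pool = sorted(results)
        sampled = rng.sample(pool, min(sample_size, len(pool)))
        # Only sampled results labelled as target results become new queries.
        supplementary_queries = [r for r in sampled if is_target(r)]
        supplementary: Set[str] = set(results)
        for supplementary_query in supplementary_queries:
            supplementary.update(search(supplementary_query))
        converged = len(supplementary - results) < min_new_results
        results = supplementary
        if converged:
            break
    return results


# Toy "information base": each key returns a fixed list of related items.
index = {"q": ["a", "b"], "a": ["b", "c"], "b": ["a"], "c": ["d"], "d": []}
found = iterative_search("q", lambda key: index.get(key, []),
                         is_target=lambda r: True, sample_size=10)
```

  • In the toy run, the result set grows from two items to four over two supplementary rounds, and the third round adds nothing, so the loop stops.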
  • FIG. 5 is a schematic diagram of an application scenario of the method for searching information according to this embodiment.
  • A search result set 503 may be obtained by first searching the image library 502 using the image 501, where the search result set 503 includes 100 images. After that, the similarity between each image in the search result set 503 and the image 501 can be calculated, and the similarity can be used as the relevance of each image.
  • the search result set may be divided into three subsets according to the relevance degree (as shown by reference numeral 504 in the figure).
  • The relevance of the images in the first subset is between 0 and 0.5, and the subset contains 30 images; the relevance of the images in the second subset is between 0.5 and 0.8, and the subset contains 50 images; the relevance of the images in the third subset is between 0.8 and 1, and the subset contains 20 images.
  • Sampling can then be performed in the three subsets, and the target images among the sampled images are combined to obtain a supplementary search image 506.
  • For example, 10 images can be sampled from the first subset, 10 images from the second subset, and 3 images from the third subset.
  • If the images sampled from the first subset contain 5 target images, those sampled from the second subset contain 8 target images, and those sampled from the third subset contain 3 target images, then 16 target images are obtained as the supplementary search image 506.
  • Accordingly, the accuracy of the sample from the first subset is one half (5 of 10), the accuracy of the sample from the second subset is four fifths (8 of 10), and the accuracy of the sample from the third subset is one (3 of 3).
  • The supplementary search image 506 may then be used to search the image library 502 to obtain the current search result set 507, which includes 120 images. Further, the union of the search result set 503 obtained from the previous search and the current search result set 507 may be used as the supplementary search result set 508, which includes 150 images.
  • The 150 images in the supplementary search result set are regrouped according to relevance to obtain three new subsets (as shown by reference numeral 509 in the figure).
  • The new first subset, with relevance between 0 and 0.5, contains 45 images; the new second subset, with relevance between 0.5 and 0.8, contains 80 images; and the new third subset, with relevance between 0.8 and 1, contains 25 images.
  • The three newly obtained subsets are sampled again.
  • Because the accuracies of the previous three subsets were one half, four fifths, and one, the number of samples taken from each subset in this round can be determined according to the preset accuracy threshold (for example, 0.5).
  • For the first subset, the absolute value of the difference between the accuracy and the accuracy threshold is 0, so more images can be sampled from that subset.
  • For the second and third subsets, the absolute value of the difference between the accuracy and the accuracy threshold is larger, so fewer images can be sampled from these two subsets.
  • Moreover, the absolute value of the difference for the second subset (0.3) is smaller than that for the third subset (0.5), so the number of samples taken from the newly obtained second subset may be greater than the number taken from the newly obtained third subset.
  • Suppose the images sampled from the newly obtained first subset and from the newly obtained second subset each include 8 target images, and the images sampled from the newly obtained third subset include 1 target image. It can then be calculated that the accuracy of the newly obtained first subset is two thirds, the accuracy of the newly obtained second subset is four fifths, and the accuracy of the newly obtained third subset is one.
  • The above-mentioned search process is performed again, using the supplementary search result set 511 composed of the 17 newly sampled target images as the supplementary search image and using the supplementary search result set 508 as the search result set.
  • a search is performed in the image library 502 by using the supplementary search result set 511, and a union of the obtained search result and the supplementary search result set 508 is used as a new supplementary search result set 512.
  • the new supplementary search result set 512 contains 160 images.
  • the degree of relevance of the image searched again can be improved.
  • the similarity between the newly searched image and the image 501 can be used as the relevance of the newly searched image.
  • the new supplementary search result set 512 is re-divided into three subsets according to the re-determined relevance degree. Specifically, as shown by reference numeral 513 in the figure, the newly divided first subset contains 45 images, the newly divided second subset contains 87 images, and the newly divided third subset contains 28 images.
  • The accuracies of the three relevance intervals calculated in the previous round were two thirds, four fifths, and one.
  • Images can be sampled from the three newly divided subsets and the accuracy of this round of sampling can be calculated. More images can be sampled from the first subset.
  • Suppose the accuracies of the three newly divided subsets are calculated to be one half, nine tenths, and one, respectively. The accuracies of the newly divided second and third subsets have changed little from the previous round (from four fifths to nine tenths, and from one to one), and the new supplementary search result set 512 contains only a few newly found images compared with the previous supplementary search result set 508 (160 images versus 150). The 115 images in the newly divided second and third subsets can therefore be used directly as the target image set 514.
  • The flow of the method for searching information in this embodiment highlights that, after each search is completed and the relevance of each search result is determined, the search results can be divided into multiple subsets according to preset relevance intervals and sampled within each subset for the supplementary search, so that the supplementary search covers every relevance interval and the coverage of the search results is further improved.
  • In addition, the accuracy corresponding to each subset can be determined from the sampling result, and the number of samples to take from each relevance interval in the next round can be determined from that accuracy, so that supplementary searches are performed in a targeted manner and the accuracy of the search results and the search efficiency are further improved.
  • This application provides an embodiment of a device for searching information. The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.
  • the apparatus 600 for searching information includes a search unit 601, a selection unit 602, a supplemental search unit 603, and a determination unit 604.
  • The search unit 601 is configured to search the target information base using the obtained search information to obtain a search result set.
  • The selection unit 602 is configured to select a search result subset from the search result set, and determine the search results in the search result subset that belong to the target search results as the supplementary search information.
  • The supplementary search unit 603 is configured to perform the following search steps: determining the union of the search result set and the search results obtained by searching the target information base with the supplementary search information as the supplementary search result set; and determining whether the supplementary search result set satisfies a preset convergence condition.
  • The determining unit 604 is configured to, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that belong to the target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the above search steps.
  • For the specific processing of the search unit 601, the selection unit 602, the supplementary search unit 603, and the determining unit 604, and the technical effects they bring, refer to the related descriptions of step 201, step 202, step 203, and step 204 in the embodiment corresponding to FIG. 2, respectively; details are not repeated here.
  • the above-mentioned supplementary search unit 603 is further configured to: in response to determining that the supplementary search result set satisfies the above-mentioned convergence condition, determine a target search result set from the supplementary search result set.
  • The above-mentioned selecting unit 602 is further configured to: obtain annotation information of the search results in the search result subset, where the annotation information indicates whether a search result is a target search result; and determine, according to the annotation information, the search results in the search result subset that belong to the target search results.
  • The apparatus 600 for searching information further includes a relevance determination unit (not shown in the figure) configured to determine the relevance of the search results in the search result set, where the relevance indicates how relevant a search result is to the search information.
  • The above-mentioned relevance determination unit is further configured to: determine the similarity between a search result and the search information; and determine the relevance of the search result according to the similarity, where the similarity is directly proportional to the relevance.
  • The above-mentioned relevance determination unit is further configured to: for each supplementary search result in the supplementary search result set, in response to the supplementary search result existing in the search result set, update the relevance of the supplementary search result according to a preset relevance improvement algorithm and use the updated relevance as the relevance of the supplementary search result; and in response to the supplementary search result not existing in the search result set, determine the relevance of the supplementary search result.
  • the above-mentioned correlation degree determining unit is further configured to: determine a similarity between the supplementary search result and the search information; and determine a correlation degree of the supplementary search result according to the similarity, where: Similarity is directly proportional to relevance.
  • The above-mentioned supplementary search unit 603 is further configured to: select a target number of supplementary search results from the supplementary search result set as target search results, in descending order of relevance, to obtain the target search result set.
  • The apparatus 600 for searching information further includes a dividing unit (not shown in the figure) configured to divide the search result set into the target number of search result subsets according to a preset target number of relevance intervals; and the above-mentioned selection unit is further configured to select search results from each of the target number of search result subsets to obtain the search result subset.
  • The apparatus 600 for searching information further includes an accuracy determination unit (not shown in the figure) configured to determine the accuracy of each search result subset among the target number of search result subsets, where accuracy represents the proportion of target search results in a search result subset.
  • The above-mentioned dividing unit is further configured to divide the supplementary search result set into the target number of supplementary search result subsets according to the above-mentioned relevance intervals; and the above-mentioned determining unit is further configured to select supplementary search results from each of the target number of supplementary search result subsets to obtain the supplementary search result subset.
  • The accuracy determination unit is further configured to determine the accuracy of each supplementary search result subset among the target number of supplementary search result subsets, where the accuracy represents the proportion of target search results in the supplementary search result subset.
  • The foregoing determining unit is further configured to: for each supplementary search result subset among the target number of supplementary search result subsets, determine the number of supplementary search results to select according to the absolute value of the difference between the preset accuracy threshold and the accuracy of the search result subset corresponding to the relevance interval of that supplementary search result subset, and according to the number of supplementary search results that subset contains, where the absolute value is inversely proportional to the number of selected supplementary search results, and the number of supplementary search results included is directly proportional to the number of selected supplementary search results.
  • The number of search results in the search result subset and the number of supplementary search results in the supplementary search result subset are the same.
  • the convergence conditions include: a difference between the accuracy of the search result subset and the supplementary search result subset corresponding to the target relevance degree interval is less than a preset difference threshold.
  • the convergence condition includes: the number of search results included in the difference between the search result set and the supplementary search result set is less than a preset number difference threshold.
  • The selection unit may sample part of the search results and select those that belong to the target search results, and the supplementary search unit may search again with them. Whether to continue sampling and searching is then determined according to the results of the repeated search, which increases the number of search results and improves their coverage.
  • FIG. 7 illustrates a schematic structural diagram of a computer system 700 suitable for implementing an electronic device according to an embodiment of the present application.
  • the electronic device shown in FIG. 7 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • The computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703.
  • The RAM 703 also stores various programs and data required for the operation of the system 700.
  • the CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704.
  • An input / output (I / O) interface 705 is also connected to the bus 704.
  • The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output portion 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage portion 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, and the like.
  • The communication section 709 performs communication processing via a network such as the Internet.
  • A drive 710 is also connected to the I/O interface 705 as needed.
  • A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it is installed into the storage section 708 as needed.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication section 709, and / or installed from a removable medium 711.
  • the computer-readable medium of the present application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the foregoing.
  • The computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal that is included in baseband or propagated as part of a carrier wave, and which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function.
  • The functions noted in the blocks may also occur in a different order than that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or hardware.
  • The described units may also be provided in a processor, which may be described, for example, as: a processor including a search unit, a selection unit, a supplementary search unit, and a determination unit.
  • The names of these units do not, in some cases, constitute a limitation on the units themselves.
  • For example, the search unit can also be described as "a unit that uses the obtained search information to search in the target information base to obtain a search result set."
  • The present application also provides a computer-readable medium, which may be included in the electronic device described in the foregoing embodiments, or may exist alone without being assembled into the electronic device.
  • the computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: search the target information base using the obtained search information to obtain a search result set; select a search result subset from the search result set, and determine the search results in the search result subset that are target search results as supplementary search information; perform the following search steps: determine, as a supplementary search result set, the union of the search result set and the results of searching the target information base with the supplementary search information; determine whether the supplementary search result set satisfies a preset convergence condition; and, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that are target search results as supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the search steps.

Abstract

An information search method and device. The method comprises: using search information to search in a target information library, so as to acquire a search result set (201); selecting a search result subset from the search result set, and determining, as supplementary search information, a search result which is a target search result in the search result subset (202); executing the following search steps: determining, as a supplementary search result set, the union of a search result for the supplementary search information in the target information library with the search result set (2031); and determining whether the supplementary search result set meets a preset convergence condition (2032) (203); and in response to determining that the supplementary search result set does not meet the preset convergence condition, selecting a supplementary search result subset from the supplementary search result set, determining, as supplementary search information, a supplementary search result belonging to a target search result in the supplementary search result subset, determining the supplementary search result set to be the search result set, and continuing to execute the search steps (204). The method increases the number of search results.

Description

Method and device for searching information
This patent application claims priority to Chinese patent application No. 201811060981.3, filed on September 12, 2018 by Beijing ByteDance Network Technology Co., Ltd. and entitled "Method and Apparatus for Searching Information", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and an apparatus for searching information.
Background
In a search process, the search results corresponding to the input search information are generally selected directly by a search algorithm. Information that could only be reached after adjusting the search information several times may not be found directly in this way, so the search results obtained may be incomplete.
Summary of the Invention
The embodiments of the present application provide a method and an apparatus for searching information.
In a first aspect, an embodiment of the present application provides a method for searching information. The method includes: searching a target information base using obtained search information to obtain a search result set; selecting a search result subset from the search result set, and determining the search results in the search result subset that are target search results as supplementary search information; and performing the following search steps: determining, as a supplementary search result set, the union of the search result set and the results of searching the target information base with the supplementary search information; determining whether the supplementary search result set satisfies a preset convergence condition; and, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, selecting a supplementary search result subset from the supplementary search result set, determining the supplementary search results in the supplementary search result subset that are target search results as supplementary search information, determining the supplementary search result set as the search result set, and continuing to perform the search steps.
In some embodiments, the search steps further include: in response to determining that the supplementary search result set satisfies the convergence condition, determining a target search result set from the supplementary search result set.
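The overall flow of the first aspect can be sketched roughly as the loop below. This is only an illustration under stated assumptions: the callables `search`, `sample`, and `is_target`, the count-based convergence test, and the `max_rounds` safety cap are hypothetical names and choices that do not appear in the application.

```python
from typing import Callable, Iterable, Set


def iterative_search(
    query: str,
    search: Callable[[str], Set[str]],
    sample: Callable[[Set[str]], Iterable[str]],
    is_target: Callable[[str], bool],
    min_new_results: int = 1,
    max_rounds: int = 10,
) -> Set[str]:
    """Grow a result set by re-searching with sampled target results."""
    # Initial search with the obtained search information.
    result_set = set(search(query))
    for _ in range(max_rounds):  # assumed safety cap, not part of the source
        # Sampled results that are target results become supplementary queries.
        supplementary_queries = [r for r in sample(result_set) if is_target(r)]
        # Supplementary set = previous results united with the new results.
        supplementary_set = set(result_set)
        for supplementary_query in supplementary_queries:
            supplementary_set |= search(supplementary_query)
        # Assumed convergence test: too few new results were added.
        converged = len(supplementary_set - result_set) < min_new_results
        # The supplementary set becomes the search result set for the next round.
        result_set = supplementary_set
        if converged:
            break
    return result_set
```

In this sketch the whole converged set is returned; selecting a target search result set from it is a separate step.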
In some embodiments, determining the search results in the search result subset that are target search results as supplementary search information includes: obtaining annotation information for the search results in the search result subset, where the annotation information indicates whether a search result is a target search result; and determining, according to the annotation information, the search results in the search result subset that are target search results.
In some embodiments, after searching the target information base using the search information to obtain the search result set, the method further includes: determining a relevance degree for each search result in the search result set, where the relevance degree indicates how closely the search result is related to the search information.
In some embodiments, for a search result in the search result set, the relevance degree of the search result is determined by: determining a similarity between the search result and the search information; and determining the relevance degree of the search result according to the similarity, where the relevance degree is directly proportional to the similarity.
In some embodiments, after determining, as the supplementary search result set, the union of the search result set and the results of searching the target information base with the supplementary search information, the method further includes: for a supplementary search result in the supplementary search result set, in response to the supplementary search result being present in the search result set, updating the relevance degree of the supplementary search result according to a preset relevance boosting algorithm and taking the updated relevance degree as the relevance degree of the supplementary search result; and in response to the supplementary search result not being present in the search result set, determining the relevance degree of the supplementary search result.
In some embodiments, determining the relevance degree of the supplementary search result includes: determining a similarity between the supplementary search result and the search information; and determining the relevance degree of the supplementary search result according to the similarity, where the relevance degree is directly proportional to the similarity.
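One possible reading of the relevance rules above is sketched below. The application does not specify the similarity measure or the boosting algorithm, so the token-level Jaccard similarity, the `scale` factor, and the additive `boost` are assumptions chosen only to keep the example small.

```python
from typing import Dict, Iterable


def jaccard_similarity(a: str, b: str) -> float:
    """Assumed stand-in similarity: overlap of whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def update_relevance(
    relevance: Dict[str, float],
    query: str,
    supplementary_set: Iterable[str],
    scale: float = 1.0,
    boost: float = 0.1,
) -> Dict[str, float]:
    """Return relevance degrees for every result in the supplementary set."""
    updated: Dict[str, float] = {}
    for result in supplementary_set:
        if result in relevance:
            # Found again in a supplementary search: apply the assumed boost.
            updated[result] = relevance[result] + boost
        else:
            # New result: relevance is directly proportional to similarity.
            updated[result] = scale * jaccard_similarity(result, query)
    return updated
```

An additive boost is only one option; a multiplicative or capped boost would fit the text equally well.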
In some embodiments, determining the target search result set from the supplementary search result set includes: selecting, in descending order of relevance degree, a target number of supplementary search results from the supplementary search result set as target search results to obtain the target search result set.
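Selecting the target number of results in descending order of relevance could look like the following minimal sketch. The function name and the dictionary representation of relevance degrees are assumptions made for illustration.

```python
import heapq
from typing import Dict, List


def select_target_results(relevance: Dict[str, float], target_count: int) -> List[str]:
    """Return the target_count results with the highest relevance degree."""
    # heapq.nlargest iterates over the dict keys and ranks them by their score.
    return heapq.nlargest(target_count, relevance, key=relevance.get)
```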
In some embodiments, the method further includes: dividing the search result set into a target number of search result subsets according to a target number of preset relevance intervals, one subset per interval; and selecting the search result subset from the search result set includes: selecting search results from each of the target number of search result subsets to obtain the search result subset.
In some embodiments, the method further includes: determining an accuracy for each of the target number of search result subsets, where the accuracy represents the proportion of target search results in the search result subset.
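The partitioning and accuracy definitions above might be realised as in the sketch below. It assumes that the relevance intervals are described by sorted interior cut points and that target membership is available as a boolean label per result; both representations are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def partition_by_relevance(
    relevance: Dict[str, float], boundaries: Sequence[float]
) -> List[List[str]]:
    """Split results into len(boundaries) + 1 subsets, one per relevance interval.

    boundaries must be sorted in ascending order (assumed interior cut points).
    """
    subsets: List[List[str]] = [[] for _ in range(len(boundaries) + 1)]
    for result, score in relevance.items():
        # bisect_right gives the index of the interval that contains the score.
        subsets[bisect_right(boundaries, score)].append(result)
    return subsets


def subset_accuracy(subset: Sequence[str], labels: Dict[str, bool]) -> float:
    """Proportion of results in the subset that are labelled as target results."""
    if not subset:
        return 0.0
    return sum(1 for result in subset if labels.get(result, False)) / len(subset)
```

Results without a label are counted as non-target here, which is a simplification.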
In some embodiments, after determining, as the supplementary search result set, the union of the search result set and the results of searching the target information base with the supplementary search information, the method further includes: dividing the supplementary search result set into a target number of supplementary search result subsets according to the relevance intervals; and selecting the supplementary search result subset from the supplementary search result set includes: selecting supplementary search results from each of the target number of supplementary search result subsets to obtain the supplementary search result subset.
In some embodiments, the method further includes: determining an accuracy for each of the target number of supplementary search result subsets, where the accuracy represents the proportion of target search results in the supplementary search result subset.
In some embodiments, selecting supplementary search results from each of the target number of supplementary search result subsets includes: for a supplementary search result subset among the target number of supplementary search result subsets, determining the absolute value of the difference between a preset accuracy threshold and the accuracy of the search result set corresponding to the relevance interval in which the supplementary search result subset lies; and determining, according to the absolute value and the number of supplementary search results contained in the supplementary search result subset, the number of supplementary search results to select from the supplementary search result subset, where the number selected is inversely proportional to the absolute value and directly proportional to the number of supplementary search results contained in the subset.
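The allocation rule above fixes only the direction of the two proportionalities, not a formula. A minimal sketch under assumed choices: each subset receives a weight of size divided by (absolute accuracy gap plus a small `eps`), and a total sampling `budget` is shared according to those weights. The names `budget` and `eps` are hypothetical.

```python
from typing import List, Sequence


def allocate_samples(
    sizes: Sequence[int],
    accuracies: Sequence[float],
    accuracy_threshold: float,
    budget: int,
    eps: float = 0.05,
) -> List[int]:
    """Decide how many results to sample from each supplementary subset."""
    # Larger subsets and subsets whose accuracy is close to the threshold
    # receive larger weights; eps avoids division by zero.
    weights = [
        size / (abs(accuracy - accuracy_threshold) + eps)
        for size, accuracy in zip(sizes, accuracies)
    ]
    total = sum(weights)
    if total == 0:
        return [0] * len(sizes)
    # Never request more results than a subset actually contains.
    return [min(size, round(budget * w / total)) for size, w in zip(sizes, weights)]
```

Under this reading, intervals whose accuracy sits near the threshold are the least certain, so more of the labelling effort goes there.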
In some embodiments, the number of search results in the search result set is the same as the number of supplementary search results in the supplementary search result subset.
In some embodiments, the convergence condition includes: the difference between the accuracy of the search result subset and the accuracy of the supplementary search result subset corresponding to a target relevance interval is less than a preset difference threshold.
In some embodiments, the convergence condition includes: the number of search results contained in the difference set between the search result set and the supplementary search result set is less than a preset count-difference threshold.
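The two convergence conditions can be written as simple predicates. This sketch assumes that the accuracy difference is compared by absolute value and that the difference set is taken as the supplementary set minus the previous set (the supplementary set is a superset by construction); the function names are hypothetical.

```python
from typing import Set


def accuracy_converged(
    previous_accuracy: float, supplementary_accuracy: float, diff_threshold: float
) -> bool:
    """True when the accuracy for the target interval has stopped changing much."""
    return abs(supplementary_accuracy - previous_accuracy) < diff_threshold


def growth_converged(
    result_set: Set[str], supplementary_set: Set[str], count_threshold: int
) -> bool:
    """True when the supplementary search added fewer than count_threshold results."""
    return len(supplementary_set - result_set) < count_threshold
```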
In a second aspect, an embodiment of the present application provides an apparatus for searching information. The apparatus includes: a search unit configured to search a target information base using obtained search information to obtain a search result set; a selection unit configured to select a search result subset from the search result set and to determine the search results in the search result subset that are target search results as supplementary search information; a supplementary search unit configured to perform the following search steps: determining, as a supplementary search result set, the union of the search result set and the results of searching the target information base with the supplementary search information, and determining whether the supplementary search result set satisfies a preset convergence condition; and a determination unit configured to, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that are target search results as supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the search steps.
In some embodiments, the supplementary search unit is further configured to: in response to determining that the supplementary search result set satisfies the convergence condition, determine a target search result set from the supplementary search result set.
In some embodiments, the selection unit is further configured to: obtain annotation information for the search results in the search result subset, where the annotation information indicates whether a search result is a target search result; and determine, according to the annotation information, the search results in the search result subset that are target search results.
In some embodiments, the apparatus further includes a relevance determination unit configured to determine a relevance degree for each search result in the search result set, where the relevance degree indicates how closely the search result is related to the search information.
In some embodiments, the relevance determination unit is further configured to: determine a similarity between the search result and the search information; and determine the relevance degree of the search result according to the similarity, where the relevance degree is directly proportional to the similarity.
In some embodiments, the relevance determination unit is further configured to: for a supplementary search result in the supplementary search result set, in response to the supplementary search result being present in the search result set, update the relevance degree of the supplementary search result according to a preset relevance boosting algorithm and take the updated relevance degree as the relevance degree of the supplementary search result; and in response to the supplementary search result not being present in the search result set, determine the relevance degree of the supplementary search result.
In some embodiments, the relevance determination unit is further configured to: determine a similarity between the supplementary search result and the search information; and determine the relevance degree of the supplementary search result according to the similarity, where the relevance degree is directly proportional to the similarity.
In some embodiments, the supplementary search unit is further configured to: select, in descending order of relevance degree, a target number of supplementary search results from the supplementary search result set as target search results to obtain the target search result set.
In some embodiments, the apparatus further includes a dividing unit configured to divide the search result set into a target number of search result subsets according to a target number of preset relevance intervals; and the selection unit is further configured to select search results from each of the target number of search result subsets to obtain the search result subset.
In some embodiments, the apparatus further includes an accuracy determination unit configured to determine an accuracy for each of the target number of search result subsets, where the accuracy represents the proportion of target search results in the search result subset.
In some embodiments, the dividing unit is further configured to divide the supplementary search result set into a target number of supplementary search result subsets according to the relevance intervals; and the determination unit is further configured to select supplementary search results from each of the target number of supplementary search result subsets to obtain the supplementary search result subset.
In some embodiments, the accuracy determination unit is further configured to determine an accuracy for each of the target number of supplementary search result subsets, where the accuracy represents the proportion of target search results in the supplementary search result subset.
In some embodiments, the determination unit is further configured to: for a supplementary search result subset among the target number of supplementary search result subsets, determine the absolute value of the difference between a preset accuracy threshold and the accuracy of the search result set corresponding to the relevance interval in which the supplementary search result subset lies; and determine, according to the absolute value and the number of supplementary search results contained in the supplementary search result subset, the number of supplementary search results to select from the supplementary search result subset, where the number selected is inversely proportional to the absolute value and directly proportional to the number of supplementary search results contained in the subset.
In some embodiments, the number of search results in the search result set is the same as the number of supplementary search results in the supplementary search result subset.
In some embodiments, the convergence condition includes: the difference between the accuracy of the search result subset and the accuracy of the supplementary search result subset corresponding to a target relevance interval is less than a preset difference threshold.
In some embodiments, the convergence condition includes: the number of search results contained in the difference set between the search result set and the supplementary search result set is less than a preset count-difference threshold.
In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes: one or more processors; and a storage device for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program that, when executed by a processor, implements the method described in any implementation of the first aspect.
With the method and apparatus for searching information provided by the embodiments of the present application, after search results are obtained directly, a portion of them can be sampled, and the sampled results that are target search results can be used to search again. Whether to continue sampling and searching is then decided according to the results of the repeated search. This increases the number of search results and improves their coverage.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
FIG. 2 is a flowchart of an embodiment of the method for searching information according to the present application;
FIG. 3 is a flowchart of another embodiment of the method for searching information according to the present application;
FIG. 4 is a flowchart of yet another embodiment of the method for searching information according to the present application;
FIG. 5 is a schematic diagram of an application scenario of the method for searching information according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an embodiment of the apparatus for searching information according to the present application;
FIG. 7 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the related invention and do not limit it. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary architecture 100 to which an embodiment of the method for searching information or the apparatus for searching information of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
The terminal devices 101, 102, 103 interact with the server 105 through the network 104 to receive or send messages and the like. Various client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, and instant messaging tools.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices that support data storage and data exchange, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example a search server that searches the target information base according to the search information sent by the terminal devices 101, 102, 103 and returns the search results to the terminal devices 101, 102, 103.
It should be noted that the search information and the target information base may also be stored locally on the server 105, in which case the server 105 may retrieve the search information directly and search the locally stored target information base. In this case, the terminal devices 101, 102, 103 and the network 104 may be absent.
It should be noted that the method for searching information provided by the embodiments of the present application is generally executed by the server 105, and accordingly the apparatus for searching information is generally provided in the server 105.
It should also be noted that an information search application may also be installed on the terminal devices 101, 102, 103. In this case, the method for searching information may also be executed by the terminal devices 101, 102, 103, and accordingly the apparatus for searching information may also be provided in the terminal devices 101, 102, 103. The exemplary system architecture 100 may then omit the server 105 and the network 104.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster consisting of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, depending on implementation needs.
With continued reference to FIG. 2, a flow 200 of an embodiment of the method for searching information according to the present application is shown. The method for searching information includes the following steps:
Step 201: search a target information base using obtained search information to obtain a search result set.
In this embodiment, the execution body of the method for searching information (for example, the server 105 shown in FIG. 1) may first obtain the search information locally or from another storage device, and then search the target information base to obtain the search result set. The target information base may be an information base specified in advance by a user, an information base that is determined according to the search information and matches it, or an information base set according to actual application requirements. The search algorithm used in the search process may be preset.
Step 202: select a search result subset from the search result set, and determine the search results in the search result subset that are target search results as supplementary search information.
In this embodiment, the search result set may first be sampled, with at least one search result selected to form the search result subset. The sampling ratio may be set randomly, preset by a user or a technician, or determined according to the size of the search result set (for example, a number of search results equal to ten percent of the search result set may be selected as the search result subset).
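The sampling in step 202 might be sketched as follows. Uniform random sampling, the default ratio of ten percent, and the optional `seed` parameter are assumptions for illustration; the application leaves the sampling strategy open.

```python
import random
from typing import List, Optional, Set


def sample_subset(
    result_set: Set[str], ratio: float = 0.1, seed: Optional[int] = None
) -> List[str]:
    """Draw a search result subset containing at least one search result."""
    if not result_set:
        return []
    rng = random.Random(seed)
    # Select roughly ratio * |result_set| results, but never fewer than one
    # and never more than the set contains.
    count = min(len(result_set), max(1, int(len(result_set) * ratio)))
    # Sorting first makes the draw reproducible for a fixed seed.
    return rng.sample(sorted(result_set), count)
```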
在本实施例中,目标搜索结果可以指满足用户搜索需求的搜索结果。例如可以通过计算搜索结果子集中的各个搜索结果与搜索信息的相似度来验证各个搜索结果是否是目标搜索结果。In this embodiment, the target search result may refer to a search result that meets a user's search needs. For example, it is possible to verify whether each search result is a target search result by calculating the similarity between each search result in the search result subset and the search information.
可选地,也可以先获取搜索结果子集中的搜索结果的标注信息,其中,标注信息用于表示搜索结果是否是目标搜索结果。然后,根据标注信息,确定搜索结果子集中属于目标搜索结果的搜索结果。Optionally, the annotation information of the search results in the search result subset may also be obtained first, where the annotation information is used to indicate whether the search result is a target search result. Then, according to the label information, search results in the search result subset that belong to the target search result are determined.
具体地,可以由人工对搜索结果子集中的各个搜索结果进行验证,并将验证结果输入为标注信息,然后发送至上述执行主体。Specifically, each search result in the search result subset can be manually verified, and the verification result is input as labeled information, and then sent to the above-mentioned execution subject.
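By way of non-limiting illustration, the sampling and verification of step 202 might be sketched in Python as follows. The function name `select_supplementary_info`, the `similarity` callback, the similarity threshold of 0.5, and the fixed random seed are assumptions introduced only for this sketch; the present application does not prescribe them.

```python
import random


def select_supplementary_info(result_set, query, similarity,
                              ratio=0.1, threshold=0.5, seed=0):
    """Sample a subset of the results and keep those judged to be targets.

    `similarity(result, query)` is a hypothetical scoring callback; a result
    is treated as a target search result when its score reaches `threshold`.
    """
    results = sorted(result_set)  # fixed order so sampling is reproducible
    # Sample at least one result, but never more than are available.
    k = min(len(results), max(1, int(len(results) * ratio)))
    subset = random.Random(seed).sample(results, k)
    return [r for r in subset if similarity(r, query) >= threshold]
```

Where manual annotation is used instead of a similarity score, the `similarity` callback could simply return 1.0 for results annotated as targets and 0.0 otherwise.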
Step 203: Perform the following search steps:
Step 2031: Determine, as a supplementary search result set, the union of the search result set and the search results obtained by searching the target information base with the supplementary search information.
In this embodiment, the target information base may first be searched using the supplementary search information to obtain search results. Because supplementary search information is used, the search results of this search may include results that are not in the search result set. The union of the search results of this search and the search result set may therefore be determined as the supplementary search result set.
Step 2032: Determine whether the supplementary search result set satisfies a preset convergence condition.
In this embodiment, the convergence condition may be set by a user or a technician according to actual search requirements. For example, the convergence condition may be that the number of supplementary search results in the supplementary search result set is greater than a preset threshold. As another example, the convergence condition may be that the number of search results in the difference set between the supplementary search result set and the search result set is less than a preset threshold.
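For illustration only, the two example convergence conditions above might be expressed over Python sets as follows. The function names and parameter names are assumptions made for this sketch.

```python
def converged_by_size(supplemented, size_threshold):
    """First example condition: the supplementary set is large enough."""
    return len(supplemented) > size_threshold


def converged_by_growth(previous, supplemented, growth_threshold):
    """Second example condition: few results were added in this round."""
    return len(supplemented - previous) < growth_threshold
```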
Step 204: In response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that are target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the above search steps.
In this embodiment, when the supplementary search result set does not satisfy the preset convergence condition, a method similar to step 202 may be followed: the supplementary search result set is first sampled to obtain a supplementary search result subset; the supplementary search results in that subset that are target search results are then determined as the supplementary search information; the supplementary search result set is determined as the search result set; and step 203 is performed again, so that the search proceeds iteratively.
Optionally, in response to determining that the supplementary search result set satisfies the convergence condition, a target search result set is determined from the supplementary search result set. For example, the supplementary search result set may be used directly as the target search result set.
The method for searching information provided by the above embodiment of the present application samples the search result set obtained by searching the target information base with the search information, and searches the target information base again using, as supplementary search information, the search results in the sampled subset that are target search results, so that new search results can be added. It may then be determined whether the supplementary search result set, obtained by adding the new search results to the search result set, satisfies the convergence condition. If not, the target information base may be searched iteratively, with supplementary search information selected by sampling as described above, so that new search results continue to be added until the search results obtained satisfy the convergence condition. This improves the coverage of the search results and avoids the situation in which a single direct search misses many search results.
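The flow 200 as a whole might be sketched as the following loop, offered as a minimal illustration under stated assumptions rather than as the implementation of the present application. The `search` and `is_target` callbacks, the parameter names, the "fewer than `min_new` new results" convergence condition, and the `max_rounds` safeguard are all hypothetical choices made for this sketch.

```python
import random


def iterative_search(query, search, is_target,
                     sample_ratio=0.5, min_new=1, max_rounds=10, seed=0):
    """Expand a result set by re-searching with sampled target results.

    `search(q)` returns an iterable of results for a query (assumed).
    `is_target(r)` says whether a result is a target search result (assumed).
    """
    rng = random.Random(seed)
    results = set(search(query))                     # step 201
    for _ in range(max_rounds):                      # assumed safety bound
        pool = sorted(results)
        k = min(len(pool), max(1, int(len(pool) * sample_ratio)))
        subset = rng.sample(pool, k)                 # steps 202 / 204: sample
        supplementary_info = [r for r in subset if is_target(r)]
        supplemented = set(results)
        for q in supplementary_info:                 # step 2031: union
            supplemented |= set(search(q))
        new_count = len(supplemented - results)
        results = supplemented                       # becomes the result set
        if new_count < min_new:                      # step 2032: converged
            break
    return results
```

With a sampling ratio below 1, a round may by chance sample only results that lead to nothing new, so the loop can stop before every reachable result has been found; a stricter convergence condition can be substituted where that matters.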
With further reference to FIG. 3, a flow 300 of another embodiment of the method for searching information is shown. The flow 300 of the method for searching information includes the following steps:
Step 301: Search a target information base using acquired search information to obtain a search result set.
For the specific execution of this step, reference may be made to the description of step 201 in the embodiment corresponding to FIG. 2; details are not repeated here.
Step 302: Determine the relevance of the search results in the search result set.
In this embodiment, the relevance may indicate the degree to which a search result is related to the search information. Specifically, the relevance of each search result to the search information may be calculated according to a preset relevance calculation method, or it may be annotated manually and returned to the execution subject.
Optionally, for a search result in the search result set, the relevance of that search result may be determined as follows: the similarity between the search result and the search information is first determined using an existing similarity calculation method, and the relevance of the search result is then determined from the similarity, with the relevance proportional to the similarity. For example, the similarity may be used directly as the relevance of the search result.
Step 303: Select a search result subset from the search result set, and determine the search results in the search result subset that are target search results as supplementary search information.
For the specific execution of this step, reference may be made to the description of step 202 in the embodiment corresponding to FIG. 2; details are not repeated here.
Step 304: Perform the following search steps:
Step 3041: Determine, as a supplementary search result set, the union of the search result set and the search results obtained by searching the target information base with the supplementary search information.
For the specific execution of step 3041, reference may be made to the description of step 2031 in the embodiment corresponding to FIG. 2; details are not repeated here.
Step 3042: For each supplementary search result in the supplementary search result set, determine the relevance of the supplementary search result as follows:
Step 30421: In response to the supplementary search result being present in the search result set, update the relevance of the supplementary search result according to a preset relevance boosting algorithm, and use the updated relevance as the relevance of the supplementary search result.
In this embodiment, the relevance of a supplementary search result that is already present in the search result set, that is, a search result that has been found again, may be raised. Specifically, a technician may preset a relevance boosting algorithm for raising the relevance, or may preset a relevance boost value; when a search result is found again, the boost value is added to the relevance of that search result, and the resulting value is used as the new relevance of that search result.
Step 30422: In response to the supplementary search result not being present in the search result set, determine the relevance of the supplementary search result.
In this embodiment, the similarity between the supplementary search result and the search information may first be determined using an existing similarity calculation method, and the relevance of the supplementary search result is then determined from the similarity, with the relevance proportional to the similarity. For example, the similarity may be used directly as the relevance of the supplementary search result.
For a supplementary search result that is not present in the search result set, that is, a newly added search result, the specific calculation of the relevance may also follow the description of step 302 above; details are not repeated here.
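As a hedged sketch of steps 30421 and 30422, the relevance update might look as follows when a fixed boost value is used. The function name `update_relevance`, the boost value of 0.1, the cap of 1.0, and the `similarity` callback are assumptions for illustration; the present application leaves the boosting algorithm open.

```python
def update_relevance(relevance, found, query, similarity,
                     boost=0.1, cap=1.0):
    """Return a new relevance table after one supplementary search.

    `relevance` maps previously known results to their relevance.
    `found` holds the results returned by the supplementary search.
    """
    updated = dict(relevance)
    for result in found:
        if result in relevance:
            # Step 30421: found again, so raise the relevance (capped here
            # by assumption so that values stay within [0, 1]).
            updated[result] = min(cap, relevance[result] + boost)
        else:
            # Step 30422: newly found, so score it against the query.
            updated[result] = similarity(result, query)
    return updated
```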
Step 3043: Determine whether the supplementary search result set satisfies a preset convergence condition.
For the specific execution of step 3043, reference may be made to the description of step 2032 in the embodiment corresponding to FIG. 2; details are not repeated here.
Step 305: In response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that are target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the above search steps.
For the specific execution of this step, reference may be made to the description of step 204 in the embodiment corresponding to FIG. 2; details are not repeated here.
Step 306: In response to determining that the supplementary search result set satisfies the convergence condition, select, in descending order of relevance, a target number of supplementary search results from the supplementary search result set as target search results to obtain a target search result set.
In this embodiment, the target number may be preset, or may be determined according to a number determination rule (for example, the target number may be half of the number of supplementary search results in the supplementary search result set).
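The selection of step 306 might be sketched as a simple sort by relevance. The function name `select_top` and the representation of relevance as a dictionary are assumptions for this illustration.

```python
def select_top(relevance, target_count):
    """Return the `target_count` results with the highest relevance."""
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    return ranked[:target_count]
```

Under the example number determination rule above, `target_count` could be taken as `len(relevance) // 2`.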
As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 2, the flow of the method for searching information in this embodiment additionally determines the relevance of each search result after each search, and dynamically updates, during the iterative search, the relevance of search results that are found again. The target search results can therefore be selected according to the relevance of the search results and the supplementary search results, which further improves the accuracy of the search results.
With further reference to FIG. 4, a flow 400 of yet another embodiment of the method for searching information is shown. The flow 400 of the method for searching information includes the following steps:
Step 401: Search a target information base using acquired search information to obtain a search result set.
For the specific execution of this step, reference may be made to the description of step 201 in the embodiment corresponding to FIG. 2; details are not repeated here.
Step 402: Determine the relevance of the search results in the search result set.
For the specific execution of this step, reference may be made to the description of step 302 in the embodiment corresponding to FIG. 3; details are not repeated here.
Step 403: Divide the search result set into a target number of search result subsets according to a preset target number of relevance intervals.
In this embodiment, the target number may be preset according to actual application requirements. The relevance intervals may be of equal or unequal width. For example, where the relevance lies between 0 and 1 (inclusive), it may be divided evenly into the following five relevance intervals: 0 to 0.2 (including 0, excluding 0.2), 0.2 to 0.4 (including 0.2, excluding 0.4), 0.4 to 0.6 (including 0.4, excluding 0.6), 0.6 to 0.8 (including 0.6, excluding 0.8), and 0.8 to 1 (including both 0.8 and 1). It may also be divided unevenly into the following five relevance intervals: 0 to 0.5 (including 0, excluding 0.5), 0.5 to 0.7 (including 0.5, excluding 0.7), 0.7 to 0.8 (including 0.7, excluding 0.8), 0.8 to 0.9 (including 0.8, excluding 0.9), and 0.9 to 1 (including both 0.9 and 1).
Taking the five even relevance intervals above as an example, the search results with a relevance between 0 and 0.2 may be determined as one search result subset, those with a relevance between 0.2 and 0.4 as another, those with a relevance between 0.4 and 0.6 as another, those with a relevance between 0.6 and 0.8 as another, and those with a relevance between 0.8 and 1 as another.
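For illustration, the division of step 403 might be sketched with the standard `bisect` module, using the four interior boundaries of the five even intervals. The function name `partition_by_relevance` and the dictionary representation of relevance are assumptions for this sketch.

```python
from bisect import bisect_right


def partition_by_relevance(relevance, edges=(0.2, 0.4, 0.6, 0.8)):
    """Group results into left-closed relevance intervals.

    With the default edges the intervals are [0, 0.2), [0.2, 0.4),
    [0.4, 0.6), [0.6, 0.8) and [0.8, 1].
    """
    subsets = [[] for _ in range(len(edges) + 1)]
    for result, score in relevance.items():
        # bisect_right places a score equal to an edge in the upper interval.
        subsets[bisect_right(edges, score)].append(result)
    return subsets
```

Passing `edges=(0.5, 0.7, 0.8, 0.9)` yields the uneven division given as the second example.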
Step 404: Select search results from each of the target number of search result subsets to obtain a sampled search result subset, and determine the search results in the sampled search result subset that are target search results as supplementary search information.
In this embodiment, some of the search results may be sampled from each search result subset, and the sampled search results may be combined to obtain the sampled search result subset. The number of search results sampled from each search result subset may be specified arbitrarily, or may be determined according to a sampling number determination rule. For example, the rule may be that the number of search results sampled from each search result subset is one tenth of the number of search results in that subset. As another example, the number of samples may be determined according to the relevance; for example, the closer the corresponding relevance is to 0.5, the larger the number of samples.
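A minimal sketch of the per-subset sampling in step 404, using the one-tenth rule as the example, is given below. The function name `stratified_sample`, the choice to sample at least one result from every non-empty subset, and the fixed seed are assumptions made for this illustration.

```python
import random


def stratified_sample(subsets, fraction=0.1, seed=0):
    """Sample a fraction of each subset and combine the samples."""
    rng = random.Random(seed)
    sampled = []
    for subset in subsets:
        if not subset:
            continue  # nothing to sample from an empty interval
        # Assumed rule: one tenth of the subset, but at least one result.
        k = max(1, int(len(subset) * fraction))
        sampled.extend(rng.sample(list(subset), k))
    return sampled
```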
Optionally, the accuracy of each of the target number of search result subsets may further be determined. The accuracy may indicate the proportion of target search results in a search result subset. For example, for a search result subset containing 30 search results, of which 20 are target search results, the accuracy of the search result subset is two thirds.
Step 405: Perform the following search steps:
Step 4051: Determine, as a supplementary search result set, the union of the search result set and the search results obtained by searching the target information base with the supplementary search information.
For the specific execution of step 4051, reference may be made to the description of step 2031 in the embodiment corresponding to FIG. 2; details are not repeated here.
Step 4052: For each supplementary search result in the supplementary search result set, determine the relevance of the supplementary search result as follows:
Step 40521: In response to the supplementary search result being present in the search result set, update the relevance of the supplementary search result according to a preset relevance boosting algorithm, and use the updated relevance as the relevance of the supplementary search result.
Step 40522: In response to the supplementary search result not being present in the search result set, determine the relevance of the supplementary search result.
For the specific execution of steps 40521 and 40522, reference may be made to the description of steps 30421 and 30422 in the embodiment corresponding to FIG. 3; details are not repeated here.
Step 4053: Divide the supplementary search result set into a target number of supplementary search result subsets according to the above relevance intervals.
In this embodiment, after the supplementary search result set is obtained, it may contain new search results, and the relevance of search results that have been found again has been raised. The supplementary search results may therefore be divided again according to relevance; for the specific division process, reference may be made to the description of step 403 above.
Step 4054: Determine whether the supplementary search result set satisfies a preset convergence condition.
For the specific execution of step 4054, reference may be made to the description of step 2032 in the embodiment corresponding to FIG. 2; details are not repeated here.
Step 406: In response to determining that the supplementary search result set does not satisfy the preset convergence condition, select supplementary search results from each of the target number of supplementary search result subsets to obtain a sampled supplementary search result subset, determine the supplementary search results in the sampled supplementary search result subset that are target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the above search steps.
In this embodiment, some of the supplementary search results may be sampled from each supplementary search result subset, and the sampled supplementary search results may be combined to obtain the sampled supplementary search result subset. The number of supplementary search results sampled from each supplementary search result subset may be specified arbitrarily, or may be determined according to a sampling number determination rule. For example, the rule may be that the number of supplementary search results sampled from each supplementary search result subset is one tenth of the number of supplementary search results in that subset. As another example, the number of samples may be determined according to the relevance; for example, the closer the corresponding relevance is to 0.5, the larger the number of samples.
Optionally, for each of the target number of supplementary search result subsets, the absolute value of the difference between a preset accuracy threshold and the accuracy of the search result subset corresponding to the relevance interval of that supplementary search result subset may first be determined. The number of supplementary search results to select from the supplementary search result subset is then determined according to that absolute value and the number of supplementary search results in the subset, where the number selected is inversely proportional to the absolute value and directly proportional to the number of supplementary search results in the subset.
The preset accuracy threshold may be set in advance by a technician. In practice, the minimum and maximum values of the accuracy may be estimated in advance, and the accuracy threshold may be set according to those estimates. For example, if the accuracy lies between 0 and 1, the accuracy threshold may be set to 0.5. In that case, the closer the accuracy of a supplementary search result subset is to 0.5, the more samples are taken from it; conversely, the closer the accuracy is to 0 or 1, the fewer samples are taken from it.
For example, if two supplementary search result subsets contain 200 and 400 supplementary search results respectively, the number of samples taken from the subset containing 400 supplementary search results may be greater than the number taken from the subset containing 200.
Optionally, the number of search results in the sampled search result subset and the number of search results in the sampled supplementary search result subset may be the same. That is, the total number of samples taken after each search may be fixed.
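The present application states only the direction of the two proportionalities, not a formula. One possible allocation consistent with them, given purely as an assumed sketch, weights each subset by its size divided by the distance of its accuracy from the threshold and splits a fixed sampling budget according to those weights. The function name `allocate_samples`, the smoothing constant `eps` (which avoids division by zero when the accuracy equals the threshold), and the rounding are all hypothetical.

```python
def allocate_samples(sizes, accuracies, total, threshold=0.5, eps=0.1):
    """Split roughly `total` samples across subsets.

    A subset receives more samples when it is larger and when its previous
    accuracy is closer to `threshold`. Because of rounding, the counts may
    not sum exactly to `total`.
    """
    weights = [n / (abs(a - threshold) + eps)
               for n, a in zip(sizes, accuracies)]
    weight_sum = sum(weights)
    if weight_sum == 0:
        return [0] * len(sizes)
    # Never request more samples than a subset contains.
    return [min(n, round(total * w / weight_sum))
            for n, w in zip(sizes, weights)]
```

Subsets whose accuracy is near 0 or 1 are already well characterized, so spending fewer samples on them concentrates the verification effort on the intervals where the outcome is least certain.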
Optionally, the accuracy of each of the target number of supplementary search result subsets may further be determined, where the accuracy may indicate the proportion of target search results in a supplementary search result subset.
Optionally, the convergence condition may be that, for a target relevance interval, the difference between the accuracy of the corresponding search result subset and the accuracy of the corresponding supplementary search result subset is less than a preset difference threshold.
Optionally, the convergence condition may also be that the number of search results in the difference set between the search result set and the supplementary search result set is less than a preset number difference threshold.
Step 407: In response to determining that the supplementary search result set satisfies the convergence condition, select, in descending order of relevance, a target number of supplementary search results from the supplementary search result set as target search results to obtain a target search result set.
For the specific execution of this step, reference may be made to the description of step 306 in the embodiment corresponding to FIG. 3; details are not repeated here.
With continued reference to FIG. 5, FIG. 5 is a schematic diagram of an application scenario of the method for searching information according to this embodiment. In the application scenario of FIG. 5, an image 501 may first be used to search an image library 502 to obtain a search result set 503, where the search result set 503 includes 100 images. The similarity between each image in the search result set 503 and the image 501 may then be calculated and used as the relevance of that image. The search result set may then be divided into three subsets according to relevance (as shown by reference numeral 504 in the figure). The first subset contains 30 images with a relevance between 0 and 0.5, the second subset contains 50 images with a relevance between 0.5 and 0.8, and the third subset contains 20 images with a relevance between 0.8 and 1.
The three subsets may then be sampled, and the target images among the samples may be combined to obtain supplementary search images 506. Specifically, 10 images may be sampled from the first subset, 10 images from the second subset, and 3 images from the third subset. As shown by reference numeral 505 in the figure, the images sampled from the first subset include 5 target images, the images sampled from the second subset include 8 target images, and the images sampled from the third subset include 3 target images, so that 16 target images are obtained as the supplementary search images 506. Further, the accuracy of the 10 images sampled from the first subset is one half, the accuracy of the sample from the second subset is four fifths, and the accuracy of the sample from the third subset is one.
The supplementary search images 506 may then be used to search the image library 502 to obtain the search result set 507 of this search, where the search result set 507 contains 120 images. Further, the union of the search result set 503 obtained from the previous search and the search result set 507 obtained from this search may be used as a supplementary search result set 508, where the supplementary search result set contains 150 images.
It can be seen that, in this search, 30 images were newly found and 90 images were found again. The relevance of the 90 images that were found again may be raised, and for the 30 newly found images, the similarity between each image and the image 501 may be used as its relevance. The 150 images in the supplementary search result set are then regrouped according to relevance to obtain three new subsets (as shown by reference numeral 509 in the figure).
The new first subset, with a relevance between 0 and 0.5, contains 45 images; the new second subset, with a relevance between 0.5 and 0.8, contains 80 images; and the new third subset, with a relevance between 0.8 and 1, contains 25 images.
之后,可以按照上一次计算得到的三个关联度区间对应的子集采样的准确度,对新得到的三个子集再次分别进行采样。上一次三个子集采样的准确度分别为二分之一、五分之四和一,那么本次采样的过程中,可以根据预先设置准确度阈值(如0.5)来确定在各个子集中的采样数目。After that, according to the accuracy of the subset sampling corresponding to the three correlation degree intervals calculated last time, the three newly obtained subsets are sampled again. The accuracy of the last three subsets was one-half, four-fifths, and one, so in this sampling process, the sampling in each subset can be determined according to the preset accuracy threshold (such as 0.5). number.
For the new first subset, the absolute value of the difference between its accuracy and the accuracy threshold is 0, so more images can be sampled from it. For the new second and third subsets, the absolute values of the differences between their accuracies and the accuracy threshold are larger (0.3 and 0.5, respectively), so fewer images can be sampled from them. Specifically, because the absolute difference for the second subset is smaller than that for the third subset, the number of samples taken from the second subset can be greater than the number taken from the third subset.
Specifically, 12 images can be sampled from the new first subset, 10 images from the new second subset, and 1 image from the new third subset. As shown by reference numeral 510 in the figure, the images sampled from the new first subset and from the new second subset each include 8 target images, and the image sampled from the new third subset is 1 target image. It can then be calculated that the accuracy of the new first subset is two thirds, the accuracy of the new second subset is four fifths, and the accuracy of the new third subset is one.
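The accuracies above are simply the fraction of target images among the sampled images of each subset. A minimal sketch with exact fractions follows; the function and variable names are illustrative.

```python
from fractions import Fraction

def subset_accuracy(num_targets, num_sampled):
    """Fraction of sampled results that are target results."""
    return Fraction(num_targets, num_sampled)

# (target images, sampled images) for the three new subsets in the example.
samples = [(8, 12), (8, 10), (1, 1)]
accuracies = [subset_accuracy(t, n) for t, n in samples]
total_targets = sum(t for t, _ in samples)

print(accuracies)      # [Fraction(2, 3), Fraction(4, 5), Fraction(1, 1)]
print(total_targets)   # 17
```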
After that, the supplementary search result set 511, composed of the 17 target images obtained from the new sampling (8 + 8 + 1), is used as the supplementary search images, the supplementary search result set 508 is used as the search result set, and the above search process is performed again.
Specifically, the supplementary search result set 511 is first used to search the image library 502, and the union of the search results obtained and the supplementary search result set 508 is used as the new supplementary search result set 512. The new supplementary search result set 512 contains 160 images. Similarly, the relevance of images that were found again can be raised, and for each newly found image, its similarity to the image 501 can be used as its relevance. The new supplementary search result set 512 is then divided into three subsets again according to the re-determined relevance. As shown by reference numeral 513 in the figure, the newly divided first subset contains 45 images, the newly divided second subset contains 87 images, and the newly divided third subset contains 28 images.
Since the accuracies calculated last time for the three relevance intervals are two thirds, four fifths, and one, images can again be sampled from each of the three newly divided subsets and the accuracy of this sampling calculated, with more images sampled from the first subset. The accuracies calculated after sampling the three newly divided subsets are one half, nine tenths, and one, respectively. It can be seen that the accuracies of the newly divided second subset and the newly divided third subset have changed little, and that, compared with the previous supplementary search result set 508, the new supplementary search result set 512 contains few newly found images. The 115 images in the newly divided second subset and the newly divided third subset can therefore be used directly as the target image set 514.
It can be seen from FIG. 4 that, compared with the embodiments corresponding to FIG. 2 and FIG. 3, the flow of the method for searching information in this embodiment highlights the following: after each search is completed and the relevance of each search result is determined, the search results can be divided into multiple subsets according to preset relevance intervals, and samples can be taken from each subset for the supplementary search, so that the supplementary search covers every relevance interval and further improves the coverage of the search results. On this basis, after each subset is sampled, the accuracy of each subset can be determined from the sampling result, and the number of samples to take from that relevance interval next time can be determined from the accuracy, so that supplementary searches are targeted and the accuracy of the search results and the search efficiency are further improved.
With further reference to FIG. 6, as an implementation of the methods shown in the foregoing figures, this application provides an embodiment of an apparatus for searching information. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 6, the apparatus 600 for searching information provided in this embodiment includes a search unit 601, a selection unit 602, a supplementary search unit 603, and a determination unit 604. The search unit 601 is configured to search a target information base using acquired search information to obtain a search result set. The selection unit 602 is configured to select a search result subset from the search result set and to determine the search results in the search result subset that are target search results as supplementary search information. The supplementary search unit 603 is configured to perform the following search step: determining the union of the search result set and the search results obtained by searching the target information base with the supplementary search information as a supplementary search result set, and determining whether the supplementary search result set satisfies a preset convergence condition. The determination unit 604 is configured to, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that are target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the above search step.
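The interplay of the four units can be sketched as a loop. This is a minimal illustration under stated assumptions, not the claimed implementation: the callbacks `search`, `select_subset`, `is_target`, and `converged` are hypothetical stand-ins for the target information base, the sampling strategy, the annotation, and the convergence condition, and the `max_rounds` cap is an added safeguard that does not appear in the source.

```python
def iterative_search(query, search, select_subset, is_target, converged, max_rounds=10):
    """Repeat supplementary searches until the result set converges.

    All callbacks are hypothetical; max_rounds is a safety cap, not part of the source.
    """
    result_set = set(search(query))                                    # search unit
    queries = [r for r in select_subset(result_set) if is_target(r)]  # selection unit
    for _ in range(max_rounds):
        supplementary = set(result_set)                                # supplementary search unit
        for q in queries:
            supplementary |= set(search(q))
        if converged(result_set, supplementary):
            return supplementary
        # Determination unit: resample, then reuse the union as the search result set.
        queries = [r for r in select_subset(supplementary) if is_target(r)]
        result_set = supplementary
    return result_set

# Toy "information base": each item retrieves a fixed list of similar items.
library = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": ["e"], "e": []}
found = iterative_search(
    "a",
    search=lambda q: library[q],
    select_subset=lambda results: sorted(results),  # sample everything
    is_target=lambda r: True,                       # treat every sample as a target
    converged=lambda old, new: len(new - old) < 1,  # stop when nothing new appears
)
print(sorted(found))  # ['b', 'c', 'd', 'e']
```

In this toy run the result set grows from {b, c} to {b, c, d} to {b, c, d, e}, and the loop stops on the round that adds nothing new.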
In this embodiment, for the specific processing of the search unit 601, the selection unit 602, the supplementary search unit 603, and the determination unit 604 of the apparatus 600 for searching information, and for the technical effects they bring, reference may be made to the descriptions of step 201, step 202, step 203, and step 204 in the embodiment corresponding to FIG. 2, respectively; details are not repeated here.
In some optional implementations of this embodiment, the supplementary search unit 603 is further configured to: in response to determining that the supplementary search result set satisfies the convergence condition, determine a target search result set from the supplementary search result set.
In some optional implementations of this embodiment, the selection unit 602 is further configured to: obtain annotation information of the search results in the search result subset, where the annotation information indicates whether a search result is a target search result; and determine, according to the annotation information, the search results in the search result subset that are target search results.
In some optional implementations of this embodiment, the apparatus 600 for searching information further includes a relevance determination unit (not shown in the figure) configured to determine the relevance of the search results in the search result set, where the relevance indicates the degree to which a search result is related to the search information.
In some optional implementations of this embodiment, the relevance determination unit is further configured to: determine the similarity between the search result and the search information; and determine the relevance of the search result according to the similarity, where the similarity is directly proportional to the relevance.
In some optional implementations of this embodiment, the relevance determination unit is further configured to: for a supplementary search result in the supplementary search result set, in response to the supplementary search result being present in the search result set, update the relevance of the supplementary search result according to a preset relevance boosting algorithm and use the updated relevance as the relevance of the supplementary search result; and in response to the supplementary search result not being present in the search result set, determine the relevance of the supplementary search result.
In some optional implementations of this embodiment, the relevance determination unit is further configured to: determine the similarity between the supplementary search result and the search information; and determine the relevance of the supplementary search result according to the similarity, where the similarity is directly proportional to the relevance.
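The relevance update can be sketched as follows. The text only says that a preset relevance boosting algorithm is applied, so the additive boost capped at 1.0 is purely a placeholder assumption, as are the function and parameter names. Relevance is set equal to similarity for new results, which is one simple way to keep the two directly proportional.

```python
def update_relevance(relevance, current_hits, similarity,
                     boost=lambda r: min(1.0, r + 0.125)):
    """Return updated relevance scores after a supplementary search.

    relevance:    scores of results already in the search result set
    current_hits: results returned by the latest supplementary search
    similarity:   hypothetical callback giving similarity to the search information
    boost:        placeholder for the unspecified "preset boosting algorithm"
    """
    updated = dict(relevance)
    for item in current_hits:
        if item in relevance:
            updated[item] = boost(relevance[item])  # found again: raise relevance
        else:
            updated[item] = similarity(item)        # new result: relevance from similarity
    return updated

old = {"x": 0.5, "y": 0.9375, "w": 0.25}
new = update_relevance(old, ["x", "y", "z"], similarity=lambda item: 0.75)
print(new)  # {'x': 0.625, 'y': 1.0, 'w': 0.25, 'z': 0.75}
```

Results that were not returned again (here `w`) keep their previous relevance, which matches the worked image example where only the images found again are boosted.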
In some optional implementations of this embodiment, the supplementary search unit 603 is further configured to: select, in descending order of relevance, a target number of supplementary search results from the supplementary search result set as target search results to obtain the target search result set.
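Selecting the target number of results in descending order of relevance is a top-N selection. A minimal sketch using the standard library follows; the function name and the scores are illustrative.

```python
import heapq

def top_n_by_relevance(relevance, n):
    """Return the n result IDs with the highest relevance, highest first."""
    return heapq.nlargest(n, relevance, key=relevance.get)

# Hypothetical relevance scores.
scores = {"a": 0.9, "b": 0.4, "c": 0.7}
targets = top_n_by_relevance(scores, 2)
print(targets)  # ['a', 'c']
```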
In some optional implementations of this embodiment, the apparatus 600 for searching information further includes a dividing unit (not shown in the figure) configured to divide the search result set into a target number of search result subsets corresponding to a preset target number of relevance intervals; and the selection unit is further configured to select search results from each of the target number of search result subsets to obtain the search result subset.
In some optional implementations of this embodiment, the apparatus 600 for searching information further includes an accuracy determination unit (not shown in the figure) configured to determine the accuracy of each search result subset among the target number of search result subsets, where the accuracy indicates the proportion of target search results in the search result subset.
In some optional implementations of this embodiment, the dividing unit is further configured to divide the supplementary search result set into a target number of supplementary search result subsets according to the relevance intervals; and the determination unit is further configured to select supplementary search results from each of the target number of supplementary search result subsets to obtain the supplementary search result subset.
In some optional implementations of this embodiment, the accuracy determination unit is further configured to determine the accuracy of each supplementary search result subset among the target number of supplementary search result subsets, where the accuracy indicates the proportion of target search results in the supplementary search result subset.
In some optional implementations of this embodiment, the determination unit is further configured to: for a supplementary search result subset among the target number of supplementary search result subsets, determine the absolute value of the difference between a preset accuracy threshold and the accuracy of the search result subset corresponding to the relevance interval in which that supplementary search result subset lies; and determine, according to the absolute value and the number of supplementary search results contained in the supplementary search result subset, the number of supplementary search results to select from the supplementary search result subset, where the number selected is inversely proportional to the absolute value and directly proportional to the number of supplementary search results contained.
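The text gives only the direction of the dependence (fewer samples for a larger absolute difference, more samples for a larger subset), not a formula. The sketch below is therefore one hypothetical allocation rule; `rate` and `penalty` are invented parameters, and the rule is not expected to reproduce the counts 12, 10, and 1 used in the worked image example.

```python
from fractions import Fraction
from math import ceil

def sample_count(subset_size, accuracy, threshold=Fraction(1, 2),
                 rate=Fraction(1, 4), penalty=10):
    """Hypothetical rule: grows with subset size, shrinks with |accuracy - threshold|."""
    diff = abs(Fraction(accuracy) - threshold)
    count = ceil(rate * subset_size / (1 + penalty * diff))
    return max(1, min(subset_size, count))  # keep at least 1 and at most the subset size

# Subset sizes and previous accuracies from the worked example.
sizes = [45, 80, 25]
accuracies = [Fraction(1, 2), Fraction(4, 5), Fraction(1, 1)]
counts = [sample_count(s, a) for s, a in zip(sizes, accuracies)]
print(counts)  # [12, 5, 2]
```

Under these invented parameters the first subset, whose accuracy sits exactly at the threshold, receives the most samples, and the second receives more than the third, which is the ordering the description calls for.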
In some optional implementations of this embodiment, the number of search results in the search result set is the same as the number of supplementary search results in the supplementary search result subset.
In some optional implementations of this embodiment, the convergence condition includes: the difference between the accuracy of the search result subset and the accuracy of the supplementary search result subset corresponding to a target relevance interval is less than a preset difference threshold.
In some optional implementations of this embodiment, the convergence condition includes: the number of search results contained in the difference set of the search result set and the supplementary search result set is less than a preset number-difference threshold.
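The two convergence conditions can be checked as sketched below. Combining them with a logical AND is an assumption (the text lists them as separate optional implementations), and the threshold values in the usage example are hypothetical. The numbers reuse the image example: target-interval accuracies going from 4/5 and 1 to 9/10 and 1, and the result set growing from 150 to 160 images.

```python
from fractions import Fraction

def has_converged(prev_results, new_results, prev_accuracy, new_accuracy,
                  accuracy_tol, count_tol):
    """Check both optional convergence conditions (combined with AND by assumption)."""
    # Condition 1: accuracy of each target relevance interval barely changed.
    accuracy_stable = all(abs(a - b) < accuracy_tol
                          for a, b in zip(prev_accuracy, new_accuracy))
    # Condition 2: few results were added by the latest supplementary search.
    few_new = len(new_results - prev_results) < count_tol
    return accuracy_stable and few_new

prev_set, new_set = set(range(150)), set(range(160))  # hypothetical image IDs
done = has_converged(
    prev_set, new_set,
    prev_accuracy=[Fraction(4, 5), Fraction(1)],
    new_accuracy=[Fraction(9, 10), Fraction(1)],
    accuracy_tol=Fraction(1, 5),  # hypothetical threshold
    count_tol=20,                 # hypothetical threshold
)
print(done)  # True
```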
In the apparatus provided by the above embodiment of this application, after the search unit obtains the search results directly, the selection unit can sample some of the search results and select from them those that are target search results, and the supplementary search unit searches again with the selected results. The determination unit then decides, based on the results of the repeated search, whether to continue sampling and searching, thereby increasing the number of search results and improving their coverage.
Reference is now made to FIG. 7, which shows a schematic structural diagram of a computer system 700 suitable for implementing an electronic device according to an embodiment of this application. The electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of this application.
As shown in FIG. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above functions defined in the method of this application are performed.
It should be noted that the computer-readable medium of this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of this application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a search unit, a selection unit, a supplementary search unit, and a determination unit. The names of these units do not, in some cases, limit the units themselves. For example, the search unit may also be described as "a unit that searches a target information base using acquired search information to obtain a search result set".
In another aspect, this application further provides a computer-readable medium. The computer-readable medium may be included in the electronic device described in the foregoing embodiments, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: search a target information base using acquired search information to obtain a search result set; select a search result subset from the search result set, and determine the search results in the search result subset that are target search results as supplementary search information; perform the following search step: determining the union of the search result set and the search results obtained by searching the target information base with the supplementary search information as a supplementary search result set, and determining whether the supplementary search result set satisfies a preset convergence condition; and, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that are target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the above search step.
The above description is only a preferred embodiment of this application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in this application.

Claims (34)

  1. A method for searching information, comprising:
    searching a target information base using acquired search information to obtain a search result set;
    selecting a search result subset from the search result set, and determining search results in the search result subset that are target search results as supplementary search information;
    performing the following search step: determining the union of the search result set and the search results obtained by searching the target information base with the supplementary search information as a supplementary search result set; and determining whether the supplementary search result set satisfies a preset convergence condition; and
    in response to determining that the supplementary search result set does not satisfy the preset convergence condition, selecting a supplementary search result subset from the supplementary search result set, determining supplementary search results in the supplementary search result subset that are target search results as the supplementary search information, determining the supplementary search result set as the search result set, and continuing to perform the search step.
  2. The method according to claim 1, wherein the search step further comprises:
    in response to determining that the supplementary search result set satisfies the convergence condition, determining a target search result set from the supplementary search result set.
  3. The method according to claim 1, wherein the determining search results in the search result subset that are target search results as supplementary search information comprises:
    obtaining annotation information of the search results in the search result subset, wherein the annotation information indicates whether a search result is a target search result; and
    determining, according to the annotation information, the search results in the search result subset that are target search results.
  4. The method according to claim 2, wherein, after the searching a target information base using the search information to obtain a search result set, the method further comprises:
    determining the relevance of the search results in the search result set, wherein the relevance indicates the degree to which a search result is related to the search information.
  5. The method according to claim 4, wherein, for a search result in the search result set, the relevance of the search result is determined by:
    determining the similarity between the search result and the search information; and
    determining the relevance of the search result according to the similarity, wherein the similarity is directly proportional to the relevance.
  6. The method according to claim 4, wherein, after the determining the union of the search result set and the search results of the supplementary search information in the target information base as the supplementary search result set, the method further comprises:
    for a supplementary search result in the supplementary search result set, in response to the supplementary search result being present in the search result set, updating the relevance of the supplementary search result according to a preset relevance boosting algorithm, and using the updated relevance as the relevance of the supplementary search result; and in response to the supplementary search result not being present in the search result set, determining the relevance of the supplementary search result.
  7. The method according to claim 6, wherein the determining the relevance degree of the supplementary search result comprises:
    determining a similarity between the supplementary search result and the search information;
    determining the relevance degree of the supplementary search result according to the similarity, wherein the similarity is directly proportional to the relevance degree.
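The relevance update in claims 6 and 7 can be sketched as follows. This is only an illustration under stated assumptions: the claims name neither a similarity measure nor the "preset relevance-boosting algorithm", so the token-level Jaccard similarity, the multiplicative `boost` factor, and the cap at 1.0 are all hypothetical choices, as are the function names.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-level Jaccard similarity in [0, 1] (an illustrative choice)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def update_relevance(supplementary_results, previous_relevance, query, boost=1.2):
    """Return a dict mapping each supplementary result to its relevance.

    Results already present in the previous result set get their stored
    relevance boosted; new results get a similarity-based relevance.
    """
    updated = {}
    for result in supplementary_results:
        if result in previous_relevance:
            # Hypothetical boost rule: multiply, then cap at 1.0.
            updated[result] = min(1.0, previous_relevance[result] * boost)
        else:
            # Relevance proportional to similarity (proportionality factor 1).
            updated[result] = jaccard_similarity(result, query)
    return updated


previous = {"red apple pie": 0.5}
relevance = update_relevance(["red apple pie", "green apple"], previous, "apple pie")
```

Here a result that was found again is rewarded for being confirmed by the supplementary search, while a newly found result starts from its plain similarity to the original search information.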
  8. The method according to claim 6, wherein the determining a target search result set from the supplementary search result set comprises:
    selecting, in descending order of relevance degree, a target number of supplementary search results from the supplementary search result set as target search results, to obtain the target search result set.
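A minimal sketch of the selection in claim 8, assuming the relevance degrees are held in a dictionary keyed by result (the function and variable names are hypothetical):

```python
def select_top_results(relevance, target_number):
    """Return the `target_number` results with the highest relevance."""
    ranked = sorted(relevance.items(), key=lambda item: item[1], reverse=True)
    return [result for result, _ in ranked[:target_number]]


top_two = select_top_results({"a": 0.2, "b": 0.9, "c": 0.5}, 2)
```

If fewer than `target_number` results are available, the slice simply returns all of them in descending order of relevance.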
  9. The method according to claim 4, wherein the method further comprises:
    dividing the search result set into a target number of search result subsets corresponding to a preset target number of relevance degree intervals; and
    the selecting a search result subset from the search result set comprises:
    selecting search results from each of the target number of search result subsets, to obtain the search result subset.
  10. The method according to claim 9, wherein the method further comprises:
    determining an accuracy of each search result subset among the target number of search result subsets, wherein the accuracy indicates the proportion of target search results in the search result subset.
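The partitioning of claim 9 and the accuracy of claim 10 might look like the sketch below. The interval boundaries, the half-open interval convention, and the `labels` dictionary (standing in for the annotation information of claim 3) are assumptions made for illustration.

```python
def partition_by_relevance(relevance, boundaries):
    """Split results into len(boundaries) - 1 subsets by relevance interval.

    `boundaries` is an ascending list such as [0.0, 0.5, 1.0]; interval i is
    [boundaries[i], boundaries[i + 1]), and the last interval also includes
    its upper bound.
    """
    subsets = [[] for _ in range(len(boundaries) - 1)]
    for result, score in relevance.items():
        for i in range(len(subsets)):
            is_last = i == len(subsets) - 1
            upper_ok = score <= boundaries[i + 1] if is_last else score < boundaries[i + 1]
            if boundaries[i] <= score and upper_ok:
                subsets[i].append(result)
                break
    return subsets


def subset_accuracy(sampled_results, labels):
    """Fraction of sampled results that are labelled as target results."""
    if not sampled_results:
        return 0.0
    hits = sum(1 for result in sampled_results if labels[result])
    return hits / len(sampled_results)


relevance = {"a": 0.1, "b": 0.5, "c": 1.0, "d": 0.49}
subsets = partition_by_relevance(relevance, [0.0, 0.5, 1.0])
labels = {"a": True, "b": False, "c": True, "d": False}
accuracies = [subset_accuracy(subset, labels) for subset in subsets]
```

In practice only the results sampled from each subset would need labels; the example labels every result to keep the sketch short.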
  11. The method according to claim 10, wherein, after determining the union of the search results of the supplementary search information in the target information base and the search result set as the supplementary search result set, the method further comprises:
    dividing the supplementary search result set into a corresponding target number of supplementary search result subsets according to the relevance degree intervals; and
    the selecting a supplementary search result subset from the supplementary search result set comprises:
    selecting supplementary search results from each of the target number of supplementary search result subsets, to obtain the supplementary search result subset.
  12. The method according to claim 11, wherein the method further comprises:
    determining an accuracy of each supplementary search result subset among the target number of supplementary search result sets, wherein the accuracy indicates the proportion of target search results in the supplementary search result subset.
  13. The method according to claim 12, wherein the selecting supplementary search results from each of the target number of supplementary search result subsets comprises:
    for a supplementary search result subset among the target number of supplementary search result subsets: determining the absolute value of the difference between the accuracy of the search result set corresponding to the relevance degree interval in which the supplementary search result subset lies and a preset accuracy threshold; and determining, according to the absolute value and the number of supplementary search results contained in the supplementary search result subset, the number of supplementary search results to select from the supplementary search result subset, wherein the absolute value is inversely proportional to the number of supplementary search results selected, and the number of supplementary search results contained is directly proportional to the number of supplementary search results selected.
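Claim 13 fixes only the direction of the two dependencies, not a formula. One possible sketch, assuming a hypothetical rule `scale * size / (gap + epsilon)` in which `scale` and `epsilon` are made-up tuning constants:

```python
def selection_count(subset_size, interval_accuracy, accuracy_threshold,
                    scale=0.1, epsilon=0.05):
    """Number of results to sample from one supplementary subset.

    The count grows with `subset_size` and shrinks as the interval accuracy
    moves away from `accuracy_threshold`.
    """
    gap = abs(interval_accuracy - accuracy_threshold)
    raw = scale * subset_size / (gap + epsilon)
    # Clamp to what the subset can actually supply.
    return max(0, min(subset_size, round(raw)))


near_threshold = selection_count(100, 0.6, 0.5)
far_from_threshold = selection_count(100, 0.9, 0.5)
```

Under this rule, an interval whose accuracy sits close to the threshold is sampled more heavily than one whose accuracy is far from it, and `epsilon` keeps the division defined when the gap is zero.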
  14. The method according to any one of claims 1-13, wherein the number of search results in the search result subset is the same as the number of supplementary search results in the supplementary search result subset.
  15. The method according to claim 13, wherein the convergence condition comprises:
    the difference between the accuracy of the search result subset and the accuracy of the supplementary search result subset corresponding to a target relevance degree interval is less than a preset difference threshold.
  16. The method according to claim 1, wherein the convergence condition comprises:
    the number of search results contained in the difference set of the search result set and the supplementary search result set is less than a preset number difference threshold.
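The two convergence conditions of claims 15 and 16 can be sketched as simple predicates. Two interpretations are assumed here: since the supplementary set is a union that contains the search result set, the difference set is taken as the results the supplementary search newly added, and the accuracy difference is compared by absolute value. The function names are hypothetical.

```python
def converged_by_new_results(result_set, supplementary_set, count_threshold):
    """True when the supplementary search added fewer than `count_threshold` results."""
    new_results = set(supplementary_set) - set(result_set)
    return len(new_results) < count_threshold


def converged_by_accuracy(previous_accuracy, supplementary_accuracy, diff_threshold):
    """True when the target-interval accuracy changed by less than `diff_threshold`."""
    return abs(supplementary_accuracy - previous_accuracy) < diff_threshold


few_new = converged_by_new_results({"a", "b"}, {"a", "b", "c"}, 2)
stable = converged_by_accuracy(0.80, 0.82, 0.05)
```

Both predicates express the same idea: iteration stops once another round of supplementary searching no longer changes the result set, or its measured quality, by a meaningful amount.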
  17. An apparatus for searching information, comprising:
    a search unit configured to search a target information base with acquired search information to obtain a search result set;
    a selecting unit configured to select a search result subset from the search result set, and to determine the search results in the search result subset that belong to target search results as supplementary search information;
    a supplementary search unit configured to perform the following search steps: determining the union of the search results obtained by searching the target information base with the supplementary search information and the search result set as a supplementary search result set; and determining whether the supplementary search result set satisfies a preset convergence condition;
    a determining unit configured to, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that belong to the target search results as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the search steps.
  18. The apparatus according to claim 17, wherein the supplementary search unit is further configured to:
    in response to determining that the supplementary search result set satisfies the convergence condition, determine a target search result set from the supplementary search result set.
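The cooperation of the units in claims 17 and 18 amounts to an iterative loop, which might be sketched as below. Everything concrete is an assumption for illustration: the `search`, `is_target`, and `pick_subset` callables stand in for the search unit, the annotation step, and the subset selection; the convergence test counts newly added results; `max_rounds` is an added safety cap that the claims do not mention; and the whole supplementary set is returned rather than a ranked selection from it.

```python
def iterative_search(query, database, search, is_target, pick_subset,
                     count_threshold=1, max_rounds=10):
    """Expand a result set by re-searching with results labelled as targets."""
    result_set = set(search(query, database))
    supplementary_queries = [r for r in pick_subset(result_set) if is_target(r)]
    for _ in range(max_rounds):
        found = set()
        for q in supplementary_queries:
            found |= set(search(q, database))
        supplementary_set = result_set | found
        # Convergence: few enough new results were added this round.
        if len(supplementary_set - result_set) < count_threshold:
            return supplementary_set
        supplementary_queries = [
            r for r in pick_subset(supplementary_set) if is_target(r)
        ]
        result_set = supplementary_set
    return result_set


def shared_word_search(query, database):
    """Toy search: return documents sharing at least one word with the query."""
    words = set(query.split())
    return [doc for doc in database if words & set(doc.split())]


database = ["cat photo", "cat video", "video clip", "dog photo", "tax form"]
results = iterative_search(
    "cat", database, shared_word_search,
    is_target=lambda doc: "video" in doc or "cat" in doc,
    pick_subset=lambda s: sorted(s),
)
```

In this toy run the first round pulls in documents related to the labelled results, and the second round adds nothing new, so the loop stops.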
  19. The apparatus according to claim 17, wherein the selecting unit is further configured to:
    acquire annotation information of the search results in the search result subset, wherein the annotation information indicates whether a search result is a target search result;
    determine, according to the annotation information, the search results in the search result subset that belong to the target search results.
  20. The apparatus according to claim 18, wherein the apparatus further comprises:
    a relevance degree determining unit configured to determine a relevance degree of each search result in the search result set, wherein the relevance degree indicates how closely the search result is related to the search information.
  21. The apparatus according to claim 20, wherein the relevance degree determining unit is further configured to:
    determine a similarity between the search result and the search information;
    determine the relevance degree of the search result according to the similarity, wherein the similarity is directly proportional to the relevance degree.
  22. The apparatus according to claim 20, wherein the relevance degree determining unit is further configured to:
    for a supplementary search result in the supplementary search result set: in response to the supplementary search result being present in the search result set, update the relevance degree of the supplementary search result according to a preset relevance-boosting algorithm, and use the updated relevance degree as the relevance degree of the supplementary search result; and in response to the supplementary search result not being present in the search result set, determine the relevance degree of the supplementary search result.
  23. The apparatus according to claim 22, wherein the relevance degree determining unit is further configured to:
    determine a similarity between the supplementary search result and the search information;
    determine the relevance degree of the supplementary search result according to the similarity, wherein the similarity is directly proportional to the relevance degree.
  24. The apparatus according to claim 22, wherein the supplementary search unit is further configured to:
    select, in descending order of relevance degree, a target number of supplementary search results from the supplementary search result set as target search results, to obtain the target search result set.
  25. The apparatus according to claim 20, wherein the apparatus further comprises a dividing unit configured to:
    divide the search result set into a target number of search result subsets corresponding to a preset target number of relevance degree intervals; and
    the selecting unit is further configured to:
    select search results from each of the target number of search result subsets, to obtain the search result subset.
  26. The apparatus according to claim 25, wherein the apparatus further comprises:
    an accuracy determining unit configured to determine an accuracy of each search result subset among the target number of search result subsets, wherein the accuracy indicates the proportion of target search results in the search result subset.
  27. The apparatus according to claim 26, wherein the dividing unit is further configured to:
    divide the supplementary search result set into a corresponding target number of supplementary search result subsets according to the relevance degree intervals; and
    the determining unit is further configured to:
    select supplementary search results from each of the target number of supplementary search result subsets, to obtain the supplementary search result subset.
  28. The apparatus according to claim 27, wherein the accuracy determining unit is further configured to:
    determine an accuracy of each supplementary search result subset among the target number of supplementary search result sets, wherein the accuracy indicates the proportion of target search results in the supplementary search result subset.
  29. The apparatus according to claim 28, wherein the determining unit is further configured to:
    for a supplementary search result subset among the target number of supplementary search result subsets: determine the absolute value of the difference between the accuracy of the search result set corresponding to the relevance degree interval in which the supplementary search result subset lies and a preset accuracy threshold; and determine, according to the absolute value and the number of supplementary search results contained in the supplementary search result subset, the number of supplementary search results to select from the supplementary search result subset, wherein the absolute value is inversely proportional to the number of supplementary search results selected, and the number of supplementary search results contained is directly proportional to the number of supplementary search results selected.
  30. The apparatus according to any one of claims 17-29, wherein the number of search results in the search result subset is the same as the number of supplementary search results in the supplementary search result subset.
  31. The apparatus according to claim 29, wherein the convergence condition comprises:
    the difference between the accuracy of the search result subset and the accuracy of the supplementary search result subset corresponding to a target relevance degree interval is less than a preset difference threshold.
  32. The apparatus according to claim 17, wherein the convergence condition comprises:
    the number of search results contained in the difference set of the search result set and the supplementary search result set is less than a preset number difference threshold.
  33. An electronic device, comprising:
    one or more processors;
    a storage device on which one or more programs are stored;
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-16.
  34. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-16.
PCT/CN2018/116342 2018-09-12 2018-11-20 Information search method and device WO2020052067A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811060981.3 2018-09-12
CN201811060981.3A CN109308299B (en) 2018-09-12 2018-09-12 Method and apparatus for searching information

Publications (1)

Publication Number Publication Date
WO2020052067A1 true WO2020052067A1 (en) 2020-03-19

Family

ID=65225022

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116342 WO2020052067A1 (en) 2018-09-12 2018-11-20 Information search method and device

Country Status (2)

Country Link
CN (1) CN109308299B (en)
WO (1) WO2020052067A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628283A (en) * 2019-06-04 2023-08-22 苏州智贸捷通科技有限公司 Manual data verification method based on big data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1609859A (en) * 2004-11-26 2005-04-27 孙斌 Search result clustering method
CN102999556A (en) * 2012-10-15 2013-03-27 百度在线网络技术(北京)有限公司 Text searching method and text searching device and terminal equipment
CN103136213A (en) * 2011-11-23 2013-06-05 阿里巴巴集团控股有限公司 Method and device for providing related words
CN106156109A (en) * 2015-04-03 2016-11-23 阿里巴巴集团控股有限公司 A kind of searching method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102207968B (en) * 2011-06-08 2013-11-20 北京百度网讯科技有限公司 Search result correlation judgment-based search method and device
CN102955837A (en) * 2011-12-13 2013-03-06 华东师范大学 Analogy retrieval control method based on Chinese word pair relationship similarity
CN105243060B (en) * 2014-05-30 2019-11-08 小米科技有限责任公司 A kind of method and device of retrieving image
CN104834693B (en) * 2015-04-21 2017-11-28 上海交通大学 Visual pattern search method and system based on deep search
CN105426529B (en) * 2015-12-15 2017-02-22 中南大学 Image retrieval method and system based on user search intention positioning
CN106649554A (en) * 2016-11-08 2017-05-10 北京奇虎科技有限公司 Application program search method, device, server and system

Also Published As

Publication number Publication date
CN109308299A (en) 2019-02-05
CN109308299B (en) 2020-01-14

Similar Documents

Publication Publication Date Title
WO2018192491A1 (en) Information pushing method and device
CN108846753B (en) Method and apparatus for processing data
US11403303B2 (en) Method and device for generating ranking model
US8468146B2 (en) System and method for creating search index on cloud database
US20200082814A1 (en) Method and apparatus for operating smart terminal
CN110263277B (en) Page data display method, page data updating device, page data equipment and storage medium
US11314451B2 (en) Method and apparatus for storing data
CN109858045B (en) Machine translation method and device
US20210042470A1 (en) Method and device for separating words
CN108933695B (en) Method and apparatus for processing information
CN111435376A (en) Information processing method and system, computer system, and computer-readable storage medium
CN111427971A (en) Business modeling method, device, system and medium for computer system
WO2021203918A1 (en) Method for processing model parameters, and apparatus
US20200327140A1 (en) Systems and methods for access to multi-tenant heterogeneous databases
CN111104479A (en) Data labeling method and device
WO2024036662A1 (en) Parallel graph rule mining method and apparatus based on data sampling
WO2020199659A1 (en) Method and apparatus for determining push priority information
WO2020119173A1 (en) Information pushing method and apparatus
CN112182255A (en) Method and apparatus for storing media files and for retrieving media files
WO2020052067A1 (en) Information search method and device
WO2023130960A1 (en) Service resource determination method and apparatus, and service resource determination system
CN109376220B (en) Method and device for acquiring information
WO2023040612A1 (en) Order processing method and apparatus
CN111488386A (en) Data query method and device
US9201937B2 (en) Rapid provisioning of information for business analytics

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18933225

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 15/06/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18933225

Country of ref document: EP

Kind code of ref document: A1