CN109308299B - Method and apparatus for searching information - Google Patents

Method and apparatus for searching information

Info

Publication number
CN109308299B
Authority
CN
China
Prior art keywords
search result
search
supplementary
search results
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811060981.3A
Other languages
Chinese (zh)
Other versions
CN109308299A (en)
Inventor
何轶
李磊
宗显子
汤颢
郑光果
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201811060981.3A (patent CN109308299B)
Priority to PCT/CN2018/116342 (patent WO2020052067A1)
Publication of CN109308299A
Application granted
Publication of CN109308299B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Abstract

The embodiment of the application discloses a method and a device for searching information. One embodiment of the method comprises: searching in a target information base by utilizing the search information to obtain a search result set; selecting a search result subset from the search result set, and determining search results which are target search results in the search result subset as supplementary search information; the following search steps are performed: determining the union of the search result searched by the supplementary search information in the target information base and the search result set as a supplementary search result set; determining whether the supplementary search result set meets a preset convergence condition; and in response to determining that the supplementary search result set does not meet the preset convergence condition, selecting a supplementary search result subset from the supplementary search result set, determining supplementary search results belonging to the target search results in the supplementary search result subset as supplementary search information, determining the supplementary search result set as a search result set, and continuing to execute the searching step. This embodiment enables an increase in the number of search results.

Description

Method and apparatus for searching information
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for searching information.
Background
In a conventional search process, the search results corresponding to the input search information are generally selected directly according to a search algorithm. With this approach, results that could only be reached after adjusting the search information several times may not be found directly, so the obtained search results may not be comprehensive.
Disclosure of Invention
The embodiment of the application provides a method and a device for searching information.
In a first aspect, an embodiment of the present application provides a method for searching information, where the method includes: searching in a target information base by using the acquired search information to obtain a search result set; selecting a search result subset from the search result set, and determining search results belonging to the target search result in the search result subset as supplementary search information; the following search steps are performed: determining the union of the search result searched by the supplementary search information in the target information base and the search result set as a supplementary search result set; determining whether the supplementary search result set meets a preset convergence condition; and in response to determining that the supplementary search result set does not meet the preset convergence condition, selecting a supplementary search result subset from the supplementary search result set, determining supplementary search results belonging to the target search results in the supplementary search result subset as supplementary search information, determining the supplementary search result set as a search result set, and continuing to execute the searching steps.
In some embodiments, the searching step further comprises: in response to determining that the supplemental search result set satisfies the convergence condition, a target search result set is determined from the supplemental search result set.
In some embodiments, determining search results in the subset of search results that belong to the target search result as supplemental search information comprises: acquiring label information of the search results in the search result subset, wherein the label information is used for indicating whether the search results are target search results; and determining the search results belonging to the target search result in the search result subset according to the labeling information.
In some embodiments, after searching in the target information base by using the search information to obtain the search result set, the method further includes: determining the relevance of the search results in the search result set, wherein the relevance indicates the degree of association between a search result and the search information.
In some embodiments, for a search result in the search result set, the relevancy of the search result is determined by: determining the similarity between the search result and the search information; and determining the relevance of the search result according to the similarity, wherein the similarity is in direct proportion to the relevance.
In some embodiments, after determining the union of the search result of the supplemental search information in the target information repository and the search result set as the supplemental search result set, the method further comprises: for each supplementary search result in the supplementary search result set, in response to the supplementary search result being present in the search result set, updating the relevance of the supplementary search result according to a preset relevance promotion algorithm and taking the updated relevance as the relevance of the supplementary search result; and in response to the supplementary search result not being present in the search result set, determining the relevance of the supplementary search result.
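The relevance update described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the relevance promotion algorithm, so the multiplicative `boost`, the `cap` of 1.0, and the function and parameter names are all hypothetical.

```python
def update_relevance(previous_relevance, supplementary_results, compute_relevance,
                     boost=1.1, cap=1.0):
    """Return a relevance value for every result in the supplementary result set.

    previous_relevance maps each result of the earlier search result set to its
    relevance. boost and cap are assumed values, not taken from the source.
    """
    updated = {}
    for result in supplementary_results:
        if result in previous_relevance:
            # Already present in the search result set: raise its relevance.
            updated[result] = min(cap, previous_relevance[result] * boost)
        else:
            # Newly found result: determine its relevance from scratch.
            updated[result] = compute_relevance(result)
    return updated


previous = {"doc_a": 0.5, "doc_b": 0.95}
supplementary = ["doc_a", "doc_b", "doc_c"]
scores = update_relevance(previous, supplementary, lambda result: 0.3)
```

Here `doc_a` and `doc_b` are boosted (with `doc_b` clipped at the cap), while `doc_c` is scored by the supplied `compute_relevance` callable.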
In some embodiments, the determining the relevancy of the supplemental search result comprises: determining similarity of the supplemental search results to the search information; and determining the relevance of the supplementary search result according to the similarity, wherein the similarity is in direct proportion to the relevance.
In some embodiments, the determining a target search result set from the supplemental search result set comprises: and selecting a target number of supplementary search results from the supplementary search result set as target search results according to the sequence of the relevance degrees from large to small to obtain a target search result set.
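As a minimal sketch of this selection, assuming relevance values are held in a dictionary keyed by result identifier (the identifiers and the function name are hypothetical):

```python
import heapq


def select_target_results(relevance_by_result, target_number):
    """Return up to target_number results, ordered from highest to lowest relevance."""
    return heapq.nlargest(target_number, relevance_by_result,
                          key=relevance_by_result.get)


supplementary = {"doc_a": 0.91, "doc_b": 0.40, "doc_c": 0.77, "doc_d": 0.15}
top_two = select_target_results(supplementary, 2)
```

If the target number exceeds the size of the supplementary search result set, the sketch simply returns every result in descending order of relevance.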
In some embodiments, the method further comprises: dividing the search result set into a target number of search result subsets according to a preset target number of relevance intervals, one subset per interval; and selecting a subset of search results from the search result set comprises: selecting search results from each of the target number of search result subsets to obtain the search result subset.
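One way to picture the division into relevance intervals and the per-interval sampling is the sketch below. The half-open intervals, their boundaries, and the fixed `per_subset` sample size are assumptions made for illustration; the source only states that the intervals are preset.

```python
import random


def partition_by_relevance(relevance_by_result, intervals):
    """Group results into one subset per [low, high) relevance interval."""
    subsets = [[] for _ in intervals]
    for result, relevance in relevance_by_result.items():
        for index, (low, high) in enumerate(intervals):
            if low <= relevance < high:
                subsets[index].append(result)
                break
    return subsets


def sample_from_each(subsets, per_subset, rng):
    """Draw up to per_subset results from every subset and merge the draws."""
    sampled = []
    for subset in subsets:
        sampled.extend(rng.sample(subset, min(per_subset, len(subset))))
    return sampled


relevance = {"a": 0.95, "b": 0.8, "c": 0.5, "d": 0.1, "e": 0.65}
# Three assumed intervals; the last upper bound sits just above 1.0 so that a
# relevance of exactly 1.0 still falls inside it.
intervals = [(0.0, 0.3), (0.3, 0.7), (0.7, 1.01)]
subsets = partition_by_relevance(relevance, intervals)
search_result_subset = sample_from_each(subsets, 1, random.Random(0))
```

Sampling from every interval keeps low-relevance results represented in the subset, which a purely top-ranked sample would not.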
In some embodiments, the method further comprises: and determining the accuracy of the search result subsets in the target number of search result subsets, wherein the accuracy is used for expressing the proportion of the target search results in the search result subsets.
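The accuracy defined here is a simple proportion. A minimal sketch, assuming the target labels are available as a dictionary of booleans (a hypothetical representation) and treating an empty subset as having accuracy 0.0 (also an assumption):

```python
def accuracy(subset, is_target):
    """Proportion of results in subset that are target search results."""
    if not subset:
        return 0.0  # assumed convention for an empty subset
    hits = sum(1 for result in subset if is_target.get(result, False))
    return hits / len(subset)


labels = {"a": True, "b": True, "c": False, "d": True}
subset_accuracy = accuracy(["a", "b", "c", "d"], labels)
```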
In some embodiments, after determining the union of the search result of the supplemental search information in the target information repository and the search result set as the supplemental search result set, further comprising: dividing the supplementary search result set into a corresponding target number of supplementary search result subsets according to the relevance interval; and selecting a subset of supplemental search results from the set of supplemental search results, comprising: and respectively selecting supplementary search results from the target number of supplementary search result subsets to obtain the supplementary search result subsets.
In some embodiments, the method further comprises: determining the accuracy of each supplementary search result subset among the target number of supplementary search result subsets, where the accuracy indicates the proportion of target search results in the supplementary search result subset.
In some embodiments, selecting the supplementary search results from each of the target number of supplementary search result subsets includes: for each supplementary search result subset among the target number of supplementary search result subsets, determining the absolute value of the difference between the accuracy of the search result set corresponding to the relevance interval in which the supplementary search result subset is located and a preset accuracy threshold; and determining the number of supplementary search results to select from the supplementary search result subset according to the absolute value and the number of supplementary search results included in that subset, wherein the number selected is inversely proportional to the absolute value and directly proportional to the number of supplementary search results included.
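The source gives only the direction of the two proportionalities, not a formula. The sketch below is one possible reading: the selected count grows with the subset size and shrinks as the accuracy moves away from the threshold. The constants `scale` and `epsilon`, the rounding, and the lower bound of one sample are all assumptions.

```python
import math


def selection_count(subset_size, subset_accuracy, accuracy_threshold,
                    scale=0.05, epsilon=0.05):
    """Number of results to sample from one relevance-interval subset.

    scale and epsilon are hypothetical tuning constants; epsilon avoids a
    division by zero when the accuracy equals the threshold.
    """
    if subset_size == 0:
        return 0
    gap = abs(subset_accuracy - accuracy_threshold)
    raw = scale * subset_size / (gap + epsilon)
    # Sample at least one result and never more than the subset contains.
    return min(subset_size, max(1, math.ceil(raw)))


near_threshold = selection_count(100, 0.6, 0.5)
far_from_threshold = selection_count(100, 0.9, 0.5)
larger_subset = selection_count(200, 0.9, 0.5)
```

Under this reading, intervals whose accuracy sits close to the threshold receive more samples, since those are the intervals where further labeling is most informative.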
In some embodiments, the number of search results in the search result set and the number of supplemental search results in the subset of supplemental search results are the same.
In some embodiments, the convergence condition comprises: the difference between the accuracy of the search result subset and the accuracy of the supplementary search result subset corresponding to a target relevance interval is smaller than a preset difference threshold.
In some embodiments, the convergence condition comprises: the number of search results contained in the difference set of the search result set and the supplemental search result set is less than a preset number difference threshold.
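The two convergence conditions above can be sketched as simple predicates. The function names are hypothetical, and taking the absolute value of the accuracy difference is an assumption; the source only says the difference is smaller than a threshold.

```python
def accuracy_converged(previous_accuracy, current_accuracy, difference_threshold):
    """True when the accuracy for the target relevance interval has stabilised."""
    return abs(current_accuracy - previous_accuracy) < difference_threshold


def growth_converged(search_results, supplementary_results, count_threshold):
    """True when the supplementary search added fewer new results than the threshold."""
    newly_added = set(supplementary_results) - set(search_results)
    return len(newly_added) < count_threshold


stable = accuracy_converged(0.80, 0.82, 0.05)
still_growing = growth_converged({"a", "b"}, {"a", "b", "c", "d"}, 2)
```

In this example `stable` is true because the accuracy moved by about 0.02, while `still_growing` is false because two new results were added and the threshold requires fewer than two.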
In a second aspect, an embodiment of the present application provides an apparatus for searching for information, where the apparatus includes: the search unit is configured to search in the target information base by using the acquired search information to obtain a search result set; a selecting unit configured to select a subset of search results from the search result set and determine search results belonging to the target search result in the subset of search results as the supplemental search information; a supplementary search unit configured to perform the following search steps: determining the union of the search result searched by the supplementary search information in the target information base and the search result set as a supplementary search result set; determining whether the supplementary search result set meets a preset convergence condition; and the determining unit is configured to respond to the fact that the supplementary search result set does not meet the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine supplementary search results belonging to the target search results in the supplementary search result subset as supplementary search information, determine the supplementary search result set as a search result set, and continue to execute the searching steps.
In some embodiments, the supplemental search unit is further configured to: in response to determining that the supplemental search result set satisfies the convergence condition, a target search result set is determined from the supplemental search result set.
In some embodiments, the selecting unit is further configured to: acquiring label information of the search results in the search result subset, wherein the label information is used for indicating whether the search results are target search results; and determining the search results belonging to the target search result in the search result subset according to the labeling information.
In some embodiments, the apparatus further comprises: an association degree determining unit configured to determine the relevance of the search results in the search result set, wherein the relevance indicates the degree of association between a search result and the search information.
In some embodiments, the association degree determining unit is further configured to: determining the similarity between the search result and the search information; and determining the relevance of the search result according to the similarity, wherein the similarity is in direct proportion to the relevance.
In some embodiments, the association degree determining unit is further configured to: for each supplementary search result in the supplementary search result set, in response to the supplementary search result being present in the search result set, update the relevance of the supplementary search result according to a preset relevance promotion algorithm and take the updated relevance as the relevance of the supplementary search result; and in response to the supplementary search result not being present in the search result set, determine the relevance of the supplementary search result.
In some embodiments, the association degree determining unit is further configured to: determining similarity of the supplemental search results to the search information; and determining the relevance of the supplementary search result according to the similarity, wherein the similarity is in direct proportion to the relevance.
In some embodiments, the supplemental search unit is further configured to: and selecting a target number of supplementary search results from the supplementary search result set as target search results according to the sequence of the relevance degrees from large to small to obtain a target search result set.
In some embodiments, the apparatus further comprises a dividing unit configured to: divide the search result set into a target number of search result subsets according to a preset target number of relevance intervals, one subset per interval; and the selecting unit is further configured to: select search results from each of the target number of search result subsets to obtain the search result subset.
In some embodiments, the apparatus further comprises: and the accuracy determining unit is configured to determine the accuracy of the search result subsets in the target number of search result subsets, wherein the accuracy is used for expressing the proportion of the target search results in the search result subsets.
In some embodiments, the dividing unit is further configured to: dividing the supplementary search result set into a corresponding target number of supplementary search result subsets according to the relevance interval; and the determining unit is further configured to: and respectively selecting supplementary search results from the target number of supplementary search result subsets to obtain the supplementary search result subsets.
In some embodiments, the accuracy determination unit is further configured to: determine the accuracy of each supplementary search result subset among the target number of supplementary search result subsets, where the accuracy indicates the proportion of target search results in the supplementary search result subset.
In some embodiments, the determining unit is further configured to: for each supplementary search result subset among the target number of supplementary search result subsets, determine the absolute value of the difference between the accuracy of the search result set corresponding to the relevance interval in which the supplementary search result subset is located and a preset accuracy threshold; and determine the number of supplementary search results to select from the supplementary search result subset according to the absolute value and the number of supplementary search results included in that subset, wherein the number selected is inversely proportional to the absolute value and directly proportional to the number of supplementary search results included.
In some embodiments, the number of search results in the search result set and the number of supplemental search results in the subset of supplemental search results are the same.
In some embodiments, the convergence condition comprises: the difference between the accuracy of the search result subset and the accuracy of the supplementary search result subset corresponding to a target relevance interval is smaller than a preset difference threshold.
In some embodiments, the convergence condition comprises: the number of search results contained in the difference set of the search result set and the supplemental search result set is less than a preset number difference threshold.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the method and apparatus for searching information provided by the embodiments of the present application, after search results are obtained directly, a portion of them can be sampled, the sampled results that belong to the target search results are selected and used to search again, and the newly obtained results are then used to decide whether to continue sampling. This increases the number of search results and improves their coverage.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for searching information according to the present application;
FIG. 3 is a flow diagram of yet another embodiment of a method for searching information according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for searching information according to the present application;
FIG. 5 is a schematic diagram of an application scenario of a method for searching information according to an embodiment of the present application;
FIG. 6 is a schematic block diagram illustrating one embodiment of an apparatus for searching information according to the present application;
FIG. 7 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary architecture 100 to which embodiments of the method for searching information or the apparatus for searching information of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various client applications installed thereon, such as a web browser application, a shopping-type application, a search-type application, an instant messaging tool, and the like.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices supporting data storage and data exchange, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed here.
The server 105 may be a server that provides various services, such as a search server that searches in a target information base based on search information transmitted by the terminal apparatuses 101, 102, 103 and returns search results to the terminal apparatuses 101, 102, 103.
Note that the search information and the target information base may be directly stored locally in the server 105, and the server 105 may directly extract the search information and perform a search in the target information base stored locally, in which case the terminal apparatuses 101, 102, and 103 and the network 104 may not be present.
It should be noted that the method for searching for information provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for searching for information is generally disposed in the server 105.
It should be noted that the terminal devices 101, 102, and 103 may also have an information search application installed. In this case, the method for searching information may also be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for searching information may also be provided in the terminal devices 101, 102, and 103. In that case, the exemplary system architecture 100 may not include the server 105 and the network 104.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No particular limitation is imposed here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 for one embodiment of a method for searching information in accordance with the present application is shown. The method for searching information includes the steps of:
Step 201, searching in the target information base by using the acquired search information to obtain a search result set.
In this embodiment, the executing entity (e.g., server 105 in fig. 1) of the method for searching information may first obtain the search information from a local or other storage device, and then search in the target information base to obtain the search result set. The target information base may be an information base specified by a user in advance, a search base determined according to the search information and matched with the search information, or an information base set according to actual application requirements. The search algorithm used in the search process may be preset.
Step 202, a subset of search results is selected from the search result set, and the search results belonging to the target search result in the subset of search results are determined as the supplemental search information.
In this embodiment, the search result set may be sampled first, and at least one search result may be selected to obtain the search result subset. The sampling proportion may be set randomly, may be preset by a user or a technician, or may be determined according to the size of the search result set (for example, ten percent of the number of search results in the search result set may be used as the number of search results in the search result subset).
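A minimal sketch of this sampling step, using the ten percent proportion from the example above. Uniform random sampling and the function name are assumptions; the source leaves the sampling method open.

```python
import random


def sample_subset(search_results, proportion=0.1, rng=None):
    """Randomly pick about `proportion` of the results, and at least one."""
    rng = rng or random.Random()
    results = list(search_results)
    if not results:
        return []
    # At least one result, and never more than the set contains.
    size = min(len(results), max(1, round(len(results) * proportion)))
    return rng.sample(results, size)


search_result_set = [f"doc_{i}" for i in range(50)]
subset = sample_subset(search_result_set, proportion=0.1, rng=random.Random(0))
```

With 50 search results and a proportion of 0.1, the subset contains 5 results; very small result sets still yield one sampled result.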
In the present embodiment, the target search result may refer to a search result satisfying the search requirement of the user. Whether each search result is a target search result may be verified, for example, by calculating a similarity of each search result in the subset of search results to the search information.
Optionally, the label information of the search result in the search result subset may also be obtained first, where the label information is used to indicate whether the search result is the target search result. And then, determining the search results belonging to the target search result in the search result subset according to the labeling information.
Specifically, each search result in the search result subset may be verified manually, and the verification result may be entered as the label information and then sent to the executing entity.
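Selecting supplementary search information from label information can be sketched as a filter. The dictionary representation of the labels is hypothetical, and treating unlabeled results as non-target is an assumption.

```python
def select_supplementary_information(subset, label_information):
    """Keep only the sampled results that are labeled as target search results."""
    return [result for result in subset if label_information.get(result, False)]


sampled = ["doc_1", "doc_2", "doc_3", "doc_4"]
labels = {"doc_1": True, "doc_2": False, "doc_3": True}  # doc_4 has no label
supplementary_information = select_supplementary_information(sampled, labels)
```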
Step 203, the following search steps are performed:
Step 2031, determining the union of the search result searched by the supplementary search information in the target information base and the search result set as the supplementary search result set.
In this embodiment, the supplementary search information may be first used to search in the target information base to obtain a search result. Because the supplementary search information is used, the search result of this time may include a new search result compared to the search result set. Therefore, the union of the search result of this time and the search result set can be determined as the supplementary search result set.
Step 2032, determine whether the supplemental search result set meets a preset convergence condition.
In the present embodiment, the convergence condition may be set by a user or a technician according to the actual search requirement. For example, the convergence condition may be that the number of supplementary search results in the supplementary search result set is greater than a preset threshold. For another example, the convergence condition may be that the number of search results in the difference set between the supplementary search result set and the search result set is smaller than a preset threshold.
Step 204, in response to determining that the supplementary search result set does not meet the preset convergence condition, selecting a supplementary search result subset from the supplementary search result set, determining supplementary search results belonging to the target search result in the supplementary search result subset as supplementary search information, determining the supplementary search result set as a search result set, and continuing to execute the search steps.
In this embodiment, when the supplementary search result set does not satisfy the preset convergence condition, a supplementary search result subset may be sampled from the supplementary search result set in a manner similar to step 202 above. The supplementary search results in the supplementary search result subset that belong to the target search results are then determined as the supplementary search information, the supplementary search result set is determined as the search result set, and step 203 is executed again to perform an iterative search.
Optionally, the target search result set is determined from the supplemental search result set in response to determining that the supplemental search result set satisfies the convergence condition. For example, the supplemental search result set may be directly used as the target search result set.
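Flow 200 as a whole can be sketched as the loop below. It is an illustration only: the callables `search`, `sample`, `is_target`, and `converged` stand in for the preset search algorithm, the sampling step, the target check, and the convergence condition, and the `max_rounds` safety cap is an assumption that does not appear in the source.

```python
def iterative_search(search_information, search, sample, is_target, converged,
                     max_rounds=10):
    """Repeat supplementary searches until the convergence condition is met."""
    # Step 201: initial search.
    results = set(search(search_information))
    # Step 202: sample a subset and keep the target results.
    supplementary_info = [r for r in sample(results) if is_target(r)]
    for _ in range(max_rounds):
        # Step 2031: union of the new hits and the current result set.
        supplementary = set(results)
        for item in supplementary_info:
            supplementary |= set(search(item))
        # Step 2032: check the convergence condition.
        if converged(results, supplementary):
            return supplementary
        # Step 204: resample, then treat the supplementary set as the result set.
        supplementary_info = [r for r in sample(supplementary) if is_target(r)]
        results = supplementary
    return results


# Toy information base: searching with a key returns the listed entries.
INDEX = {"q": ["a", "b"], "a": ["a", "c"], "b": ["b"], "c": ["c", "d"], "d": ["d"]}


def toy_search(key):
    return INDEX.get(key, [])


def sample_all(results):
    return sorted(results)  # sample every result, in a fixed order


def always_target(result):
    return True


def no_new_results(previous, current):
    return len(current - previous) < 1


found = iterative_search("q", toy_search, sample_all, always_target, no_new_results)
```

In the toy run, the initial search returns `a` and `b`; the supplementary searches add `c` and then `d`, and the loop stops in the third round when no new result appears.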
The method for searching information provided by the above embodiment samples the search result set obtained by searching the target information base with the search information, and searches the target information base again using, as supplementary search information, the sampled search results that belong to the target search results, so that new search results can be added. It then determines whether the supplementary search result set, obtained by adding the new search results to the search result set, meets the convergence condition. If not, the iterative search continues in the target information base, with supplementary search information again selected by sampling, so that new search results keep being added until the obtained search results meet the convergence condition. This improves the coverage of the search results and avoids the omission of search results that a single direct search may cause.
With further reference to fig. 3, a flow 300 of yet another embodiment of a method for searching information is shown. The process 300 of the method for searching information includes the following steps:
Step 301, searching in the target information base by using the obtained search information to obtain a search result set.
The specific implementation process of this step can refer to the related description of step 201 in the corresponding embodiment of fig. 2, and is not repeated here.
Step 302, determining the relevancy of the search results in the search result set.
In this embodiment, the degree of association may be used to indicate the degree of association between the search result and the search information. Specifically, the association degree between each search result and the search information may be calculated according to a preset association degree calculation method, or the association degree between each search result and the search information may be manually marked and returned to the execution main body.
Optionally, for a search result in the search result set, the relevance of the search result may be determined as follows: the similarity between the search result and the search information is determined using an existing similarity calculation method, and the relevance of the search result is then determined from the similarity, the relevance being directly proportional to the similarity. For example, the similarity may be used directly as the relevance of the search result.
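As a hedged illustration of deriving relevance from similarity, the sketch below uses cosine similarity between feature vectors. The embodiment does not name a particular similarity measure, so cosine similarity, the feature-vector representation, and the `scale` factor are assumptions chosen only for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (assumed representation)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_from_similarity(similarity: float, scale: float = 1.0) -> float:
    """Relevance proportional to similarity, clamped to [0, 1]; scale=1.0 uses the similarity directly."""
    return max(0.0, min(1.0, scale * similarity))
```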
Step 303, selecting a subset of search results from the search result set, and determining search results belonging to the target search result in the subset of search results as supplemental search information.
The specific implementation process of this step can refer to the related description of step 202 in the corresponding embodiment of fig. 2, and is not repeated here.
Step 304, the following search steps are performed:
Step 3041, determining a union of the search result searched by the supplementary search information in the target information base and the search result set as a supplementary search result set.
The specific execution process of step 3041 may refer to the related description of step 2031 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 3042, for a supplemental search result in the supplemental search result set, determining a relevance of the supplemental search result by:
Step 30421, in response to the supplementary search result existing in the search result set, updating the relevance of the supplementary search result according to a preset relevance boosting algorithm, and taking the updated relevance as the relevance of the supplementary search result.
In this embodiment, for a supplementary search result existing in the search result set, that is, the search result is searched again, the relevance of the supplementary search result can be improved. Specifically, a technician may preset a relevancy promotion algorithm for promoting relevancy, or preset a relevancy promotion value, and when a search result is searched again, add the set relevancy promotion value to the relevancy of the search result, and use the obtained new relevancy as the relevancy of the search result.
Step 30422, in response to the supplemental search result not being in the search result set, determining a relevance of the supplemental search result.
In this embodiment, the similarity between the supplementary search result and the search information may first be determined using an existing similarity calculation method, and the relevance of the supplementary search result may then be determined from the similarity, the relevance being directly proportional to the similarity. For example, the similarity may be used directly as the relevance of the supplementary search result.
For a supplementary search result that does not exist in the search result set, that is, a newly added search result, the specific calculation process of the relevance may also refer to the related description of step 302, which is not repeated here.
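Steps 30421 and 30422 can be sketched as below, using the simple variant described above in which a preset boost value is added when a result is found again. The function name, the `boost` value of 0.05 and the `cap` at 1.0 are assumptions for illustration; the embodiment leaves the boosting algorithm to the technician.

```python
from typing import Callable, Dict, Hashable, Iterable


def update_relevance(
    previous: Dict[Hashable, float],            # relevance of the current search result set
    found: Iterable[Hashable],                  # results returned by the supplementary search
    similarity: Callable[[Hashable], float],    # hypothetical similarity to the search information
    boost: float = 0.05,                        # assumed preset relevance boost value
    cap: float = 1.0,                           # assumed upper bound on relevance
) -> Dict[Hashable, float]:
    updated = dict(previous)
    for item in found:
        if item in previous:
            # step 30421: the result was found again, so raise its relevance
            updated[item] = min(cap, previous[item] + boost)
        else:
            # step 30422: a newly added result gets its relevance from the similarity
            updated[item] = similarity(item)
    return updated
```

Because the boost is computed from the previous relevance, a result that appears several times in one round of supplementary search is boosted only once in this sketch.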
Step 3043, it is determined whether the supplemental search result set satisfies a preset convergence condition.
The specific execution process of step 3043 may refer to the related description of step 2032 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 305, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, selecting a supplementary search result subset from the supplementary search result set, determining the supplementary search results belonging to the target search result in the supplementary search result subset as supplementary search information, determining the supplementary search result set as the search result set, and continuing to execute the search steps.
The specific implementation process of this step can refer to the related description of step 204 in the corresponding embodiment of fig. 2, and is not repeated here.
Step 306, in response to determining that the supplementary search result set satisfies the convergence condition, selecting a target number of supplementary search results from the supplementary search result set as target search results according to the sequence of the relevance degrees from large to small, and obtaining a target search result set.
In this embodiment, the target number may be preset, or may be determined according to a number determination rule (for example, the target number is half of the number of the supplementary search results included in the supplementary search result set).
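A minimal sketch of step 306 follows: the supplementary search results are ranked by relevance in descending order and the first `target_number` are kept. The function name is hypothetical, and the default rule (half of the set) is the example number determination rule mentioned above.

```python
from typing import Dict, Hashable, List, Optional


def select_target_results(
    relevance: Dict[Hashable, float],
    target_number: Optional[int] = None,
) -> List[Hashable]:
    if target_number is None:
        # example rule from the text: half of the supplementary search results
        target_number = len(relevance) // 2
    ranked = sorted(relevance.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:target_number]]
```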
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow of the method for searching information in this embodiment may also determine the relevance degree corresponding to each search result for the search result after each search, and may dynamically update the relevance degree of the search result searched again in the iterative search process. Therefore, the target search result can be selected according to the relevance of each search result and the supplementary search result, and the accuracy of the search result is further improved.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for searching information is shown. The process 400 of the method for searching information includes the steps of:
Step 401, searching in the target information base by using the obtained search information to obtain a search result set.
The specific implementation process of this step can refer to the related description of step 201 in the corresponding embodiment of fig. 2, and is not repeated here.
Step 402, determining the relevancy of the search results in the search result set.
The specific implementation process of this step can refer to the related description of step 302 in the corresponding embodiment of fig. 3, and is not described herein again.
Step 403, dividing the search result set into a corresponding target number of search result subsets according to a preset target number of relevance intervals.
In the present embodiment, the target number may be preset according to actual application requirements. The relevance intervals may be divided uniformly or non-uniformly. For example, when the relevance lies between 0 and 1 (including 0 and 1), the range may be uniformly divided into the following five relevance intervals: 0 to 0.2 (including 0, not including 0.2), 0.2 to 0.4 (including 0.2, not including 0.4), 0.4 to 0.6 (including 0.4, not including 0.6), 0.6 to 0.8 (including 0.6, not including 0.8), and 0.8 to 1 (including 0.8 and 1). It may also be non-uniformly divided into the following five relevance intervals: 0 to 0.5 (including 0, not including 0.5), 0.5 to 0.7 (including 0.5, not including 0.7), 0.7 to 0.8 (including 0.7, not including 0.8), 0.8 to 0.9 (including 0.8, not including 0.9), and 0.9 to 1 (including 0.9 and 1).
Taking the above-mentioned five relevance intervals as an example, correspondingly, the search result with relevance between 0 and 0.2 may be determined as a search result subset, the search result with relevance between 0.2 and 0.4 may be determined as a search result subset, the search result with relevance between 0.4 and 0.6 may be determined as a search result subset, the search result with relevance between 0.6 and 0.8 may be determined as a search result subset, and the search result with relevance between 0.8 and 1 may be determined as a search result subset.
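The division of step 403 can be sketched with a binary search over the interior interval boundaries. The function name and the dictionary representation of relevance are assumptions; the default boundaries correspond to the uniform five-interval example, with each interval closed on the left and the last interval also including 1.

```python
from bisect import bisect_right
from typing import Dict, Hashable, List, Sequence


def partition_by_relevance(
    relevance: Dict[Hashable, float],
    inner_boundaries: Sequence[float] = (0.2, 0.4, 0.6, 0.8),
) -> List[List[Hashable]]:
    # one subset per interval: [0, 0.2), [0.2, 0.4), ..., [0.8, 1]
    subsets: List[List[Hashable]] = [[] for _ in range(len(inner_boundaries) + 1)]
    for item, score in relevance.items():
        # bisect_right places a score equal to a boundary into the interval that starts there
        subsets[bisect_right(inner_boundaries, score)].append(item)
    return subsets
```

Passing `inner_boundaries=(0.5, 0.7, 0.8, 0.9)` gives the non-uniform division from the same example.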
Step 404, selecting search results from each of the target number of search result subsets to obtain a sampled search result subset, and determining the search results belonging to the target search result in the sampled search result subset as supplementary search information.
In this embodiment, search results may be sampled from each search result subset, and the sampled search results may be combined to obtain the sampled search result subset. The number of search results sampled from each search result subset may be specified arbitrarily, or may be determined according to a sampling number determination rule. For example, the rule may be to sample from each search result subset one tenth of the search results that the subset contains. As another example, the number of samples may be determined according to the relevance, for example, the closer the corresponding relevance is to 0.5, the larger the corresponding number of samples.
Optionally, for each search result subset among the target number of search result subsets, the accuracy of the search result subset may further be determined, where the accuracy represents the proportion of target search results among the search results sampled from the subset. For example, if 30 search results are sampled from a search result subset and 20 of them belong to the target search result, the accuracy of the search result subset is two thirds.
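The sampling of step 404 and the optional accuracy calculation can be sketched together as follows. This is an illustration under stated assumptions: the one-tenth rule is the example rule from the text, the labelling callable `is_target` is hypothetical, and reporting `None` as the accuracy of an empty subset is a choice made only for this sketch.

```python
import random
from typing import Callable, Hashable, List, Optional, Sequence, Tuple


def sample_and_score(
    subsets: Sequence[Sequence[Hashable]],
    is_target: Callable[[Hashable], bool],      # hypothetical label lookup
    rng: Optional[random.Random] = None,
) -> Tuple[List[Hashable], List[Optional[float]]]:
    rng = rng or random.Random(0)
    supplementary_info: List[Hashable] = []
    accuracies: List[Optional[float]] = []
    for subset in subsets:
        if not subset:
            accuracies.append(None)             # assumption: accuracy undefined for an empty subset
            continue
        k = max(1, len(subset) // 10)           # example rule: one tenth of the subset, at least one
        sampled = rng.sample(list(subset), k)
        targets = [r for r in sampled if is_target(r)]
        supplementary_info.extend(targets)      # sampled targets become supplementary search information
        accuracies.append(len(targets) / len(sampled))
    return supplementary_info, accuracies
```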
Step 405, the following search steps are performed:
Step 4051, determining a union of the search result searched by the supplementary search information in the target information base and the search result set as a supplementary search result set.
The specific process of step 4051 may refer to the related description of step 2031 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 4052, for a supplementary search result in the supplementary search result set, determining a relevance of the supplementary search result as follows:
Step 40521, in response to the supplementary search result existing in the search result set, updating the relevance of the supplementary search result according to a preset relevance boosting algorithm, and using the updated relevance as the relevance of the supplementary search result.
Step 40522, responsive to the supplemental search result not being in the search result set, determining a relevance of the supplemental search result.
The specific processes of the above steps 40521 and 40522 can refer to the related descriptions of steps 30421 and 30422 in the corresponding embodiment of fig. 3, and are not described herein again.
Step 4053, according to the association degree interval, dividing the supplementary search result set into a corresponding target number of supplementary search result subsets.
In this embodiment, since the obtained supplementary search result set may contain new search results and the relevance of search results that were found again has been raised, the supplementary search results may be divided again according to relevance. The specific division process may refer to the related description of step 403.
Step 4054, a determination is made as to whether the supplemental search result set satisfies a predetermined convergence condition.
The specific process of step 4054 may refer to the related description of step 2032 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 406, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, selecting supplementary search results from each of the target number of supplementary search result subsets to obtain a sampled supplementary search result subset, determining the supplementary search results belonging to the target search result in the sampled supplementary search result subset as supplementary search information, determining the supplementary search result set as the search result set, and continuing to execute the search steps.
In this embodiment, supplementary search results may be sampled from each supplementary search result subset and combined. The number of supplementary search results sampled from each supplementary search result subset may be specified arbitrarily, or may be determined according to a sampling number determination rule. For example, the rule may be to sample from each supplementary search result subset one tenth of the supplementary search results that the subset contains. As another example, the number of samples may be determined according to the relevance, for example, the closer the corresponding relevance is to 0.5, the larger the corresponding number of samples.
Optionally, for a supplementary search result subset among the target number of supplementary search result subsets, the absolute value of the difference between a preset accuracy threshold and the accuracy of the search result subset corresponding to the relevance interval in which the supplementary search result subset is located may be determined. The number of supplementary search results selected from the supplementary search result subset is then determined according to this absolute value and the number of supplementary search results that the subset contains, where the number selected is inversely proportional to the absolute value and directly proportional to the number contained.
The preset accuracy threshold may be set in advance by a technician. In practice, the minimum and maximum values of the accuracy may be estimated in advance, and the accuracy threshold may be set according to these estimates. For example, if the accuracy is between 0 and 1, the accuracy threshold may be set to 0.5. In this case, the closer the accuracy corresponding to a supplementary search result subset is to 0.5, the larger the corresponding number of samples; conversely, the closer the accuracy is to 0 or 1, the smaller the corresponding number of samples.
For example, if the two subsets of supplemental search results include 200 and 400 supplemental search results, respectively, then the number of samples of the subset of supplemental search results that includes 400 supplemental search results may be greater than the number of samples of the subset of supplemental search results that includes 200 supplemental search results.
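One possible way to turn the two proportionality rules above into concrete sample counts is sketched below. The text only states the direction of each relationship, so the weight formula `size / (|accuracy - threshold| + eps)`, the smoothing term `eps` (which keeps the weight finite when the accuracy equals the threshold), the fixed `total` budget and the largest-remainder rounding are all assumptions of this sketch.

```python
import math
from typing import List, Sequence


def allocate_samples(
    sizes: Sequence[int],          # number of results in each supplementary search result subset
    accuracies: Sequence[float],   # accuracy of the previous sample from the same relevance interval
    total: int,                    # assumed fixed total number of samples per round
    threshold: float = 0.5,        # preset accuracy threshold
    eps: float = 0.1,              # assumed smoothing term
) -> List[int]:
    # larger subsets and accuracies closer to the threshold receive larger weights
    weights = [n / (abs(a - threshold) + eps) for n, a in zip(sizes, accuracies)]
    weight_sum = sum(weights)
    if weight_sum == 0:
        return [0] * len(sizes)
    shares = [total * w / weight_sum for w in weights]
    counts = [math.floor(s) for s in shares]
    # hand out the samples lost to rounding, largest fractional part first
    remainder = max(0, total - sum(counts))
    by_fraction = sorted(range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_fraction[:remainder]:
        counts[i] += 1
    # never request more samples than a subset contains
    return [min(c, n) for c, n in zip(counts, sizes)]
```

For two subsets with equal accuracy and 200 and 400 results, the larger subset receives more samples, consistent with the example above. The exact counts depend on the assumed formula and are not taken from the embodiment.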
Alternatively, the number of search results in the subset of search results and the number of search results in the subset of supplemental search results may be the same. I.e. the total number of samples after each search may be fixed.
Optionally, for each supplementary search result subset among the target number of supplementary search result subsets, the accuracy of the supplementary search result subset may further be determined, where the accuracy represents the proportion of target search results among the supplementary search results sampled from the subset.
Optionally, the convergence condition may be that a difference between accuracies of the search result subset and the supplementary search result subset corresponding to the target relevance interval is smaller than a preset difference threshold.
Optionally, the convergence condition may also be that the number of search results included in the difference set of the search result set and the supplementary search result set is smaller than a preset number difference threshold.
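The two optional convergence conditions can be sketched as simple predicates. The function names and threshold values are hypothetical. Since the supplementary search result set is a union that contains the search result set, the difference set is interpreted here as the results that are in the supplementary set but not in the search result set, which is an assumption of this sketch.

```python
from typing import Hashable, Iterable


def accuracy_converged(prev_accuracy: float, new_accuracy: float, diff_threshold: float) -> bool:
    """True if the accuracy of the target relevance interval changed by less than the threshold."""
    return abs(new_accuracy - prev_accuracy) < diff_threshold


def growth_converged(
    search_results: Iterable[Hashable],
    supplementary_results: Iterable[Hashable],
    count_threshold: int,
) -> bool:
    """True if fewer than count_threshold results were newly added by the supplementary search."""
    newly_added = set(supplementary_results) - set(search_results)
    return len(newly_added) < count_threshold
```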
Step 407, in response to determining that the supplementary search result set satisfies the convergence condition, selecting a target number of supplementary search results from the supplementary search result set as target search results according to the sequence of the relevance degrees from large to small, and obtaining a target search result set.
The specific implementation process of this step can refer to the related description of step 306 in the corresponding embodiment of fig. 3, and is not described herein again.
With continued reference to fig. 5, fig. 5 is a schematic diagram of an application scenario of the method for searching information according to the present embodiment. In the application scenario of fig. 5, a search may be performed in the image library 502 by using the image 501 to obtain a search result set 503, where the search result set 503 includes 100 images. Thereafter, the similarity between each image in the search result set 503 and the image 501 may be calculated and used as the relevance of that image. The search result set may then be divided into three subsets according to relevance (as shown by reference numeral 504). The relevance of the images in the first subset is between 0 and 0.5, the relevance of the images in the second subset is between 0.5 and 0.8, and the relevance of the images in the third subset is between 0.8 and 1. The first subset contains 30 images and the third subset contains 20 images, so the second subset contains the remaining 50 of the 100 images.
Then, sampling may be performed in the three subsets, and the target images among the sampled images are combined to obtain the supplementary search images 506. Specifically, 10 images may be sampled from the first subset, 10 images from the second subset, and 3 images from the third subset. As shown by reference numeral 505 in the figure, the images sampled from the first subset include 5 target images, the images sampled from the second subset include 8 target images, and the images sampled from the third subset include 3 target images, so that 16 target images are obtained as the supplementary search images 506. Further, the accuracy of the 10 images sampled from the first subset is one half, the accuracy for the second subset is four fifths, and the accuracy for the third subset is one.
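The figures quoted for this first round of sampling can be checked with exact fractions. The dictionary layout below is only a convenience for the illustration; the counts are those stated in the scenario.

```python
from fractions import Fraction

# images sampled from each subset and target images among them, as stated for fig. 5
sampled = {"first": 10, "second": 10, "third": 3}
targets = {"first": 5, "second": 8, "third": 3}

# accuracy = proportion of target images among the sampled images of each subset
accuracies = {name: Fraction(targets[name], sampled[name]) for name in sampled}

# the target images from all three subsets together form the supplementary search images
total_targets = sum(targets.values())
```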
Then, a search may be performed in the image library 502 using the supplementary search images 506 to obtain the current search result set 507, where the current search result set 507 includes 120 images. The union of the previous search result set 503 and the current search result set 507, which contains 150 images, may then be used as the supplementary search result set 508.
It can be seen that, in this search, 30 images are newly found and 90 images are found again. The relevance of the 90 images found again may be raised, and for each of the 30 newly found images, its similarity to the image 501 may be used as its relevance. The 150 images contained in the supplementary search result set are then regrouped by relevance, resulting in three new subsets (as shown at 509).
The first subset with a newly derived relevance between 0 and 0.5 contains 45 images, the second subset with a newly derived relevance between 0.5 and 0.8 contains 80 images, and the third subset with a newly derived relevance between 0.8 and 1 contains 25 images.
Then, the newly obtained three subsets may be sampled again according to the accuracy of the sampling of the subsets corresponding to the three association degree intervals obtained by the previous calculation. The accuracy of the last three sub-set samples is one-half, four-fifths and one, respectively, so that in the process of this sampling, the number of samples in each sub-set can be determined according to a preset accuracy threshold (e.g. 0.5).
For the newly obtained first subset, more images may be sampled, since the absolute value of the difference between the accuracy of the corresponding subset and the accuracy threshold is 0. For the newly obtained second subset and third subset, fewer images may be sampled, since the absolute value of the difference between the accuracy and the accuracy threshold is larger for these two subsets. In particular, the number of samples for the newly obtained second subset may be greater than that for the newly obtained third subset, since the absolute value of the difference for the second subset is smaller than that for the third subset.
Specifically, 12 images may be sampled from the newly obtained first subset, 10 images from the newly obtained second subset, and 1 image from the newly obtained third subset. As shown by reference numeral 510, the images sampled from the newly obtained first subset and second subset each include 8 target images, and the image sampled from the newly obtained third subset is 1 target image. Further, the accuracy of the newly obtained first subset may be calculated to be two thirds, the accuracy of the newly obtained second subset four fifths, and the accuracy of the newly obtained third subset one.
Thereafter, the set 511 composed of the 17 target images obtained from this sampling is used as the supplementary search images, and the above-described search process is performed again with the supplementary search result set 508 as the search result set.
Specifically, a search is first performed in the image library 502 using the supplemental search result set 511, and the union of the obtained search results and the supplemental search result set 508 is used as a new supplemental search result set 512. The new supplemental search result set 512 contains 160 images. Similarly, the degree of association of the image searched again may be raised, and for the newly searched image, the degree of similarity between the newly searched image and the image 501 may be taken as the degree of association of the newly searched image. The new supplemental search result set 512 is then subdivided into three subsets according to the re-determined degrees of relevancy. Specifically, as shown by reference numeral 513 in the figure, the first subset of the new division includes 45 images, the second subset of the new division includes 87 images, and the third subset of the new division includes 28 images.
The accuracies of the three relevance intervals obtained in the previous calculation are two thirds, four fifths and one, respectively. Images may be sampled from the three newly divided subsets and the accuracy of this sampling calculated, where more images may be sampled from the first subset. The accuracies calculated after sampling the three newly divided subsets are one half, nine tenths and one, respectively. It can be seen that the accuracies of the newly divided second subset and third subset change little, and that the new supplementary search result set 512 contains fewer newly found images than the previous supplementary search result set 508 did, so the 115 images in the newly divided second subset and third subset can be used directly as the target image set 514.
As can be seen from fig. 4, compared with the embodiments corresponding to fig. 2 and fig. 3, the process of the method for searching information in this embodiment highlights that after each search is completed and the relevancy of each search result is determined, the search results may also be divided into a plurality of corresponding subsets according to a preset relevancy interval, and the subsets are respectively sampled to perform the supplementary search, so that the supplementary search process may cover each relevancy interval, and further improve the coverage rate of the search results. On the basis, after each subset is respectively sampled, the accuracy corresponding to each subset can be further determined according to the sampling result, and the sampling number of the next relevance interval is determined according to the accuracy, so that targeted supplementary search is realized, and the accuracy and the search efficiency of the search result are further improved.
With further reference to fig. 6, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for searching for information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 6, the apparatus 600 for searching information provided by the present embodiment includes a searching unit 601, a selecting unit 602, a supplementary searching unit 603, and a determining unit 604. The searching unit 601 is configured to search in the target information base by using the acquired search information to obtain a search result set. The selecting unit 602 is configured to select a search result subset from the search result set, and determine the search results belonging to the target search result in the search result subset as the supplementary search information. The supplementary searching unit 603 is configured to perform the following search steps: determining the union of the search results found in the target information base using the supplementary search information and the search result set as a supplementary search result set; and determining whether the supplementary search result set satisfies a preset convergence condition. The determining unit 604 is configured to, in response to determining that the supplementary search result set does not satisfy the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results belonging to the target search result in the supplementary search result subset as the supplementary search information, determine the supplementary search result set as the search result set, and continue to perform the search steps.
In the present embodiment, in the apparatus 600 for searching information: the specific processing of the searching unit 601, the selecting unit 602, the supplementary searching unit 603, and the determining unit 604 and the technical effects thereof can refer to the related descriptions of step 201, step 202, step 203, and step 204 in the corresponding embodiment of fig. 2, respectively, and are not described herein again.
In some optional implementations of the present embodiment, the supplemental search unit 603 is further configured to: in response to determining that the supplemental search result set satisfies the convergence condition, a target search result set is determined from the supplemental search result set.
In some optional implementations of the present embodiment, the selecting unit 602 is further configured to: acquiring label information of the search results in the search result subset, wherein the label information is used for indicating whether the search results are target search results; and determining the search results belonging to the target search result in the search result subset according to the labeling information.
In some optional implementations of this embodiment, the apparatus 600 for searching for information further includes: the relevance determining unit (not shown in the figure) is configured to determine relevance of the search results in the search result set, wherein the relevance is used for indicating the relevance degree of the search results and the search information.
In some optional implementations of the present embodiment, the association degree determining unit is further configured to: determining the similarity between the search result and the search information; and determining the relevance of the search result according to the similarity, wherein the similarity is in direct proportion to the relevance.
In some optional implementations of the present embodiment, the association degree determining unit is further configured to: aiming at the supplementary search results in the supplementary search result set, responding to the fact that the supplementary search results exist in the search result set, updating the relevance of the supplementary search results according to a preset relevance promoting algorithm, and taking the updated relevance as the relevance of the supplementary search results; in response to the supplemental search result not being in the set of search results, a degree of relevancy of the supplemental search result is determined.
In some optional implementations of the present embodiment, the association degree determining unit is further configured to: determining similarity of the supplemental search results to the search information; and determining the relevance of the supplementary search result according to the similarity, wherein the similarity is in direct proportion to the relevance.
In some optional implementations of the present embodiment, the supplemental search unit 603 is further configured to: and selecting a target number of supplementary search results from the supplementary search result set as target search results according to the sequence of the relevance degrees from large to small to obtain a target search result set.
In some optional implementations of the present embodiment, the apparatus 600 for searching for information further includes a dividing unit (not shown in the figure) configured to: dividing the search result set into corresponding target number search result subsets according to a preset target number relevance degree interval; and the selecting unit is further configured to: and respectively selecting the search results from the target number of search result subsets to obtain the search result subsets.
In some optional implementations of this embodiment, the apparatus 600 for searching for information further includes: the accuracy determining unit (not shown in the figure) is configured to determine the accuracy of a subset of search results from the target number of subsets of search results, wherein the accuracy is used to represent the proportion of the target search results in the subset of search results.
In some optional implementations of this embodiment, the dividing unit is further configured to: dividing the supplementary search result set into a corresponding target number of supplementary search result subsets according to the relevance interval; and the determining unit is further configured to: and respectively selecting supplementary search results from the target number of supplementary search result subsets to obtain the supplementary search result subsets.
In some optional implementations of the present embodiment, the accuracy determining unit is further configured to: an accuracy of a subset of the supplemental search results from the target number of the subset of supplemental search results is determined, where the accuracy is indicative of a proportion of the target search results from the subset of supplemental search results.
In some optional implementations of this embodiment, the determining unit is further configured to: for each supplementary search result subset among the target number of supplementary search result subsets, determine the absolute value of the difference between a preset accuracy threshold and the accuracy of the search result set corresponding to the relevance interval in which the supplementary search result subset is located; and determine the number of supplementary search results to select from the supplementary search result subset according to the absolute value and the number of supplementary search results contained in the supplementary search result subset, where the number selected is inversely proportional to the absolute value and directly proportional to the number contained.
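One possible way to turn the two proportionality relations into a sample count is sketched below. The application states only the proportionalities; the `scale` constant, the `epsilon` guard against division by zero, and the clamping to the subset size are assumptions of this sketch.

```python
def sample_count(subset_size: int, interval_accuracy: float,
                 accuracy_threshold: float, scale: float = 0.1,
                 epsilon: float = 1e-6) -> int:
    """Number of results to draw from one supplementary search result subset.

    Grows with subset_size and shrinks as the interval accuracy moves
    away from the threshold. scale and epsilon are hypothetical tuning values.
    """
    gap = abs(interval_accuracy - accuracy_threshold)
    raw = scale * subset_size / (gap + epsilon)
    # Never draw more results than the subset contains.
    return max(0, min(subset_size, int(raw)))


far = sample_count(100, 0.9, 0.5)    # accuracy far from the threshold
near = sample_count(100, 0.6, 0.5)   # accuracy close to the threshold
```

Under this reading, intervals whose accuracy sits near the threshold are the uncertain ones, so more of their results are sampled for labeling.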
In some alternative implementations of the present embodiment, the number of search results in the search result set and the number of supplemental search results in the subset of supplemental search results are the same.
In some optional implementations of this embodiment, the convergence condition includes: the difference between the accuracy of the search result subset and the accuracy of the supplementary search result subset corresponding to a target relevance interval is smaller than a preset difference threshold.
In some optional implementations of this embodiment, the convergence condition includes: the number of search results contained in the difference set of the search result set and the supplemental search result set is less than a preset number difference threshold.
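The two optional convergence conditions can be sketched as simple predicates. Comparing the accuracies by absolute difference, and reading the difference set as the results present in the supplementary search result set but absent from the search result set, are interpretations assumed here; the function and parameter names are hypothetical.

```python
from typing import Set


def converged_by_accuracy(subset_accuracy: float,
                          supplementary_subset_accuracy: float,
                          difference_threshold: float) -> bool:
    """True when the accuracy for the target interval has stopped changing much."""
    return abs(subset_accuracy - supplementary_subset_accuracy) < difference_threshold


def converged_by_growth(search_results: Set[str],
                        supplementary_results: Set[str],
                        number_threshold: int) -> bool:
    """True when the latest round added fewer new results than the threshold."""
    new_results = supplementary_results - search_results
    return len(new_results) < number_threshold


stable = converged_by_accuracy(0.80, 0.82, 0.05)
saturated = converged_by_growth({"a", "b"}, {"a", "b", "c"}, 2)
```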
According to the apparatus provided by this embodiment of the application, after the search unit obtains the search results directly, the selecting unit samples part of those search results and picks out the ones that are target search results, and the supplementary search unit searches again using them. The determining unit then decides, based on the newly found search results, whether to continue sampling and searching. This increases the number of search results and improves their coverage.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read from it can be installed into the storage section 708 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by a Central Processing Unit (CPU)701, performs the above-described functions defined in the method of the present application.
It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a search unit, a selection unit, a supplemental search unit, and a determination unit. The names of these units do not form a limitation on the units themselves in some cases, and for example, the search unit may also be described as "a unit that searches in the target information library using the acquired search information to obtain a search result set".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: search in a target information base by using the acquired search information to obtain a search result set; select a search result subset from the search result set, and determine the search results in the search result subset that belong to the target search results as supplementary search information; and perform the following search steps: determine the union of the search result set and the search results found in the target information base by using the supplementary search information as a supplementary search result set; determine whether the supplementary search result set meets a preset convergence condition; and in response to determining that the supplementary search result set does not meet the preset convergence condition, select a supplementary search result subset from the supplementary search result set, determine the supplementary search results in the supplementary search result subset that belong to the target search results as supplementary search information, take the supplementary search result set as the search result set, and continue to execute the search steps.
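The program steps above can be read as an iterative loop, sketched below under stated assumptions. The callables `search`, `is_target`, and `similarity`, the multiplicative `boost` standing in for the preset relevance promoting algorithm, the growth-based convergence test, and the `max_rounds` safeguard are all hypothetical choices for this sketch, not the application's implementation.

```python
import random
from typing import Callable, Dict, Set


def iterative_search(query: str,
                     search: Callable[[str], Set[str]],
                     is_target: Callable[[str], bool],
                     similarity: Callable[[str, str], float],
                     sample_size: int = 3,
                     growth_threshold: int = 1,
                     boost: float = 1.1,
                     max_rounds: int = 10,
                     seed: int = 0) -> Dict[str, float]:
    """Return a relevance score for every result gathered across all rounds."""
    rng = random.Random(seed)
    results = search(query)
    relevance = {r: similarity(r, query) for r in results}
    for _ in range(max_rounds):
        # Sample a subset and keep the target results as supplementary search information.
        subset = rng.sample(sorted(results), min(sample_size, len(results)))
        seeds = [r for r in subset if is_target(r)]
        found: Set[str] = set()
        for s in seeds:
            found |= search(s)
        for r in found:
            if r in results:
                relevance[r] *= boost  # already known: promote its relevance
            else:
                relevance[r] = similarity(r, query)  # new: score it from scratch
        supplementary = results | found
        # Assumed convergence test: too few new results were added this round.
        if len(supplementary - results) < growth_threshold:
            break
        results = supplementary
    return relevance


# Toy information base: each key maps to the results a search for it returns.
index = {
    "cat": {"cat video 1", "cat video 2", "dog video"},
    "cat video 1": {"cat video 3"},
    "cat video 2": {"cat video 3", "cat video 4"},
    "cat video 3": {"cat video 5"},
}
final = iterative_search(
    "cat",
    search=lambda q: set(index.get(q, ())),
    is_target=lambda r: r.startswith("cat"),
    similarity=lambda r, q: 1.0 if q in r else 0.0,
    sample_size=10,
)
```

In this toy run the initial search returns three results, and the later rounds reach "cat video 3", "cat video 4", and "cat video 5" through the target results, which is the coverage gain the embodiment describes.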
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (30)

1. A method for searching information, comprising:
searching in a target information base by using the acquired search information to obtain a search result set;
selecting a search result subset from the search result set, and determining search results belonging to the target search result in the search result subset as supplementary search information;
the following search steps are performed: determining the union of the search result searched by the supplementary search information in the target information base and the search result set as a supplementary search result set; determining whether the supplementary search result set meets a preset convergence condition;
in response to determining that the supplementary search result set does not satisfy the preset convergence condition, selecting a supplementary search result subset from the supplementary search result set, determining supplementary search results belonging to the target search result in the supplementary search result subset as supplementary search information, determining the supplementary search result set as a search result set, and continuing to execute the searching step;
after the searching in the target information base by using the search information to obtain the search result set, the method further includes: determining the relevance of the search results in the search result set, wherein the relevance is used for expressing the relevance of the search results and the search information;
after the determining that the union of the search result of the supplementary search information in the target information base and the search result set is the supplementary search result set, the method further comprises: for each supplementary search result in the supplementary search result set, in response to the supplementary search result being present in the search result set, updating the relevance of the supplementary search result according to a preset relevance promoting algorithm, and taking the updated relevance as the relevance of the supplementary search result; and in response to the supplementary search result not being present in the search result set, determining the relevance of the supplementary search result.
2. The method of claim 1, wherein the searching step further comprises:
in response to determining that the supplementary search result set satisfies the convergence condition, determining a target search result set from the supplementary search result set.
3. The method of claim 1, wherein the determining search results in the subset of search results that belong to the target search result as supplemental search information comprises:
acquiring label information of the search results in the search result subset, wherein the label information is used for indicating whether the search results are target search results;
and determining the search results belonging to the target search result in the search result subset according to the labeling information.
4. The method of claim 1, wherein for a search result in the set of search results, the relevance of the search result is determined by:
determining the similarity between the search result and the search information;
and determining the relevance of the search result according to the similarity, wherein the similarity is in direct proportion to the relevance.
5. The method of claim 1, wherein said determining the relevancy of the supplemental search result comprises:
determining similarity of the supplemental search results to the search information;
and determining the relevance of the supplementary search result according to the similarity, wherein the similarity is in direct proportion to the relevance.
6. The method of claim 1, wherein the determining a target search result set from a supplemental search result set comprises:
selecting a target number of supplementary search results from the supplementary search result set as target search results, in descending order of relevance, to obtain the target search result set.
7. The method of claim 1, wherein the method further comprises:
dividing the search result set into corresponding target number search result subsets according to a preset target number relevance degree interval; and
the selecting a subset of search results from the set of search results comprises:
selecting search results from each of the target number of search result subsets to obtain the search result subset.
8. The method of claim 7, wherein the method further comprises:
determining the accuracy of each search result subset among the target number of search result subsets, wherein the accuracy represents the proportion of target search results in the search result subset.
9. The method of claim 8, wherein after determining the union of the search result of the supplemental search information in the target information repository and the set of search results as the supplemental set of search results, further comprising:
dividing the supplementary search result set into a corresponding target number of supplementary search result subsets according to the relevance interval; and
selecting a subset of supplemental search results from the set of supplemental search results, comprising:
selecting supplementary search results from each of the target number of supplementary search result subsets to obtain the supplementary search result subset.
10. The method of claim 9, wherein the method further comprises:
determining the accuracy of each supplementary search result subset among the target number of supplementary search result subsets, wherein the accuracy represents the proportion of target search results in the supplementary search result subset.
11. The method of claim 10, wherein said selecting supplemental search results from a subset of a target number of supplemental search results, respectively, comprises:
determining the absolute value of the difference between the accuracy of the search result set corresponding to the association degree interval in which the supplementary search result subset is located and a preset accuracy threshold value aiming at the supplementary search subset in the target number of supplementary search result subsets; and determining the number of the supplementary search results selected from the supplementary search result subset according to the absolute value and the number of the supplementary search results contained in the supplementary search result subset, wherein the absolute value is inversely proportional to the number of the selected supplementary search results, and the number of the contained supplementary search results is proportional to the number of the selected supplementary search results.
12. The method of any of claims 1-11, wherein a number of search results in the search result set and a number of supplemental search results in the subset of supplemental search results are the same.
13. The method of claim 11, wherein the convergence condition comprises:
the difference between the accuracy of the search result subset and the accuracy of the supplementary search result subset corresponding to a target relevance interval is smaller than a preset difference threshold.
14. The method of claim 1, wherein the convergence condition comprises:
the number of search results contained in the difference set of the search result set and the supplemental search result set is less than a preset number difference threshold.
15. An apparatus for searching information, comprising:
the search unit is configured to search in the target information base by using the acquired search information to obtain a search result set;
a selecting unit configured to select a subset of search results from the search result set and determine search results belonging to the target search result in the subset of search results as the supplemental search information;
a supplementary search unit configured to perform the following search steps: determining the union of the search result searched by the supplementary search information in the target information base and the search result set as a supplementary search result set; determining whether the supplementary search result set meets a preset convergence condition;
a determining unit configured to select a supplementary search result subset from the supplementary search result set in response to determining that the supplementary search result set does not satisfy a preset convergence condition, and determine supplementary search results belonging to a target search result from the supplementary search result subset as supplementary search information, determine the supplementary search result set as a search result set, and continue to perform the searching step;
wherein the apparatus further comprises: a relevancy determining unit configured to determine relevancy of search results in the search result set, wherein the relevancy is used for indicating relevancy of the search results and the search information;
wherein the relevancy determining unit is further configured to: for each supplementary search result in the supplementary search result set, in response to the supplementary search result being present in the search result set, update the relevance of the supplementary search result according to a preset relevance promoting algorithm, and take the updated relevance as the relevance of the supplementary search result; and in response to the supplementary search result not being present in the search result set, determine the relevance of the supplementary search result.
16. The apparatus of claim 15, wherein the supplemental search unit is further configured to:
in response to determining that the supplementary search result set satisfies the convergence condition, determine a target search result set from the supplementary search result set.
17. The apparatus of claim 15, wherein the selecting unit is further configured to:
acquiring label information of the search results in the search result subset, wherein the label information is used for indicating whether the search results are target search results;
and determining the search results belonging to the target search result in the search result subset according to the labeling information.
18. The apparatus of claim 15, wherein the relevancy determining unit is further configured to:
determining the similarity between the search result and the search information;
and determining the relevance of the search result according to the similarity, wherein the similarity is in direct proportion to the relevance.
19. The apparatus of claim 15, wherein the relevancy determining unit is further configured to:
determining similarity of the supplemental search results to the search information;
and determining the relevance of the supplementary search result according to the similarity, wherein the similarity is in direct proportion to the relevance.
20. The apparatus of claim 15, wherein the supplemental search unit is further configured to:
select a target number of supplementary search results from the supplementary search result set as target search results, in descending order of relevance, to obtain the target search result set.
21. The apparatus of claim 15, wherein the apparatus further comprises a partitioning unit configured to:
dividing the search result set into corresponding target number search result subsets according to a preset target number relevance degree interval; and
the selecting unit is further configured to:
select search results from each of the target number of search result subsets to obtain the search result subset.
22. The apparatus of claim 21, wherein the apparatus further comprises:
an accuracy determining unit configured to determine the accuracy of each search result subset among the target number of search result subsets, wherein the accuracy represents the proportion of target search results in the search result subset.
23. The apparatus of claim 22, wherein the partitioning unit is further configured to:
dividing the supplementary search result set into a corresponding target number of supplementary search result subsets according to the relevance interval; and
the determination unit is further configured to:
select supplementary search results from each of the target number of supplementary search result subsets to obtain the supplementary search result subset.
24. The apparatus of claim 23, wherein the accuracy determination unit is further configured to:
determine the accuracy of each supplementary search result subset among the target number of supplementary search result subsets, wherein the accuracy represents the proportion of target search results in the supplementary search result subset.
25. The apparatus of claim 24, wherein the determining unit is further configured to:
determining the absolute value of the difference between the accuracy of the search result set corresponding to the association degree interval in which the supplementary search result subset is located and a preset accuracy threshold value aiming at the supplementary search subset in the target number of supplementary search result subsets; and determining the number of the supplementary search results selected from the supplementary search result subset according to the absolute value and the number of the supplementary search results contained in the supplementary search result subset, wherein the absolute value is inversely proportional to the number of the selected supplementary search results, and the number of the contained supplementary search results is proportional to the number of the selected supplementary search results.
26. The apparatus of one of claims 15-25, wherein a number of search results in the search result set and a number of supplemental search results in the subset of supplemental search results are the same.
27. The apparatus of claim 25, wherein the convergence condition comprises:
the difference between the accuracy of the search result subset and the accuracy of the supplementary search result subset corresponding to a target relevance interval is smaller than a preset difference threshold.
28. The apparatus of claim 15, wherein the convergence condition comprises:
the number of search results contained in the difference set of the search result set and the supplemental search result set is less than a preset number difference threshold.
29. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-14.
30. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method of any one of claims 1 to 14.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811060981.3A CN109308299B (en) 2018-09-12 2018-09-12 Method and apparatus for searching information
PCT/CN2018/116342 WO2020052067A1 (en) 2018-09-12 2018-11-20 Information search method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811060981.3A CN109308299B (en) 2018-09-12 2018-09-12 Method and apparatus for searching information

Publications (2)

Publication Number Publication Date
CN109308299A CN109308299A (en) 2019-02-05
CN109308299B true CN109308299B (en) 2020-01-14

Family

ID=65225022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811060981.3A Active CN109308299B (en) 2018-09-12 2018-09-12 Method and apparatus for searching information

Country Status (2)

Country Link
CN (1) CN109308299B (en)
WO (1) WO2020052067A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628283A (en) * 2019-06-04 2023-08-22 苏州智贸捷通科技有限公司 Manual data verification method based on big data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102955837A (en) * 2011-12-13 2013-03-06 华东师范大学 Analogy retrieval control method based on Chinese word pair relationship similarity
CN106649554A (en) * 2016-11-08 2017-05-10 北京奇虎科技有限公司 Application program search method, device, server and system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1609859A (en) * 2004-11-26 2005-04-27 孙斌 Search result clustering method
CN102207968B (en) * 2011-06-08 2013-11-20 北京百度网讯科技有限公司 Search result correlation judgment-based search method and device
CN103136213B (en) * 2011-11-23 2017-04-12 阿里巴巴集团控股有限公司 Method and device for providing related words
CN102999556B (en) * 2012-10-15 2016-02-10 百度在线网络技术(北京)有限公司 Text search method, device and terminal device
CN105243060B (en) * 2014-05-30 2019-11-08 小米科技有限责任公司 A kind of method and device of retrieving image
CN106156109B (en) * 2015-04-03 2020-09-04 阿里巴巴集团控股有限公司 Searching method and device
CN104834693B (en) * 2015-04-21 2017-11-28 上海交通大学 Visual pattern search method and system based on deep search
CN105426529B (en) * 2015-12-15 2017-02-22 中南大学 Image retrieval method and system based on user search intention positioning

Also Published As

Publication number Publication date
CN109308299A (en) 2019-02-05
WO2020052067A1 (en) 2020-03-19

Similar Documents

Publication Publication Date Title
CN107506495B (en) Information pushing method and device
CN108846753B (en) Method and apparatus for processing data
US20190253760A1 (en) Method and apparatus for recommending video
US11314451B2 (en) Method and apparatus for storing data
CN109582873B (en) Method and device for pushing information
CN108933695B (en) Method and apparatus for processing information
CN109829164B (en) Method and device for generating text
CN109858045B (en) Machine translation method and device
CN108011949B (en) Method and apparatus for acquiring data
CN111104479A (en) Data labeling method and device
CN108038172B (en) Search method and device based on artificial intelligence
CN110188113B (en) Method, device and storage medium for comparing data by using complex expression
CN109165723B (en) Method and apparatus for processing data
CN114817651A (en) Data storage method, data query method, device and equipment
CN111597107A (en) Information output method and device and electronic equipment
CN113590756A (en) Information sequence generation method and device, terminal equipment and computer readable medium
CN112364185B (en) Method and device for determining characteristics of multimedia resources, electronic equipment and storage medium
CN108011936B (en) Method and device for pushing information
CN107908662B (en) Method and device for realizing search system
CN109308299B (en) Method and apparatus for searching information
CN108920707B (en) Method and device for labeling information
CN110647623B (en) Method and device for updating information
CN111125163A (en) Method and apparatus for processing data
CN111125503A (en) Method and apparatus for generating information
CN111552715B (en) User query method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.