CN114595319A - Method for sorting search results of search content - Google Patents

Info

Publication number
CN114595319A
Authority
CN
China
Prior art keywords
position parameter
parameter
node
leaf node
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210233404.XA
Other languages
Chinese (zh)
Inventor
臧文娟
江蓉
熊雅莉
谢隆飞
黄建平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202210233404.XA
Publication of CN114595319A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application belongs to the field of entity relation extraction and relates to a method for ranking search results of search content, which addresses the poor ranking of search result texts in the prior art. The method comprises the following steps: establishing a logic tree according to a logical expression corresponding to the search content, where the nodes of the logic tree comprise leaf nodes and non-leaf nodes, the leaf nodes are keywords, the non-leaf nodes are logical operators, and the nodes are connected according to the logical expression; for any search result text, determining a relevance parameter between the search result text and the search content according to the binary groups corresponding to the root node of the logic tree, where the search result text is obtained by searching with the keywords of the search content, each node corresponds to at least one binary group, and each binary group comprises a position parameter related to the position at which the keyword corresponding to the node appears in the search result text and a distance parameter of the keyword corresponding to the node in the search result text; and ranking the search result texts according to the relevance parameters.

Description

Method for sorting search results of search content
Technical Field
The present application relates to the field of entity relationship extraction technologies, and in particular, to a method for ranking search results of search content.
Background
A full-text retrieval system or search engine searches a designated search range (for example, a database or Internet web pages) according to the search content input by a user, and displays the most relevant matching texts to the user. Existing search algorithms generally segment the user's input into keywords, match the keywords against the words appearing in the texts within the search range, and then rank and output the successfully matched search result texts. When ranking search result texts by relevance to the search content, the prior art generally treats a search result text in which more words match the keywords as more relevant, and outputs it to the user at a higher position. However, when the keywords in the search content have a more complex logical relationship, existing ranking algorithms still focus mainly on how well the keywords match. As a result, the relevance between each search result text and the search content is not evaluated reasonably, and ranking the search result texts by that relevance works poorly.
Disclosure of Invention
The embodiments of the application provide a method for ranking search results of search content, to solve the prior-art problem that ranking search result texts by relevance works poorly.
In a first aspect, an embodiment of the present application provides a method for ranking search results of search content, including:
establishing a logic tree corresponding to the search content according to a logic expression corresponding to the search content, wherein nodes of the logic tree comprise leaf nodes and non-leaf nodes, the leaf nodes are the keywords, the non-leaf nodes are logic operators in the logic expression, and the nodes are connected according to a logic relationship of the logic expression;
for any search result text, determining a correlation parameter between the search result text and the search content according to the binary group corresponding to the root node of the logic tree; the search result text is a text which is searched and matched in a specified search range by utilizing the keywords of the search content; wherein each node in the logical tree corresponds to at least one binary group, each binary group comprising a position parameter related to the occurrence position of the keyword corresponding to the node in the search result text and a distance parameter of the keyword corresponding to the node in the search result text;
and ranking the search result texts according to the relevance parameter corresponding to each search result text.
Optionally, the position parameter in the binary group corresponding to each node is determined by:
determining the appearance positions of the keywords corresponding to the leaf nodes in the search result text for any leaf node in the logic tree, and taking each appearance position as the position parameter of a binary group corresponding to the leaf node;
for any non-leaf node in the logic tree, determining a plurality of next-level nodes corresponding to the non-leaf node, and determining the position parameter of at least one binary group corresponding to the non-leaf node according to the position parameter of each binary group corresponding to each next-level node;
the distance parameter in the binary group corresponding to each node is determined by:
for any leaf node in the logic tree, taking a preset invalid value as a distance parameter of all binary groups corresponding to the leaf node;
and determining a plurality of next-level nodes corresponding to the non-leaf nodes for any non-leaf node in the logic tree, and determining the distance parameter of at least one binary group corresponding to the non-leaf node according to each binary group corresponding to each next-level node.
Optionally, if the logical operator corresponding to the non-leaf node is an OR operator, the binary groups corresponding to the non-leaf node are all the binary groups corresponding to all of its next-level nodes.
Optionally, if the logical operator corresponding to the non-leaf node is an AND operator, then determining, for any non-leaf node in the logic tree, the plurality of next-level nodes corresponding to the non-leaf node, and determining the distance parameter of at least one binary group corresponding to the non-leaf node according to the binary groups corresponding to each next-level node, includes:
if the next-level node corresponding to the non-leaf node only comprises leaf nodes, determining a distance parameter of at least one binary group corresponding to the non-leaf node according to the position parameter of each binary group corresponding to each next-level leaf node;
if the next-level node corresponding to the non-leaf node only comprises a non-leaf node, determining the distance parameter of at least one binary group corresponding to the non-leaf node according to the distance parameter of each binary group corresponding to each next-level non-leaf node;
and if the next-level node corresponding to the non-leaf node comprises a non-leaf node and a leaf node, determining the distance parameter of at least one binary group corresponding to the non-leaf node together according to the position parameter of each binary group corresponding to each next-level leaf node and the distance parameter of each binary group corresponding to each next-level non-leaf node.
Optionally, if the logical operator corresponding to the non-leaf node is an AND operator, the position parameter and the distance parameter in a binary group corresponding to the non-leaf node are determined by:
for any next-level node corresponding to the non-leaf node, arranging the position parameters of the binary groups corresponding to the next-level node in a descending order to obtain a position parameter sequence;
arranging the position parameter sequences corresponding to the next-level nodes according to the sequence of the first item values in the position parameter sequences from small to large;
taking the first position parameter in the final position parameter sequence of the arrangement as the initial reference position parameter, and performing the following steps:
selecting, in the previous position parameter sequence, the position parameter with the smallest difference from the reference position parameter, and deleting from the previous position parameter sequence all position parameters smaller than the selected position parameter, wherein the previous position parameter sequence is the position parameter sequence that, in the arrangement, comes immediately before the position parameter sequence containing the reference position parameter; then taking the selected position parameter as the new reference position parameter, and returning to the selecting step, until the selected position parameter belongs to the first position parameter sequence;
when the selected position parameter belongs to the first position parameter sequence, taking out the first position parameter of each position parameter sequence to form a candidate operation array; if every position parameter sequence still contains a position parameter at this point, returning to the step of arranging the position parameter sequences corresponding to the next-level nodes in ascending order of their first items; if a position parameter sequence no longer contains any position parameter at this point, determining, for each candidate operation array, the position parameter and the distance parameter in a binary group corresponding to the non-leaf node according to the candidate operation array and the distance parameter corresponding to each position parameter in the candidate operation array.
Optionally, the position parameter and the distance parameter in a binary group corresponding to the non-leaf node are determined according to the candidate operation array in the following manner:
taking an integer from the average value of the position parameters in the candidate operation array, and using that integer as the position parameter in a binary group corresponding to the non-leaf node;
calculating the distance parameter in the binary group according to the following formula:
[Formula image BDA0003541160030000041 is not reproduced in this text.]
where distance is the distance parameter in the binary group; j and k are sequence numbers; A_j is the j-th adjacent-term difference after the position parameters in the candidate operation array are arranged in order of size; B_k is the distance parameter of the binary group corresponding to the k-th next-level non-leaf node; p is the total number of adjacent-term differences; and q is the total number of binary groups corresponding to the next-level non-leaf nodes.
Optionally, if a plurality of position parameters in the previous position parameter sequence share the smallest difference from the reference position parameter, selecting the position parameter in the previous position parameter sequence with the smallest difference from the reference position parameter includes:
selecting, from those position parameters, the one with the smallest value.
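The reference-chaining steps and the tie-break rule above can be sketched as follows. This is a minimal illustration and not the applicant's implementation: the function name `and_candidate_arrays` is hypothetical, distance parameters are left out, and the sketch assumes each position parameter sequence is kept in ascending order, since that is what makes the selected position parameter the first item once the smaller ones are deleted.

```python
def and_candidate_arrays(child_positions: list[list[int]]) -> list[tuple[int, ...]]:
    """Build candidate operation arrays for an AND node (hypothetical helper).

    child_positions holds, for each next-level node, the position parameters
    of its binary groups. Assumption: sequences are sorted ascending.
    """
    seqs = [sorted(positions) for positions in child_positions]
    candidates = []
    # Repeat while every sequence still contains a position parameter.
    while seqs and all(seqs):
        # Arrange the sequences in ascending order of their first items.
        seqs.sort(key=lambda seq: seq[0])
        # Start from the first position parameter of the final sequence.
        reference = seqs[-1][0]
        # Walk backwards through the previous sequences.
        for idx in range(len(seqs) - 2, -1, -1):
            # Smallest difference wins; ties go to the smaller position.
            chosen = min(seqs[idx], key=lambda p: (abs(p - reference), p))
            # Delete all position parameters smaller than the chosen one.
            seqs[idx] = [p for p in seqs[idx] if p >= chosen]
            reference = chosen
        # Take out the first position parameter of every sequence.
        candidates.append(tuple(seq.pop(0) for seq in seqs))
    return candidates
```

For two next-level nodes with positions `[1, 10]` and `[2, 11]`, the sketch pairs 1 with 2 and 10 with 11, so each candidate operation array groups occurrences that lie close together in the text.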
Optionally, the relevance parameter of the search result text and the search content is determined specifically by the following formula:
[Formula image BDA0003541160030000051 is not reproduced in this text.]
where Score is the correlation parameter between the search result text and the search content; N is the total number of binary groups corresponding to the root node; i is a sequence number; o_i is the position parameter in the i-th binary group corresponding to the root node; and d_i is the distance parameter in the i-th binary group corresponding to the root node.
Optionally, the search result text is a text obtained by searching and matching in a specified search range by using an inverted index algorithm by using the keywords of the search content.
Optionally, the occurrence positions of the keywords corresponding to the leaf nodes in the search result text are determined by an inverted index list of the search result text, where the inverted index list is obtained by recording the occurrence positions of the words in the search result text in advance.
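An inverted index list of the kind described above, which records in advance where each word appears in a text, could look like the following minimal sketch. It assumes the text has already been segmented into words and that positions are 0-based word indexes; both choices, and the name `build_inverted_index`, are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(words: list[str]) -> dict[str, list[int]]:
    """Map each word to the list of positions at which it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for position, word in enumerate(words):
        index[word].append(position)
    return dict(index)


# Illustrative input: a pre-segmented text.
index = build_inverted_index(["i", "like", "football", "and", "football"])
football_positions = index.get("football", [])
```

Looking up a leaf node's keyword in such an index returns its occurrence positions directly, without rescanning the text.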
In a second aspect, an embodiment of the present application further provides an apparatus for ranking search results of search content, including:
the search content analysis module is used for establishing a logic tree corresponding to the search content according to a logic expression corresponding to the search content, wherein nodes of the logic tree comprise leaf nodes and non-leaf nodes, the leaf nodes are the keywords, the non-leaf nodes are logic operators in the logic expression, and the nodes are connected according to the logic relationship of the logic expression;
the correlation calculation module is used for determining the correlation parameters of the search result text and the search contents according to the binary group corresponding to the root node of the logic tree aiming at any search result text; the search result text is a text which is searched and matched in a specified search range by utilizing the keywords of the search content; wherein each node in the logical tree corresponds to at least one binary group, each binary group comprising a position parameter related to the occurrence position of the keyword corresponding to the node in the search result text and a distance parameter of the keyword corresponding to the node in the search result text;
and the ranking module is used for ranking each search result text according to the corresponding relevance parameter of each search result text.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
a memory for storing program instructions;
a processor, configured to invoke the program instruction stored in the memory, and execute the method for ranking search results of search content according to any one of the first aspect according to the obtained program instruction.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method of ranking search results of searching content of any of the first aspects.
In a fifth aspect, the present application provides a computer program product comprising: computer program code for causing a computer to perform the method of ranking search results of searching content of any one of the first aspect when the computer program code is run on a computer.
The beneficial effect of this application is as follows:
The method for ranking search results of search content takes into account how the positions at which keywords appear in a search result text affect the overall semantics of that text. A logic tree is established for the search content, and the approximate position in the search result text of the semantics of each logical branch of the search content is determined from the positions at which the keywords appear. The relevance between the search result text and the search content is then analyzed from these positions and the logical relationships, which better reflects the semantic difference between the search result text and the search content.
Drawings
FIG. 1 is a schematic flowchart of a method for ranking search results of search content according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a logic tree established for a search content in an embodiment of the present application;
FIG. 3 is a first partial schematic flowchart of a method for ranking search results of search content according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a logic tree in an embodiment of the present application;
FIG. 5 is a second partial schematic flowchart of a method for ranking search results of search content according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for ranking search results of search content according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is further described with reference to the accompanying drawings and examples. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus their repetitive description will be omitted. The words used in this application to describe positions and orientations are provided by way of example in the drawings and can be changed as desired and are intended to be encompassed by the present application. The drawings of the present application are for illustrating relative positional relationships only and do not represent true scale.
It should be noted that in the following description, specific details are set forth in order to provide a thorough understanding of the present application. This application is capable of implementation in many different ways than those herein set forth and of similar import to those skilled in the art without departing from the spirit and scope of this application. The present application is therefore not limited to the specific embodiments disclosed below. The description which follows is a preferred embodiment of the present application, but is made for the purpose of illustrating the general principles of the application and not for the purpose of limiting the scope of the application. The protection scope of the present application shall be subject to the definitions of the appended claims.
A method for ranking search results of search content according to an embodiment of the present application is specifically described below with reference to the accompanying drawings. It should be noted that, in the technical solution of the embodiment of the present application, the data acquisition, storage, use, processing, and the like all conform to relevant regulations of national laws and regulations.
An embodiment of the present application provides a method for ranking search results of search content, as shown in fig. 1, including:
S110, establishing a logic tree corresponding to the search content according to the logical expression corresponding to the search content; the nodes of the logic tree comprise leaf nodes and non-leaf nodes, the leaf nodes are the keywords, the non-leaf nodes are the logical operators in the logical expression, and the nodes are connected according to the logical relationship of the logical expression.
For example, suppose the search content is "(sports || fitness) && (football || kicking) && (happy || glad || (keep && good mood))"; the correspondingly established logic tree is shown in FIG. 2.
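As a minimal sketch of S110, a small recursive-descent parser can turn such an expression into a logic tree. The sketch assumes the expression is written with `&&`, `||` and parentheses and that `&&` binds tighter than `||`; neither the syntax nor the precedence is fixed by the text. The names `Node`, `tokenize` and `parse` are hypothetical, and malformed input is not handled.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A logic-tree node: an operator ("AND"/"OR") or a keyword leaf."""
    value: str
    children: list[Node] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def tokenize(expr: str) -> list[str]:
    # Operators and parentheses become tokens; anything else is keyword text.
    parts = re.split(r"(&&|\|\||\(|\))", expr)
    return [part.strip() for part in parts if part.strip()]


def parse(expr: str) -> Node:
    tokens = tokenize(expr)
    pos = 0

    def parse_or() -> Node:
        nonlocal pos
        operands = [parse_and()]
        while pos < len(tokens) and tokens[pos] == "||":
            pos += 1
            operands.append(parse_and())
        return operands[0] if len(operands) == 1 else Node("OR", operands)

    def parse_and() -> Node:
        nonlocal pos
        operands = [parse_atom()]
        while pos < len(tokens) and tokens[pos] == "&&":
            pos += 1
            operands.append(parse_atom())
        return operands[0] if len(operands) == 1 else Node("AND", operands)

    def parse_atom() -> Node:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = parse_or()
            pos += 1  # skip the closing ")"
            return node
        return Node(token)  # a keyword leaf

    return parse_or()


# Illustrative expression; the keywords are placeholders.
tree = parse("(sports || fitness) && (football || kicking) "
             "&& (happy || glad || (keep && good mood))")
```

Here the root is an AND node with three OR children, and the last OR child contains a nested AND node for "keep" and "good mood", giving five non-leaf nodes in total.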
S120, for any search result text, determining the correlation parameter between the search result text and the search content according to the binary group corresponding to the root node of the logic tree. The search result text is a text obtained by searching and matching in a specified search range using the keywords of the search content; each node in the logic tree corresponds to at least one binary group, and each binary group comprises a position parameter related to the position at which the keyword corresponding to the node appears in the search result text and a distance parameter of the keyword corresponding to the node in the search result text.
In a specific implementation process, the specified search range may be a preset text database, or a web page in the internet, and the like, which is not specifically limited herein.
Hereinafter, a binary group is written in the form <offset, distance>, where offset is the position parameter of the binary group and distance is its distance parameter.
S130, ranking each search result text according to the corresponding relevance parameter of each search result text.
In this way, compared with existing search algorithms, the method for ranking search results of search content provided by the embodiment of the application takes into account how the positions at which keywords appear in a search result text affect the overall semantics of that text. A logic tree is established for the search content, and the approximate position in the search result text of the semantics of each logical branch of the search content is determined from the positions at which the keywords appear. The relevance between the search result text and the search content is then analyzed from these positions and the logical relationships, which better reflects the semantic difference between the search result text and the search content.
For example, still taking the search content exemplified above as an example, there are two search result texts at present:
(1) "I like sports very much and play football with friends every week, which lets me keep a good mood."
(2) "Yesterday I went kicking a ball around with Xiaoming. I played badly, and I do not usually even like watching football matches; if the football team had not been short of players, I would not have gone. I still prefer playing basketball and fitness training. Afterwards, he and I went to XX for a meal, and then the two of us watched a movie together. The movie was very good, and the main plot was ... (hundreds of characters omitted). I was very happy after watching the movie."
From the semantics, search result text (1) is clearly highly relevant to the search content, whereas search result text (2) is much less relevant. An existing search ranking algorithm, however, would output search result text (2) to the user before search result text (1), because the keywords occur more often in text (2) than in text (1). The method for ranking search results of search content provided by the embodiment of the application considers the positions at which the keywords appear in the search result text: in search result text (2), the keywords "kicking", "football" and "fitness" appear at the beginning of the text, while the keyword "happy" appears at the end. The positions at which the logical branches of the logic tree appear are therefore far apart, so the logical branch "(happy || glad || (keep && good mood))" can be considered to have little semantic association with the other logical branches, and the relevance between search result text (2) and the search content is relatively low.
Further, the position parameters in the binary group corresponding to each node are determined by the following method:
determining the appearance positions of the keywords corresponding to the leaf nodes in the search result text for any leaf node in the logic tree, and taking each appearance position as the position parameter of a binary group corresponding to the leaf node;
and determining a plurality of next-level nodes corresponding to the non-leaf nodes for any non-leaf node in the logic tree, and determining the position parameter of at least one binary group corresponding to the non-leaf node according to the position parameter of each binary group corresponding to each next-level node.
The distance parameter in the binary group corresponding to each node is determined by:
for any leaf node in the logic tree, taking a preset invalid value as a distance parameter of all binary groups corresponding to the leaf node;
and determining a plurality of next-level nodes corresponding to the non-leaf nodes for any non-leaf node in the logic tree, and determining the distance parameter of at least one binary group corresponding to the non-leaf node according to each binary group corresponding to each next-level node.
In a specific implementation, for a leaf node, the position parameter in a corresponding binary group is the position at which the keyword corresponding to the leaf node appears in the search result text. This position may be a coordinate in units of characters, or a coordinate in units of words after the search result text has been segmented into words during the search; this is not limited here. The following description uses coordinates in units of characters as an example.
In a specific implementation, any leaf node corresponds to at least one binary group. For example, in search result text (2) the keyword "fitness" appears only once, so the leaf node "fitness" corresponds to a single binary group, <36, -1>; the keyword "football" appears twice, so the leaf node "football" corresponds to two binary groups, <17, -1> and <23, -1>.
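The binary groups of a leaf node can be sketched as one <offset, distance> pair per occurrence of its keyword, with the preset invalid value as the distance. In this minimal sketch the function name `leaf_binary_groups` is hypothetical, offsets are assumed to be 0-based character offsets (the text does not say whether counting starts at 0 or 1), and -1 is used as the invalid value because the examples above use it.

```python
INVALID_DISTANCE = -1  # preset invalid value, as in the <36, -1> example


def leaf_binary_groups(text: str, keyword: str) -> list[tuple[int, int]]:
    """Return one (offset, distance) pair per occurrence of keyword in text."""
    groups = []
    start = text.find(keyword)
    while start != -1:
        groups.append((start, INVALID_DISTANCE))
        # Continue searching just past the start of the last match.
        start = text.find(keyword, start + 1)
    return groups


# Illustrative text, not one of the search result texts above.
groups = leaf_binary_groups("abc football xyz football", "football")
```

A keyword that occurs twice yields two binary groups, and a keyword that never occurs yields an empty list.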
In the logical tree, the non-leaf node is necessarily a direct or indirect upper node of the leaf node, and then the binary group of the non-leaf node can be finally determined by the binary group of the corresponding leaf node.
The following explains the binary group determination methods of different non-leaf nodes respectively.
(I) If the logical operator corresponding to the non-leaf node is an OR operator, the binary groups corresponding to the non-leaf node are all the binary groups corresponding to all of its next-level nodes.
For example, the logic tree shown in FIG. 2 includes non-leaf nodes J1 to J5. The binary groups corresponding to non-leaf node J2 are all the binary groups corresponding to leaf node "sports" and leaf node "fitness"; the binary groups corresponding to non-leaf node J4 are all the binary groups corresponding to leaf node "happy", leaf node "glad" and non-leaf node J5.
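The OR rule amounts to concatenating the binary groups of the next-level nodes, which the following minimal sketch illustrates. The name `or_binary_groups` and the sample values are hypothetical; the text does not say whether the merged groups are reordered, so the sketch keeps them in child order.

```python
def or_binary_groups(
    children_groups: list[list[tuple[int, int]]],
) -> list[tuple[int, int]]:
    """Collect all (offset, distance) pairs of all next-level nodes."""
    merged = []
    for groups in children_groups:
        merged.extend(groups)
    return merged


# Illustrative values: one child has two binary groups, the other has one.
merged = or_binary_groups([[(17, -1), (23, -1)], [(36, -1)]])
```

A next-level node whose keyword is absent from the text simply contributes nothing, so an OR node still has binary groups as long as any one of its branches matched.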
(II) If the logical operator corresponding to the non-leaf node is an AND operator, then determining, for any non-leaf node in the logic tree, the plurality of next-level nodes corresponding to the non-leaf node, and determining the distance parameter of at least one binary group corresponding to the non-leaf node according to the binary groups corresponding to each next-level node, comprises:
if the next-level node corresponding to the non-leaf node only comprises leaf nodes, determining a distance parameter of at least one binary group corresponding to the non-leaf node according to the position parameter of each binary group corresponding to each next-level leaf node;
if the next-level node corresponding to the non-leaf node only comprises a non-leaf node, determining the distance parameter of at least one binary group corresponding to the non-leaf node according to the distance parameter of each binary group corresponding to each next-level non-leaf node;
and if the next-level node corresponding to the non-leaf node comprises a non-leaf node and a leaf node, determining the distance parameter of at least one binary group corresponding to the non-leaf node together according to the position parameter of each binary group corresponding to each next-level leaf node and the distance parameter of each binary group corresponding to each next-level non-leaf node.
The method for determining the binary groups of a non-leaf node whose logical operator is an AND operator is explained below.
Embodiment 1:
As shown in FIG. 3, the method specifically includes the following steps:
S201, determining all operation arrays corresponding to the non-leaf node, wherein each operation array is obtained by selecting, for each next-level node, one position parameter from the binary groups corresponding to that next-level node, and combining the selected position parameters.
S202, calculating the range of the position parameters in each operation array.
The range is the difference between the maximum position parameter and the minimum position parameter in the operation array.
S203, taking the next-level node with the minimum number of the corresponding binary groups as a reference next-level node, and taking any position parameter corresponding to the reference next-level node as a reference position parameter; and for any reference position parameter, determining the arithmetic array with the minimum range in the arithmetic array containing the reference position parameter as a candidate arithmetic array.
S204, arranging the candidate operation arrays according to the sequence of the reference position parameters from small to large.
S205, taking the second candidate operation array in the arrangement as a judgment operation array.
S206, judging whether the position parameters appear in the candidate operation array of the position before arrangement in the judgment operation array.
If the result of step S206 is yes, step S207 is executed; if the result of step S206 is no, step S208 is executed.
S207, deleting the judgment arithmetic array from all the determined arithmetic arrays, and re-determining the arithmetic array with the smallest difference among the arithmetic arrays including the reference position parameter in the deleted judgment arithmetic array, replacing the deleted judgment arithmetic array with the re-determined arithmetic array as a candidate arithmetic array in the arrangement, and updating the replaced candidate arithmetic array to the judgment arithmetic array. Returning to the step S206.
And S208, judging whether the judged operational array is the last candidate operational array.
If the result of the step S208 is no, execute step S209; if the result of the step S208 is yes, step S210 is executed.
S209, updating the candidate operation array at the next position in the arrangement into the judgment operation array. Returning to the step S206.
S210, for any one candidate operation array, determining a position parameter and a distance parameter in a binary group corresponding to the non-leaf node according to the candidate operation array and the distance parameter corresponding to each position parameter in the candidate operation array.
The position parameter and the distance parameter in a binary group corresponding to the non-leaf node are determined from the candidate operation array in the following way:
rounding the average value of the position parameters in the candidate operation array to an integer, and taking the integer as the position parameter in a binary group corresponding to the non-leaf node;
calculating the distance parameter in the binary group according to the following formula:
$$\text{distance} = \frac{\sum_{j=1}^{p} A_j + \sum_{k=1}^{q} B_k}{p + q}$$
where distance is the distance parameter in the binary group; j and k are serial numbers; A_j is the j-th adjacent-term difference after the position parameters in the candidate operation array are arranged in order of size; B_k is the distance parameter of the binary group corresponding to the k-th next-level non-leaf node; p is the total number of adjacent-term differences; and q is the total number of binary groups corresponding to the next-level non-leaf nodes.
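The calculation of one binary group from one candidate operation array can be sketched as follows. This is only an illustrative sketch under stated assumptions: the formula image is not legible in this text, so the distance is assumed to be the sum of all A_j and B_k divided by p + q, and "taking an integer" is assumed to mean rounding half up. Both assumptions reproduce the values <4,4>, <7,2> and <10,5> given for the example of FIG. 4. The names `half_up` and `binary_group` are hypothetical.

```python
def half_up(x: float) -> int:
    """Round a non-negative number to the nearest integer (assumed rounding rule)."""
    return int(x + 0.5)


def binary_group(candidate, child_distances):
    """Return (position, distance) for one candidate operation array.

    candidate       -- position parameters, one per next-level node
    child_distances -- distance parameters B_k taken from the binary groups of
                       next-level non-leaf nodes (empty if all are leaf nodes)
    """
    ordered = sorted(candidate)
    # A_j: differences between adjacent position parameters after sorting.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    position = half_up(sum(candidate) / len(candidate))
    # Assumed reading of the formula: (sum of A_j + sum of B_k) / (p + q).
    distance = half_up(
        (sum(gaps) + sum(child_distances)) / (len(gaps) + len(child_distances))
    )
    return position, distance


print(binary_group((3, 2, 6), [7]))     # candidate (3,2,6), C contributes <6,7>
print(binary_group((5, 8, 7), [3]))     # candidate (5,8,7), C contributes <7,3>
print(binary_group((11, 10, 9), [13]))  # candidate (11,10,9), C contributes <9,13>
```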
The above operation is described below with a specific example.
As shown in FIG. 4, assume that a non-leaf node J of an AND operator corresponds to three next-level nodes A, B, and C (i.e., the logical expression corresponding to the non-leaf node J is "A and B and C"), wherein A and B are leaf nodes and C is a non-leaf node. The position parameters of the binary groups corresponding to A, B, and C are as follows:
A: 3, 5, 11
B: 1, 2, 8, 10
C: 6, 7, 9, 13, 15, 20
Operation arrays are then determined, wherein each operation array comprises 3 position parameters, one from each next-level node. The total number of operation arrays is therefore 3 × 4 × 6 = 72, as listed below, where each parenthesized triple on the left is an operation array and the number the arrow points to on the right is the range of the position parameters in that operation array:
Figure BDA0003541160030000131
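The enumeration described above can be reproduced with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and each operation array is written in the order (A, B, C).

```python
from itertools import product

# Position parameters of the three next-level nodes in the example of FIG. 4.
A = [3, 5, 11]
B = [1, 2, 8, 10]
C = [6, 7, 9, 13, 15, 20]

# Every operation array takes exactly one position parameter per next-level node.
arrays = list(product(A, B, C))

# Range = maximum position parameter minus minimum position parameter.
ranges = {arr: max(arr) - min(arr) for arr in arrays}

print(len(arrays))            # total number of operation arrays
print(ranges[(3, 2, 6)])      # range of one operation array
for arr in arrays[:3]:
    print(arr, "->", ranges[arr])
```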
Because next-level node A has the fewest binary groups (3 binary groups), next-level node A is taken as the reference next-level node, and its position parameters 3, 5 and 11 are each taken as a reference position parameter. The operation array (3,2,6) with the minimum range among the operation arrays containing position parameter 3, the operation array (5,8,6) with the minimum range among those containing position parameter 5, and the operation array (11,10,9) with the minimum range among those containing position parameter 11 are determined respectively. (3,2,6), (5,8,6) and (11,10,9) are taken as candidate operation arrays and arranged in ascending order of the position parameters 3, 5, 11 of next-level node A:
(3,2,6)
(5,8,6)
(11,10,9)
Starting from the second candidate operation array (5,8,6) in the arrangement: since the position parameter 6 in (5,8,6) also appears in the preceding candidate operation array (3,2,6), (5,8,6) is deleted from all the operation arrays above, the operation array (5,8,7) with the minimum range among the remaining operation arrays containing position parameter 5 is re-determined, and (5,8,6) is replaced by (5,8,7) in the arrangement. The sequence of candidate operation arrays is then:
(3,2,6)
(5,8,7)
(11,10,9)
The same check is then applied to the candidate operation array (11,10,9), which does not need to be deleted or replaced, so the final arrangement of candidate operation arrays is:
(3,2,6)
(5,8,7)
(11,10,9)
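Steps S201 to S209 can be sketched as below. This is a hedged illustration rather than the definitive procedure. The function name `exhaustive_candidates` is hypothetical. Ties on the minimum range are broken by enumeration order. A "repeated" position parameter is assumed to mean the same value from the same next-level node in any earlier candidate. A reference position parameter for which no operation array remains is simply dropped. The text does not specify any of these details.

```python
from itertools import product


def exhaustive_candidates(children):
    """children: one list of position parameters per next-level node."""
    arrays = list(product(*children))                       # S201
    # S203: the next-level node with the fewest binary groups is the reference.
    ref = min(range(len(children)), key=lambda i: len(children[i]))

    def best(ref_value):
        """Minimum-range array among the remaining arrays containing ref_value."""
        pool = [a for a in arrays if a[ref] == ref_value]
        return min(pool, key=lambda a: max(a) - min(a)) if pool else None

    # S203/S204: one candidate per reference value, in ascending order.
    cands = [best(v) for v in sorted(children[ref])]

    idx = 1                                                 # S205
    while idx < len(cands):
        cur = cands[idx]
        clash = cur is not None and any(                    # S206
            cur[s] == prev[s]
            for prev in cands[:idx] if prev is not None
            for s in range(len(cur))
        )
        if clash:                                           # S207
            arrays.remove(cur)
            cands[idx] = best(cur[ref])
        else:                                               # S208/S209
            idx += 1
    return [c for c in cands if c is not None]


print(exhaustive_candidates([[3, 5, 11], [1, 2, 8, 10], [6, 7, 9, 13, 15, 20]]))
```

With the position parameters of FIG. 4 as input, the sketch returns the same three candidate operation arrays as the arrangement above.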
For the candidate operation array (3,2,6), the corresponding binary group is determined as follows.
Position parameter:
$$\frac{3 + 2 + 6}{3} = \frac{11}{3} \approx 4$$
If the binary group of next-level node C whose position parameter is 6 is <6,7>, then:
Distance parameter:
$$\frac{(3 - 2) + (6 - 3) + 7}{2 + 1} = \frac{11}{3} \approx 4$$
Thus, one binary group of the non-leaf node J is <4,4>.
For the candidate operation array (5,8,7), the corresponding binary group is determined as follows.
Position parameter:
$$\frac{5 + 8 + 7}{3} = \frac{20}{3} \approx 7$$
If the binary group of next-level node C whose position parameter is 7 is <7,3>, then:
Distance parameter:
$$\frac{(7 - 5) + (8 - 7) + 3}{2 + 1} = \frac{6}{3} = 2$$
Thus, one binary group of the non-leaf node J is <7,2>.
For the candidate operation array (11,10,9), the corresponding binary group is determined as follows.
Position parameter:
$$\frac{11 + 10 + 9}{3} = 10$$
If the binary group of next-level node C whose position parameter is 9 is <9,13>, then:
Distance parameter:
$$\frac{(10 - 9) + (11 - 10) + 13}{2 + 1} = \frac{15}{3} = 5$$
Thus, one binary group of the non-leaf node J is <10,5>.
In summary, the binary groups corresponding to the non-leaf node J are <4,4>, <7,2> and <10,5>.
Through creative research, the inventors found that the positions at which words appear and the distance, in number of words, between them influence the overall semantics of a text to a certain extent. For example, the semantics of the search result text (2) described above differ considerably from the search content because the different keywords appear too many words apart. The above example takes a non-leaf node J with two next-level leaf nodes and one next-level non-leaf node. Consider first the case in which the next-level nodes of a non-leaf node are all leaf nodes. If the non-leaf node is regarded as an abstract "keyword", the above operation divides the search result text into text ranges in each of which the keyword of every next-level node appears once, and each operation array represents one such division. Searching for the operation array with the minimum range amounts to searching for a division in which the different keywords appear once each in close succession. The text content within such a range can represent the semantics of the non-leaf node, so the position parameter in the binary group of the non-leaf node can be regarded as the central position of the next-level nodes' keywords within the divided text range, or as the appearance position of the abstract keyword corresponding to the non-leaf node. Eliminating candidate operation arrays that share a position parameter prevents the appearance positions of the finally determined abstract keywords from overlapping.
The distance parameter in the binary group corresponding to the non-leaf node represents how far apart, in number of words, the keywords are within the text range in which they appear consecutively. Likewise, for a non-leaf node J whose next-level nodes include both leaf nodes and non-leaf nodes, the position parameter of each binary group represents the "central position" of a division range in which the keywords and abstract keywords of the next-level nodes each appear once in succession, or may be regarded as an appearance position of the abstract keyword corresponding to the non-leaf node J; the distance parameter in the binary group represents how far apart, in number of words, the keywords and abstract keywords are within that text range. By contrast, a binary group of a leaf node only describes the appearance position of a single keyword: its position parameter is the coordinate at which the keyword appears, and because a leaf node corresponds to only one keyword, the notion of a word-count distance between multiple keywords does not apply, so its distance parameter is an invalid value.
In the above process, the binary groups of an AND-operator non-leaf node are calculated by searching for the operation arrays with the minimum range under the constraint that no position parameter is repeated, so that the optimal division of the text ranges in which the keywords corresponding to the non-leaf node appear consecutively in the search result text can be found. However, the above process needs to evaluate
$$\prod_{i=1}^{m} n_i$$
operation arrays (where i is a serial number, n_i is the number of binary groups corresponding to the i-th next-level node, and m is the number of next-level nodes corresponding to the non-leaf node), so when the search content has many keywords and the logical relationships of the logical expression are complex, the amount of computation is very large. To reduce the amount of computation and increase the operation speed, the above process can be further improved, for example as in Embodiment 2 described below.
Embodiment 2:
As shown in FIG. 5, the method specifically includes the following steps:
S401, for each next-level node corresponding to the non-leaf node, arranging the position parameters of the binary groups corresponding to that next-level node in ascending order to obtain a position parameter sequence.
S402, arranging the position parameter sequences corresponding to the next-level nodes in ascending order of the first value in each position parameter sequence.
S403, setting the first position parameter in the final position parameter sequence of the arrangement as the reference position parameter.
S404, selecting, in the previous position parameter sequence, the position parameter with the minimum difference from the reference position parameter, and deleting all position parameters in the previous position parameter sequence that are smaller than the selected position parameter. Here, the previous position parameter sequence is the position parameter sequence immediately before, in the arrangement, the position parameter sequence in which the reference position parameter is located.
In a specific implementation, if several position parameters in the previous position parameter sequence have the same minimum difference from the reference position parameter, the one with the smallest value may be selected.
S405, updating the reference position parameter to the selected position parameter.
S406, judging whether the updated reference position parameter is a position parameter in the first position parameter sequence.
If the result of step S406 is yes, step S407 is executed; if the result of step S406 is no, the process returns to step S404.
S407, taking the first position parameter out of each position parameter sequence and combining the extracted position parameters into a candidate operation array.
S408, judging whether every position parameter sequence still contains a position parameter.
If the result of step S408 is yes, the process returns to step S402; if the result of step S408 is no, step S409 is executed.
S409, for each candidate operation array, determining the position parameter and the distance parameter in one binary group corresponding to the non-leaf node according to the candidate operation array and the distance parameter corresponding to each position parameter in the candidate operation array.
The position parameter and the distance parameter in a binary group corresponding to the non-leaf node are determined from the candidate operation array in the following way:
rounding the average value of the position parameters in the candidate operation array to an integer, and taking the integer as the position parameter in a binary group corresponding to the non-leaf node;
calculating the distance parameter in the binary group according to the following formula:
$$\text{distance} = \frac{\sum_{j=1}^{p} A_j + \sum_{k=1}^{q} B_k}{p + q}$$
where distance is the distance parameter in the binary group; j and k are serial numbers; A_j is the j-th adjacent-term difference after the position parameters in the candidate operation array are arranged in order of size; B_k is the distance parameter of the binary group corresponding to the k-th next-level non-leaf node; p is the total number of adjacent-term differences; and q is the total number of binary groups corresponding to the next-level non-leaf nodes.
The following description again takes the case where the non-leaf node J corresponds to three next-level nodes A, B, and C as an example.
The position parameter sequences corresponding to A, B, and C are as follows:
A: 3, 5, 11
B: 1, 2, 8, 10
C: 6, 7, 9, 13, 15, 20
Arranging the position parameter sequences corresponding to the next-level nodes in ascending order of the first value in each position parameter sequence gives:
B: 1, 2, 8, 10
A: 3, 5, 11
C: 6, 7, 9, 13, 15, 20
First, the first position parameter 6 of the final position parameter sequence C is taken as the reference position parameter. The position parameter in the previous position parameter sequence A with the minimum difference from the reference position parameter 6 is 5, so the position parameter 3 is deleted from the position parameter sequence A. At this time:
B: 1, 2, 8, 10
A: 5, 11
C: 6, 7, 9, 13, 15, 20
The position parameter 5 in the position parameter sequence A is then taken as the reference position parameter. In the previous position parameter sequence B, the position parameters 2 and 8 both have the minimum difference from the reference position parameter 5, so the smaller one, 2, is selected, and the position parameter 1 is deleted from the position parameter sequence B. At this time:
B: 2, 8, 10
A: 5, 11
C: 6, 7, 9, 13, 15, 20
The selected position parameter now belongs to the first position parameter sequence, which has no previous position parameter sequence. The first position parameter is therefore taken out of each position parameter sequence, and the extracted position parameters are combined into the candidate operation array (2,5,6). At this time:
B: 8, 10
A: 11
C: 7, 9, 13, 15, 20
Arranging the position parameter sequences corresponding to the next-level nodes in ascending order of the first value in each position parameter sequence again gives:
C: 7, 9, 13, 15, 20
B: 8, 10
A: 11
The first position parameter 11 of the final position parameter sequence A is taken as the reference position parameter, and the above process is repeated in the same way, giving the candidate operation array (11,10,9). At this time:
C: 13, 15, 20
B:
A:
At this time, the position parameter sequences A and B contain no position parameters, so the process of determining candidate operation arrays ends, and two candidate operation arrays, (2,5,6) and (11,10,9), are obtained.
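Steps S401 to S408 can be sketched as below. This is an illustrative sketch, not a definitive implementation. The function name `greedy_candidates` and the use of a dictionary keyed by node name are assumptions. Ties on the minimum difference are resolved toward the smaller value, as the specific implementation above suggests.

```python
def greedy_candidates(children):
    """children: dict mapping a next-level node name to its position parameters."""
    seqs = {name: sorted(pos) for name, pos in children.items()}       # S401
    candidates = []
    while all(seqs.values()):                # S408: every sequence is non-empty
        # S402: order the sequences by their first (smallest) value.
        order = sorted(seqs, key=lambda name: seqs[name][0])
        ref = seqs[order[-1]][0]             # S403: first value of final sequence
        # S404-S406: walk backwards through the preceding sequences.
        for name in reversed(order[:-1]):
            seq = seqs[name]
            # Closest value to the reference; ties go to the smaller value.
            pick = min(seq, key=lambda v: (abs(v - ref), v))
            seqs[name] = [v for v in seq if v >= pick]   # drop smaller values
            ref = pick                       # S405
        # S407: take the first value out of every sequence.
        candidates.append({name: seqs[name].pop(0) for name in seqs})
    return candidates


example = {"A": [3, 5, 11], "B": [1, 2, 8, 10], "C": [6, 7, 9, 13, 15, 20]}
for cand in greedy_candidates(example):
    print(cand)
```

Each sequence is scanned a bounded number of times per candidate, so the sketch avoids enumerating every combination of position parameters. The trade-off is that it finds fewer candidate operation arrays (two here, against three in Embodiment 1).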
For the candidate operation array (2,5,6), the corresponding binary group is determined as follows.
Position parameter:
$$\frac{2 + 5 + 6}{3} = \frac{13}{3} \approx 4$$
If the binary group of next-level node C whose position parameter is 6 is <6,7>, then:
Distance parameter:
$$\frac{(5 - 2) + (6 - 5) + 7}{2 + 1} = \frac{11}{3} \approx 4$$
Thus, one binary group of the non-leaf node J is <4,4>.
For the candidate operation array (11,10,9), the corresponding binary group is determined as follows.
Position parameter:
$$\frac{11 + 10 + 9}{3} = 10$$
If the binary group of next-level node C whose position parameter is 9 is <9,13>, then:
Distance parameter:
$$\frac{(10 - 9) + (11 - 10) + 13}{2 + 1} = \frac{15}{3} = 5$$
Thus, one binary group of the non-leaf node J is <10,5>.
In summary, the binary groups corresponding to the non-leaf node J are <4,4> and <10,5>.
It should be noted that other technical solutions obtained by those skilled in the art through simple transformation of the above procedures should be regarded as equivalent to the technical solutions of the embodiments of the present application and fall within the scope of the present application. For example, in Embodiment 2 above, step S402 may be modified to: arranging the position parameter sequences corresponding to the next-level nodes in descending order of the first value in each position parameter sequence; step S403 may be modified to: setting the last position parameter in the final position parameter sequence as the reference position parameter; and step S407 may be modified to: taking the last position parameter out of each position parameter sequence and combining the extracted position parameters into a candidate operation array. Such a modification is equivalent to the above scheme and does not depart from the scope of the embodiments of the present application.
By the technical scheme, the binary groups corresponding to all the nodes can be sequentially determined from the leaf nodes of the logic tree according to the sequence from the lower-level node to the upper-level node, and the binary group corresponding to the root node is finally obtained.
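The bottom-up traversal can be sketched as a short recursion. Everything here is an assumption made for illustration: the `Node` class, the use of `None` as the preset invalid value, half-up rounding, the reading of the distance formula as (sum of A_j + sum of B_k) / (p + q), and the use of the faster selection of Embodiment 2 for AND nodes. An AND node is assumed to have at least two next-level nodes.

```python
from dataclasses import dataclass, field

INVALID = None  # hypothetical stand-in for the preset invalid distance value


@dataclass
class Node:
    op: str                               # "LEAF", "AND" or "OR"
    keyword: str = ""
    children: list = field(default_factory=list)


def half_up(x):
    return int(x + 0.5)                   # assumed rounding rule


def and_candidates(seqs):
    """Candidate selection in the manner of S401-S408; one list per child."""
    seqs = [sorted(s) for s in seqs]
    out = []
    while all(seqs):
        order = sorted(range(len(seqs)), key=lambda i: seqs[i][0])
        ref = seqs[order[-1]][0]
        for i in reversed(order[:-1]):
            pick = min(seqs[i], key=lambda v: (abs(v - ref), v))
            seqs[i] = [v for v in seqs[i] if v >= pick]
            ref = pick
        out.append([s.pop(0) for s in seqs])
    return out


def tuples_of(node, positions):
    """Return the (position, distance) binary groups of a node, leaves first."""
    if node.op == "LEAF":
        return [(p, INVALID) for p in positions.get(node.keyword, [])]
    child_tuples = [tuples_of(c, positions) for c in node.children]
    if node.op == "OR":                   # OR: all binary groups of all children
        return [t for ts in child_tuples for t in ts]
    result = []
    for cand in and_candidates([[p for p, _ in ts] for ts in child_tuples]):
        ordered = sorted(cand)
        gaps = [b - a for a, b in zip(ordered, ordered[1:])]
        # Distance parameters contributed by non-leaf children (valid values only).
        dists = [dict(ts)[p] for ts, p in zip(child_tuples, cand)]
        dists = [d for d in dists if d is not INVALID]
        result.append((
            half_up(sum(cand) / len(cand)),
            half_up((sum(gaps) + sum(dists)) / (len(gaps) + len(dists))),
        ))
    return result


pos = {"a": [3], "b": [5], "c": [8]}      # made-up keyword positions
inner = Node("AND", children=[Node("LEAF", "a"), Node("LEAF", "b")])
root = Node("AND", children=[Node("LEAF", "c"), inner])
print(tuples_of(root, pos))
```

In the usage lines, the inner node "a and b" is evaluated first, and its binary group then takes part in the outer AND node as if it were the position of an abstract keyword.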
In many cases, the closer together, in number of words, the keywords appear, the more strongly the meaning expressed by the part of the text in which they appear is associated with the semantics of the keywords themselves. In addition, for text content with a large number of words, the overall semantics of the text content are generally highly associated with a keyword when that keyword appears at an important position such as the beginning of the text content.
Optionally, the relevance parameter of the search result text and the search content is determined specifically by the following formula:
Figure BDA0003541160030000202
where Score is the relevance parameter between the search result text and the search content; N is the total number of binary groups corresponding to the root node; i is a serial number; o_i is the position parameter in the i-th binary group corresponding to the root node; and d_i is the distance parameter in the i-th binary group corresponding to the root node.
Optionally, the search result text is a text obtained by searching and matching in a specified search range by using an inverted index algorithm by using the keywords of the search content.
Optionally, the occurrence positions of the keywords corresponding to the leaf nodes in the search result text are determined by an inverted index list of the search result text, where the inverted index list is obtained by recording the occurrence positions of the words in the search result text in advance.
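A positional inverted index of this kind, and the leaf-node binary groups read from it, can be sketched as follows. The sketch makes assumptions that the text does not fix: words are obtained by splitting on whitespace, positions are counted from 1, and `None` stands in for the preset invalid distance value. The function names are hypothetical.

```python
from collections import defaultdict

INVALID = None  # hypothetical stand-in for the preset invalid distance value


def build_position_index(tokens):
    """Record, for every word, the positions (counted from 1) where it appears."""
    index = defaultdict(list)
    for pos, word in enumerate(tokens, start=1):
        index[word].append(pos)
    return dict(index)


def leaf_binary_groups(index, keyword):
    """Binary groups of a leaf node: one per occurrence, with an invalid distance."""
    return [(pos, INVALID) for pos in index.get(keyword, [])]


# Made-up search result text, tokenized by whitespace for illustration only.
text = "sports make people happy and fitness makes people happy"
index = build_position_index(text.split())
print(index["happy"])
print(leaf_binary_groups(index, "happy"))
```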
Based on the same inventive concept, an embodiment of the present application further provides an apparatus for ranking search results of search content, as shown in fig. 6, including:
a search content analysis module M110, configured to establish a logic tree corresponding to the search content according to a logic expression corresponding to the search content, where nodes of the logic tree include leaf nodes and non-leaf nodes, the leaf nodes are the keywords, the non-leaf nodes are logic operators in the logic expression, and the nodes are connected according to a logic relationship of the logic expression;
a correlation calculation module M120, configured to determine, for any search result text, a correlation parameter between the search result text and the search content according to the binary group corresponding to the root node of the logical tree; the search result text is a text which is searched and matched in a specified search range by utilizing the keywords of the search content; wherein each node in the logical tree corresponds to at least one binary group, each binary group comprising a position parameter related to the occurrence position of the keyword corresponding to the node in the search result text and a distance parameter of the keyword corresponding to the node in the search result text;
and the sorting module M130 is configured to sort each search result text according to the relevance parameter corresponding to each search result text.
Optionally, the position parameter in the binary group corresponding to each node is determined by:
determining the appearance positions of the keywords corresponding to the leaf nodes in the search result text for any leaf node in the logic tree, and taking each appearance position as the position parameter of a binary group corresponding to the leaf node;
for any non-leaf node in the logic tree, determining a plurality of next-level nodes corresponding to the non-leaf node, and determining the position parameter of at least one binary group corresponding to the non-leaf node according to the position parameter of each binary group corresponding to each next-level node;
determining a distance parameter in the duplet corresponding to each node by:
for any leaf node in the logic tree, taking a preset invalid value as a distance parameter of all binary groups corresponding to the leaf node;
and determining a plurality of next-level nodes corresponding to the non-leaf nodes for any non-leaf node in the logic tree, and determining the distance parameter of at least one binary group corresponding to the non-leaf node according to each binary group corresponding to each next-level node.
Optionally, if the logical operator corresponding to the non-leaf node is an OR operator, the binary groups corresponding to the non-leaf node are all the binary groups corresponding to all next-level nodes of the non-leaf node.
Optionally, if the logical operator corresponding to the non-leaf node is an AND operator, then determining, for any non-leaf node in the logical tree, a plurality of next-level nodes corresponding to the non-leaf node, and determining, according to each binary group corresponding to each next-level node, a distance parameter of at least one binary group corresponding to the non-leaf node, comprises:
if the next-level node corresponding to the non-leaf node only comprises leaf nodes, determining a distance parameter of at least one binary group corresponding to the non-leaf node according to the position parameter of each binary group corresponding to each next-level leaf node;
if the next-level nodes corresponding to the non-leaf node only comprise non-leaf nodes, determining the distance parameter of at least one binary group corresponding to the non-leaf node according to the distance parameter of each binary group corresponding to each next-level non-leaf node;
and if the next-level node corresponding to the non-leaf node comprises a non-leaf node and a leaf node, determining the distance parameter of at least one binary group corresponding to the non-leaf node together according to the position parameter of each binary group corresponding to each next-level leaf node and the distance parameter of each binary group corresponding to each next-level non-leaf node.
Optionally, if the logical operator corresponding to the non-leaf node is an AND operator, the position parameter and the distance parameter in a binary group corresponding to the non-leaf node are determined by:
for each next-level node corresponding to the non-leaf node, arranging the position parameters of the binary groups corresponding to that next-level node in ascending order to obtain a position parameter sequence;
arranging the position parameter sequences corresponding to the next-level nodes in ascending order of the first value in each position parameter sequence;
starting with the first position parameter in the final position parameter sequence of the arrangement set as the reference position parameter, performing the following steps:
selecting, in the previous position parameter sequence, the position parameter with the minimum difference from the reference position parameter, and deleting all position parameters in the previous position parameter sequence that are smaller than the selected position parameter, wherein the previous position parameter sequence is the position parameter sequence immediately before, in the arrangement, the position parameter sequence in which the reference position parameter is located; then updating the reference position parameter to the selected position parameter, and returning to the step of selecting, in the previous position parameter sequence, the position parameter with the minimum difference from the reference position parameter, until the selected position parameter is a position parameter of the first position parameter sequence;
when the selected position parameter is a position parameter of the first position parameter sequence, taking the first position parameter out of each position parameter sequence and combining the extracted position parameters into a candidate operation array; if every position parameter sequence still contains a position parameter, returning to the step of arranging the position parameter sequences corresponding to the next-level nodes in ascending order of the first value in each position parameter sequence; if any position parameter sequence no longer contains a position parameter, then for each candidate operation array, determining the position parameter and the distance parameter in one binary group corresponding to the non-leaf node according to the candidate operation array and the distance parameter corresponding to each position parameter in the candidate operation array.
Optionally, the position parameter and the distance parameter in a binary group corresponding to the non-leaf node are determined according to the candidate operation array in the following manner:
rounding the average value of the position parameters in the candidate operation array to an integer, and taking the integer as the position parameter in a binary group corresponding to the non-leaf node;
calculating the distance parameter in the binary group according to the following formula:
$$\text{distance} = \frac{\sum_{j=1}^{p} A_j + \sum_{k=1}^{q} B_k}{p + q}$$
where distance is the distance parameter in the binary group; j and k are serial numbers; A_j is the j-th adjacent-term difference after the position parameters in the candidate operation array are arranged in order of size; B_k is the distance parameter of the binary group corresponding to the k-th next-level non-leaf node; p is the total number of adjacent-term differences; and q is the total number of binary groups corresponding to the next-level non-leaf nodes.
Optionally, if several position parameters in the previous position parameter sequence have the same minimum difference from the reference position parameter, selecting, in the previous position parameter sequence, the position parameter with the minimum difference from the reference position parameter includes:
selecting the position parameter with the smallest value from among the position parameters in the previous position parameter sequence that have the minimum difference from the reference position parameter.
Optionally, the relevance parameter of the search result text and the search content is determined specifically by the following formula:
Figure BDA0003541160030000241
where Score is the relevance parameter between the search result text and the search content; N is the total number of binary groups corresponding to the root node; i is a serial number; o_i is the position parameter in the i-th binary group corresponding to the root node; and d_i is the distance parameter in the i-th binary group corresponding to the root node.
Optionally, the search result text is a text obtained by searching and matching in a specified search range by using an inverted index algorithm by using the keywords of the search content.
Optionally, the occurrence positions of the keywords corresponding to the leaf nodes in the search result text are determined by an inverted index list of the search result text, where the inverted index list is obtained by recording the occurrence positions of the words in the search result text in advance.
It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation. In addition, each module in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a form of hardware or a form of a software functional unit.
Since the specific manner in which each module of the apparatus for sorting the search results of the search content performs operations has been described in detail in the embodiment related to the method, it is not described herein again.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, as shown in fig. 7, including:
a memory 101 for storing program instructions;
the processor 102 is configured to invoke the program instruction stored in the memory 101, and execute the steps included in the method for ranking the search results of the search content according to the obtained program instruction.
Based on the same inventive concept, an embodiment of the present application further provides a computer-readable storage medium storing computer program code which, when run on a computer, causes the computer to perform any of the methods for sorting the search results of search content discussed above.
Since the principle of solving the problem of the computer-readable storage medium is similar to the method for sorting the search results of the search content, the implementation of the computer-readable storage medium may refer to the implementation of the method, and repeated details are not repeated.
Based on the same inventive concept, the embodiment of the present application further provides a computer program product, where the computer program product includes: computer program code which, when run on a computer, causes the computer to perform any of the methods of ranking search results of searching content as discussed above.
Because the principle of solving the problem of the computer program product is similar to the method for sorting the search results of the search content, the implementation of the computer program product can refer to the implementation of the method, and repeated details are not repeated.
The method for ranking the search results of search content provided by the present application takes into account the influence that the positions at which keywords occur in a search result text have on the overall semantics of that text. A logic tree is established for the search content, and the approximate position, within the search result text, of the semantics of each logical branch of the search content is determined from the occurrence positions of the keywords. The correlation between the search result text and the search content is then analyzed according to these positions and the logical relations, which better reflects the semantic difference between the search result text and the search content.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (14)

1. A method of ranking search results of searching content, comprising:
establishing a logic tree corresponding to the search content according to the logic expression corresponding to the search content; the nodes of the logic tree comprise leaf nodes and non-leaf nodes, the leaf nodes are keywords of the search content, the non-leaf nodes are logic operators in the logic expression, and the nodes are connected according to the logic relation of the logic expression;
for any search result text, determining a correlation parameter between the search result text and the search content according to the binary group corresponding to the root node of the logic tree; the search result text is a text which is searched and matched in a specified search range by utilizing the keywords of the search content; wherein each node in the logic tree corresponds to at least one binary group, each binary group comprising a position parameter related to the occurrence position of the keyword corresponding to the node in the search result text and a distance parameter of the keyword corresponding to the node in the search result text;
and ranking the search result texts according to the correlation parameter corresponding to each search result text.
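As an illustration of the first step of claim 1, the following is a minimal sketch of building a logic tree from a logic expression. The expression syntax (upper-case `AND` and `OR`, parentheses, `AND` binding tighter than `OR`) and the names `Node` and `parse` are assumptions made for this sketch; the application does not prescribe them.

```python
from dataclasses import dataclass, field
import re


@dataclass
class Node:
    # For a leaf node, `value` is a keyword; for a non-leaf node it is "AND" or "OR".
    value: str
    children: list = field(default_factory=list)

    @property
    def is_leaf(self):
        return not self.children


def parse(expr):
    """Build a logic tree from an expression such as "(a AND b) OR c"."""
    tokens = re.findall(r"\(|\)|[^\s()]+", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        operands = [parse_and()]
        while peek() == "OR":
            take()
            operands.append(parse_and())
        return operands[0] if len(operands) == 1 else Node("OR", operands)

    def parse_and():
        operands = [parse_atom()]
        while peek() == "AND":
            take()
            operands.append(parse_atom())
        return operands[0] if len(operands) == 1 else Node("AND", operands)

    def parse_atom():
        tok = take()
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return Node(tok)  # a keyword becomes a leaf node

    root = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return root
```

For example, `parse("(bank AND loan) OR mortgage")` yields an `OR` root whose children are an `AND` node over the leaves `bank` and `loan`, and the leaf `mortgage`.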
2. The method of claim 1, wherein the position parameter in the binary group corresponding to each node is determined by:
for any leaf node in the logic tree, determining the occurrence positions, in the search result text, of the keyword corresponding to the leaf node, and taking each occurrence position as the position parameter of one binary group corresponding to the leaf node;
for any non-leaf node in the logic tree, determining a plurality of next-level nodes corresponding to the non-leaf node, and determining the position parameter of at least one binary group corresponding to the non-leaf node according to the position parameter of each binary group corresponding to each next-level node;
and wherein the distance parameter in the binary group corresponding to each node is determined by:
for any leaf node in the logic tree, taking a preset invalid value as the distance parameter of all binary groups corresponding to the leaf node;
and for any non-leaf node in the logic tree, determining a plurality of next-level nodes corresponding to the non-leaf node, and determining the distance parameter of at least one binary group corresponding to the non-leaf node according to each binary group corresponding to each next-level node.
3. The method of claim 2, wherein if the logical operator corresponding to the non-leaf node is an OR operator, the binary groups corresponding to the non-leaf node are all the binary groups corresponding to all next-level nodes corresponding to the non-leaf node.
4. The method of claim 2, wherein if the logical operator corresponding to the non-leaf node is an AND operator, determining a plurality of next-level nodes corresponding to the non-leaf node for any non-leaf node in the logic tree, and determining the distance parameter of at least one binary group corresponding to the non-leaf node according to each binary group corresponding to each next-level node, comprises:
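The leaf-node rule of claim 2 and the OR rule of claim 3 can be sketched as follows. This is a minimal illustration: representing a binary group as a `(position, distance)` tuple, using `None` for the preset invalid value, and the function names are all assumptions of the sketch.

```python
INVALID = None  # assumed stand-in for the "preset invalid value" of a leaf node


def leaf_groups(positions):
    """One binary group per occurrence position; the distance is the invalid value."""
    return [(p, INVALID) for p in positions]


def or_groups(children_groups):
    """An OR node carries every binary group of every next-level node."""
    merged = []
    for groups in children_groups:
        merged.extend(groups)
    return merged
```

An OR node over two keywords occurring at positions 2 and 7, and at position 4, would therefore carry three binary groups.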
if the next-level node corresponding to the non-leaf node only comprises leaf nodes, determining a distance parameter of at least one binary group corresponding to the non-leaf node according to the position parameter of each binary group corresponding to each next-level leaf node;
if the next-level node corresponding to the non-leaf node only comprises a non-leaf node, determining the distance parameter of at least one binary group corresponding to the non-leaf node according to the distance parameter of each binary group corresponding to each next-level non-leaf node;
and if the next-level node corresponding to the non-leaf node comprises a non-leaf node and a leaf node, determining the distance parameter of at least one binary group corresponding to the non-leaf node together according to the position parameter of each binary group corresponding to each next-level leaf node and the distance parameter of each binary group corresponding to each next-level non-leaf node.
5. The method of claim 4, wherein if the logical operator corresponding to the non-leaf node is an AND operator, the position parameter and the distance parameter in the binary group corresponding to the non-leaf node are determined by:
for any next-level node corresponding to the non-leaf node, arranging the position parameters of the binary groups corresponding to the next-level node in order from small to large to obtain a position parameter sequence;
arranging the position parameter sequences corresponding to the next-level nodes in order of the first item value of each position parameter sequence from small to large;
taking the first position parameter in the last position parameter sequence of the arrangement as a reference position parameter, and performing the following steps:
selecting, in the previous position parameter sequence, the position parameter having the smallest difference from the reference position parameter, and deleting from the previous position parameter sequence all position parameters smaller than the selected position parameter; wherein the previous position parameter sequence is the position parameter sequence immediately before, in the arrangement of position parameter sequences, the position parameter sequence in which the reference position parameter is located; then updating the reference position parameter to the selected position parameter, and returning to the step of selecting, in the previous position parameter sequence, the position parameter having the smallest difference from the reference position parameter, until the selected position parameter is a position parameter of the first position parameter sequence;
when the selected position parameter is a position parameter of the first position parameter sequence, taking out the first position parameter of each position parameter sequence to form a candidate operation array; if every position parameter sequence still contains a position parameter at this moment, returning to the step of arranging the position parameter sequences corresponding to the next-level nodes in order of the first item value of each position parameter sequence from small to large; if a position parameter sequence no longer contains any position parameter at this moment, then, for any candidate operation array, determining the position parameter and the distance parameter in one binary group corresponding to the non-leaf node according to the candidate operation array and the distance parameter corresponding to each position parameter in the candidate operation array.
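The selection loop of claim 5 can be sketched as below. This is a hedged reading rather than a definitive implementation: it assumes each position parameter sequence is sorted from small to large, that the loop stops as soon as any sequence is empty, and that ties are broken by the smaller value as in claim 7. The function name `candidate_arrays` is hypothetical, and the sketch tracks positions only, not the distance parameter attached to each position.

```python
def candidate_arrays(child_positions):
    """Group the occurrence positions of an AND node's children into candidate arrays.

    `child_positions` holds one list of position parameters per next-level node.
    """
    seqs = [sorted(p) for p in child_positions]
    result = []
    # Continue only while every sequence still holds a position parameter.
    while seqs and all(seqs):
        # Arrange the sequences by their first item, from small to large.
        seqs.sort(key=lambda s: s[0])
        # The first item of the last sequence is the initial reference.
        ref = seqs[-1][0]
        # Walk backwards through the preceding sequences.
        for i in range(len(seqs) - 2, -1, -1):
            prev = seqs[i]
            # Closest position to the reference; ties go to the smaller value.
            chosen = min(prev, key=lambda x: (abs(x - ref), x))
            # Delete every position smaller than the chosen one.
            del prev[:prev.index(chosen)]
            ref = chosen
        # Take out the first item of each sequence as one candidate array.
        result.append([s.pop(0) for s in seqs])
    return result
```

With child position lists `[3, 20]` and `[5, 22, 40]`, the sketch pairs 3 with 5 and 20 with 22, and leaves 40 unmatched because the first sequence is exhausted.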
6. The method of claim 5, wherein the position parameter and the distance parameter in the binary group corresponding to the non-leaf node are determined from the candidate operation array by:
taking an integer according to the average value of the position parameters in the candidate operation array, and taking the integer as the position parameter in the binary group corresponding to the non-leaf node;
calculating the distance parameter in the binary group according to the following formula:
Figure FDA0003541160020000031
wherein distance is the distance parameter in the binary group; j and k are serial numbers; A_j is the j-th adjacent-term difference obtained after the position parameters in the candidate operation array are arranged in order of size; B_k is the distance parameter of the binary group corresponding to the k-th next-level non-leaf node; p is the total number of adjacent-term differences; and q is the total number of binary groups corresponding to the next-level non-leaf nodes.
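The inputs to claim 6 can be sketched as follows. The formula that combines A_j and B_k into the distance parameter appears only as a figure and is not reproduced in this text, so the sketch deliberately stops at computing the position parameter and the adjacent-term differences A_j. Rounding the average down is an assumption, since the claim says only that an integer is taken from the average; the function names are hypothetical.

```python
def position_from_candidates(candidates):
    """Integer position parameter taken from the average of a candidate array.

    Floor division is an assumption; the claim does not state the rounding mode.
    """
    return sum(candidates) // len(candidates)


def adjacent_differences(candidates):
    """Adjacent-term differences A_j after ordering the positions by size."""
    ordered = sorted(candidates)
    return [b - a for a, b in zip(ordered, ordered[1:])]
```

For the candidate operation array `[10, 3, 5]`, the position parameter under this assumption is 6 and the adjacent-term differences are 2 and 5, so p is 2.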
7. The method of claim 5, wherein if there are a plurality of position parameters in the previous position parameter sequence that have the smallest difference from the reference position parameter, selecting a position parameter in the previous position parameter sequence that has the smallest difference from the reference position parameter comprises:
selecting the position parameter with the minimum value from among the position parameters in the previous position parameter sequence that have the smallest difference from the reference position parameter.
8. The method of claim 1, wherein the correlation parameter between the search result text and the search content is determined by the following formula:
Figure FDA0003541160020000041
wherein Score is the correlation parameter between the search result text and the search content; N is the total number of binary groups corresponding to the root node; i is a serial number; o_i is the position parameter in the i-th binary group corresponding to the root node; and d_i is the distance parameter in the i-th binary group corresponding to the root node.
9. The method according to claim 1 or 2, wherein the search result text is a text which is searched and matched by using the keywords of the search content in a specified search range by adopting an inverted index algorithm.
10. The method of claim 9, wherein the occurrence positions of the keywords corresponding to the leaf nodes in the search result text are determined by an inverted index list of the search result text, and the inverted index list is obtained by recording the occurrence positions of the words in the search result text in advance.
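The inverted index list of claim 10 can be illustrated with a minimal sketch. Splitting on whitespace, lower-casing, and using word offsets as occurrence positions are assumptions of the sketch; the application does not fix a tokenizer or a unit of position.

```python
from collections import defaultdict


def build_position_index(text):
    """Record, for each word, the positions at which it occurs in the text."""
    index = defaultdict(list)
    for position, word in enumerate(text.split()):
        index[word.lower()].append(position)
    return dict(index)
```

Looking up a leaf node's keyword in this index returns its occurrence positions directly, without rescanning the search result text.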
11. An apparatus for ranking search results of searching content, comprising:
a search content analysis module for establishing a logic tree corresponding to the search content according to a logic expression corresponding to the search content, wherein nodes of the logic tree comprise leaf nodes and non-leaf nodes, the leaf nodes are keywords of the search content, the non-leaf nodes are logic operators in the logic expression, and the nodes are connected according to the logic relationship of the logic expression;
the correlation calculation module is used for determining the correlation parameters of the search result text and the search contents according to the binary group corresponding to the root node of the logic tree aiming at any search result text; the search result text is a text which is searched and matched in a specified search range by utilizing the keywords of the search content; wherein each node in the logical tree corresponds to at least one binary group, each binary group comprising a position parameter related to the occurrence position of the keyword corresponding to the node in the search result text and a distance parameter of the keyword corresponding to the node in the search result text;
and the ranking module is used for ranking each search result text according to the corresponding relevance parameter of each search result text.
12. An electronic device, comprising:
a memory for storing program instructions;
a processor for invoking program instructions stored in said memory to execute the steps of the method of ranking search results for search content of any of claims 1-10 in accordance with the obtained program instructions.
13. A computer-readable storage medium, characterized in that it stores a computer program comprising program instructions which, when executed by a computer, cause the computer to carry out the method of ranking search results of search content according to any of claims 1-10.
14. A computer program product, the computer program product comprising: computer program code for causing a computer to perform a method of ranking search results of searching content as claimed in any of claims 1-10 when said computer program code is run on a computer.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210233404.XA CN114595319A (en) 2022-03-10 2022-03-10 Method for sorting search results of search content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210233404.XA CN114595319A (en) 2022-03-10 2022-03-10 Method for sorting search results of search content

Publications (1)

Publication Number Publication Date
CN114595319A true CN114595319A (en) 2022-06-07

Family

ID=81809238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210233404.XA Pending CN114595319A (en) 2022-03-10 2022-03-10 Method for sorting search results of search content

Country Status (1)

Country Link
CN (1) CN114595319A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115599886A (en) * 2022-10-24 2023-01-13 广州广电运通信息科技有限公司(Cn) Method and equipment for generating search logic operator for Lucene and storage medium

Similar Documents

Publication Publication Date Title
CN1552032B (en) Database
US8341159B2 (en) Creating taxonomies and training data for document categorization
KR100451978B1 (en) A method of retrieving data and a data retrieving apparatus
CN103198079B (en) The implementation method of relevant search and device
WO2017092622A1 (en) Legal provision search method and device
CN106570128A (en) Mining algorithm based on association rule analysis
KR20190038243A (en) System and method for retrieving documents using context
CN105659225A (en) Query expansion and query-document matching using path-constrained random walks
KR20080031262A (en) Relationship networks
CN107291895B (en) Quick hierarchical document query method
CN106469097B (en) A kind of method and apparatus for recalling error correction candidate based on artificial intelligence
CN106909669A (en) The detection method and device of a kind of promotion message
CN105787126A (en) K-d (k-dimensional) tree generation method and k-d tree generation device
CN107239549A (en) Method, device and the terminal of database terminology retrieval
CN114595319A (en) Method for sorting search results of search content
JP5373998B1 (en) Dictionary generating apparatus, method, and program
CN103034709B (en) Retrieving result reordering system and method
JPH021059A (en) Associative retrieving system
CN112199461B (en) Document retrieval method, device, medium and equipment based on block index structure
JP3370787B2 (en) Character array search method
CN105426490A (en) Tree structure based indexing method
CN111666420B (en) Method for intensively extracting experts based on subject knowledge graph
CN110866088B (en) Method and system for fast full-text retrieval between corpora
CN104572868A (en) Method and device for information matching based on questioning and answering system
CN114328823A (en) Database natural language query method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination