CN114840754A - Searching method, searching device, electronic equipment and readable storage medium

Info

Publication number: CN114840754A
Application number: CN202210480051.3A
Authority: CN
Inventors: 张正楠
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2022-05-05
Filing date: 2022-05-05
Publication date: 2022-08-02

Abstract

The application discloses a search method, a search apparatus, an electronic device and a readable storage medium, and belongs to the technical field of artificial intelligence. The method comprises: acquiring, according to target information, N1 first search results matching the target information, where N1 is a positive integer; determining, according to the content of the N1 first search results, N2 fragments corresponding to the target information, where N2 is a positive integer and N1 ≥ N2; arranging and combining the N2 fragments according to N3 arrangement modes to obtain N3 corresponding files, where N3 is a positive integer; and determining at least one of the files as a second search result, and outputting the second search result.

Description

Searching method, searching device, electronic equipment and readable storage medium

Technical Field

The application belongs to the technical field of artificial intelligence, and particularly relates to a searching method, a searching device, electronic equipment and a readable storage medium.

Background

With the rapid development of the internet, the amount of information on the network is growing quickly, and people increasingly rely on online search to obtain the information they need quickly, expecting that information to be accurate and comprehensive.

In the prior art, a search method mainly depends on semantic understanding of keywords to match relevant webpages for a user, and displays the webpages in a sequence from high to low according to the correlation degree between each webpage and the keywords. Generally, links of various web pages, a simple sentence, a few words and the like in the web page are displayed to the user, and the user needs to click on the links of the web pages to view the content of the web pages when viewing the search results. Therefore, in order to find a satisfactory search answer, the user needs to click the web page link many times and then search for the required information in the web page.

Therefore, in the prior art, user operation is cumbersome because the user has to click web page links repeatedly to find an answer.

Disclosure of Invention

The embodiments of the application aim to provide a search method that can solve the problem in the prior art that user operation is cumbersome because the user has to click web page links repeatedly to find an answer.

In a first aspect, an embodiment of the present application provides a search method, where the method includes: according to target information, acquiring N1 first search results matched with the target information, wherein N1 is a positive integer; determining N2 fragments corresponding to the target information according to the content of the N1 first search results, wherein N2 is a positive integer, and N1 ≧ N2; arranging and combining the N2 fragments according to N3 arrangement modes to obtain N3 corresponding files, wherein N3 is a positive integer; and determining at least one file as a second search result, and outputting the second search result.

In a second aspect, an embodiment of the present application provides a search apparatus, including: the acquisition module is used for acquiring N1 first search results matched with target information according to the target information, wherein N1 is a positive integer; a first determining module, configured to determine, according to the content of the N1 pieces of first search results, N2 segments corresponding to the target information, where N2 is a positive integer, and N1 ≧ N2; the arrangement module is used for arranging and combining the N2 fragments according to N3 arrangement modes to obtain N3 corresponding files, wherein N3 is a positive integer; and the second determining module is used for determining at least one file as a second search result and outputting the second search result.

In a third aspect, embodiments of the present application provide an electronic device, which includes a processor and a memory, where the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.

In a sixth aspect, embodiments of the present application provide a computer program product, stored on a storage medium, for execution by at least one processor to implement the method according to the first aspect.

Thus, in the embodiment of the present application, based on the target information input by the user, N1 first search results related to the target information are first matched; the contents of the N1 first search results are then condensed into N2 fragments; and the N2 fragments are arranged and combined. Based on the N3 arrangement modes, N3 corresponding files can be obtained, and at least one of the N3 files is determined as the finally output second search result. In this way, the contents of a plurality of search results are screened to keep the relevant content and remove the irrelevant content, and the resulting fragments are integrated into a file that is presented directly to the user. The user can therefore view relatively accurate search results at a glance without repeatedly clicking web page links, which simplifies user operation.

Drawings

FIG. 1 is a flow chart of a search method of an embodiment of the present application;

FIG. 2 is a probability curve distribution graph of an embodiment of the present application;

FIG. 3 is a display schematic diagram of an electronic device of an embodiment of the application;

FIG. 4 is a block diagram of a search apparatus according to an embodiment of the present application;

FIG. 5 is a first schematic diagram of a hardware structure of the electronic device according to an embodiment of the present application;

FIG. 6 is a second schematic diagram of a hardware structure of the electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions of the embodiments of the present application will be described below clearly with reference to the drawings of the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments that can be derived from the embodiments of the present application by one of ordinary skill in the art are intended to be within the scope of the present application.

The terms "first", "second" and the like in the description and in the claims of the present application are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances, so that embodiments of the application can be practiced in sequences other than those illustrated or described herein. Objects distinguished by "first", "second" and the like are generally of one kind, and their number is not limited; for example, the first object can be one or more than one. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally means that the preceding and succeeding related objects are in an "or" relationship.

The searching method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.

Fig. 1 shows a flowchart of a search method according to an embodiment of the present application, which is applied to an electronic device, and includes:

Step 110: acquiring, according to target information, N1 first search results matching the target information, where N1 is a positive integer.

Before this step, the method may further include: receiving the user's input of the target information.

The application scenario is that the user inputs target information in the search box and clicks the "search" control.

Optionally, the target information may take the form of text, an image, or the like.

Thus, in this step, semantic extraction is performed on the target information, and the N1 first search results with the highest relevance are retrieved according to the semantic matching degree.

Optionally, the first search result includes a plurality of forms of pages, links, segments, and the like.

Step 120: determining, according to the content of the N1 first search results, N2 fragments corresponding to the target information, where N2 is a positive integer and N1 ≥ N2.

In this step, N1 first search results are converted into N2 segments for subsequent integration of N2 segments to generate one search result.

Step 130: arranging and combining the N2 fragments according to N3 arrangement modes to obtain N3 corresponding files, where N3 is a positive integer.

In this step, N2 fragments are arranged and combined to generate a corresponding file.

Optionally, N3 arrangement modes are provided. In each arrangement mode, the N2 fragments are arranged and combined in sequence to generate one file, so N3 corresponding files are finally generated.

Optionally, the file may take a plurality of forms such as pages, pictures, and the like.

Step 140: determining at least one file as a second search result, and outputting the second search result.

At least one file matching the target information is determined from the N3 files and presented to the user as the finally output second search result.

In the flow of the search method according to another embodiment of the present application, step 120 includes:

Substep A1: extracting, from the content corresponding to one first search result, target content matching the target information, where the target content is used to generate one fragment, so that N1 fragments are obtained.

In this step, a Bidirectional Encoder Representations from Transformers (BERT) model is used to determine a start position and an end position in the content corresponding to the first search result.

Illustratively, the first search result is a web page link, and the content corresponding to the first search result is the web page content. Generally, a web page contains a large amount of content, possibly including advertisements and banners; even within the text content, material related to the target information may appear in only some of the paragraphs.

Therefore, in this step, based on the first search result obtained by the preliminary search, a start position and an end position are determined in the content corresponding to the first search result according to the target information, so that the content between the start position and the end position is extracted as the target content, and a segment corresponding to the first search result is generated.

Correspondingly, N1 segments can be extracted from N1 first search results.
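The start/end selection described above can be sketched as follows. This is only an illustration under stated assumptions: it presumes that some reading-comprehension model (such as BERT) has already produced one start score and one end score per token, and it picks the pair with the highest combined score. The function names `best_span` and `extract_fragment`, the `max_len` limit, and the score values are hypothetical and do not come from the application.

```python
from typing import List, Sequence, Tuple


def best_span(start_scores: Sequence[float], end_scores: Sequence[float],
              max_len: int = 50) -> Tuple[int, int]:
    """Return (start, end) token indices with the highest start+end score.

    Only spans with start <= end and at most max_len tokens are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


def extract_fragment(tokens: List[str], start_scores: Sequence[float],
                     end_scores: Sequence[float]) -> str:
    """Cut the content between the chosen start and end positions."""
    s, e = best_span(start_scores, end_scores)
    return " ".join(tokens[s:e + 1])


# Toy page content: an ad, the relevant sentence, then a banner.
tokens = "Ad banner . Green tea is rich in antioxidants . Subscribe now".split()
# Made-up scores standing in for model output (assumption).
start = [0.1, 0.0, 0.0, 2.5, 0.3, 0.1, 0.2, 0.1, 0.4, 0.0, 0.2, 0.1]
end = [0.0, 0.1, 0.3, 0.0, 0.2, 0.1, 0.1, 0.0, 1.0, 2.0, 0.1, 0.3]
fragment = extract_fragment(tokens, start, end)
```

With these scores the span runs from "Green" to the full stop after "antioxidants", so the advertisement and the banner are left out of the fragment.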

Optionally, the extracted segment may retain the original format for extracting the feature information of the segment.

Substep A2: in the case that the similarity between at least two segments is greater than a first threshold, retaining one of the at least two segments according to the feature information of the at least two segments, to obtain N2 segments.

Wherein the feature information of the at least two segments comprises: semantic matching information, source page address information and author information.

In this step, the N1 fragments are deduplicated.

Illustratively, the similarity between fragments is calculated. For a set of fragments whose similarity is greater than 90%, the semantic matching degree $r$, the first score $count_{url}$ of the web page address, and the second score $count_{author}$ of the author are calculated for each fragment in the set to obtain the A value of each fragment, and the fragment with the largest A value is retained. The A value is calculated as: $A = r \cdot (count_{url} + count_{author} + 1)$.

For example, the snippet may be from a web page, a first score for an address of the web page may be based on a number of times the address was crawled and a second score for the author may be based on a number of times the author was referenced.

The semantic matching information comprises semantic matching degree, the source page address information comprises a first score of a webpage address, and the author information comprises a second score of an author.

For example, for segment one, segment two and segment three, segment one is compared with segment two and with segment three. If the similarity between segment one and segment two is greater than 90% and the similarity between segment one and segment three is also greater than 90%, then segment one, segment two and segment three form a set with similarity greater than 90%. The A value of each segment in the set is then calculated, and the segment with the largest A value is retained.

Illustratively, in this step, the resulting set of fragments is $(a_1, a_2, \ldots, a_{N2})$.
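The deduplication step can be sketched as below. The sketch makes several assumptions that are not in the application: text similarity is approximated with `difflib.SequenceMatcher` from the Python standard library, the A value is taken to be r multiplied by (count_url + count_author + 1), and the grouping is greedy (each not-yet-grouped fragment is compared with the fragments after it). The `Fragment` class and all sample values are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class Fragment:
    text: str
    r: float             # semantic matching degree with the target information
    count_url: float     # first score: source page address
    count_author: float  # second score: author


def a_value(f: Fragment) -> float:
    # Assumed form of the A value: r * (count_url + count_author + 1).
    return f.r * (f.count_url + f.count_author + 1)


def similarity(a: str, b: str) -> float:
    # Stand-in similarity measure (assumption); returns a value in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def deduplicate(frags: List[Fragment], threshold: float = 0.9) -> List[Fragment]:
    """Group near-duplicates and keep the fragment with the largest A value."""
    kept: List[Fragment] = []
    used = [False] * len(frags)
    for i, f in enumerate(frags):
        if used[i]:
            continue
        used[i] = True
        group = [f]
        for j in range(i + 1, len(frags)):
            if not used[j] and similarity(f.text, frags[j].text) > threshold:
                used[j] = True
                group.append(frags[j])
        kept.append(max(group, key=a_value))
    return kept


frags = [
    Fragment("Green tea is rich in antioxidants.", 0.8, 3, 1),
    Fragment("Green tea is rich in antioxidants!", 0.9, 5, 2),
    Fragment("Black tea contains more caffeine than green tea.", 0.7, 2, 0),
]
result = deduplicate(frags)
```

Here the first two fragments differ only in their final character, so they fall into one set and only the one with the larger A value survives; the third fragment is kept unchanged.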

In this embodiment, the content corresponding to the first search result includes a large amount of redundant content, and therefore, the content corresponding to the first search result is extracted first; then, deduplication is performed for the extracted content. Therefore, based on the embodiment, the user can be prevented from spending time to check the repeated content and the irrelevant content, and the user operation is simplified.

In the flow of the search method according to another embodiment of the present application, step 130 includes:

substep B1: n2 fragments were aligned in the first position, respectively, resulting in N2 first positions.

Substep B2: based on the N2 first positions, matching values of the segments other than the segment at the first position in the N2 segments arranged at the second position are respectively obtained, and N2 (N2-1) matching values are obtained.

Substep B3: of the N2 (N2-1) matching values, the arrangement corresponding to the first largest N3 matching values is retained.

Substep B4: in any arrangement mode of the N3 arrangement modes, the matching values of the other segments except the segment at the first M positions in the N2 segments arranged at the M +1 th position are obtained, and N3 (N2-M) matching values are obtained.

Sub-step B5: the arrangement mode corresponding to the first N3 maximum matching values in the N3 × N2-M matching values is reserved.

Wherein M is a positive integer, and in the repeated steps, M is equal to 2 and 3 … … (N2-1) in sequence.

In the present embodiment, N2 fragments are arranged and combined according to the sequence relationship and causal relationship between N2 fragments.

In one aspect, the sequence and causal relationships between the various fragments are represented in "probability".

Here, "probability" means that, after the position of one piece is determined, the probability that another piece is arranged at the next position is calculated.

Illustratively, segment $a_i$ is first arranged at a certain position, and the probability $p(a_j \mid a_i)$ that segment $a_j$ is arranged at the next position is calculated.

Formula one: $p(a_j \mid a_i) = p_{\text{reference}}(a_j \mid a_i) + p_{\text{browse duration}}(a_j \mid a_i) + \sum_{c \in C} p(a_j \mid c) \cdot r_{ac}$.

If there is a reference or referenced relationship between segment $a_i$ and segment $a_j$, for example $a_i$ references $a_j$, then $a_i$ is in a subsequent relation to $a_j$; conversely, if $a_i$ is referenced by $a_j$, then $a_i$ is in a preceding relation to $a_j$. Therefore, $p_{\text{reference}}(a_j \mid a_i)$ in formula one can be obtained from the reference and referenced relationships between the two. Correspondingly, when such relationships exist between segment $a_i$ and segment $a_j$, the relationship between them is a "strong relationship", and the magnitude of $p_{\text{reference}}(a_j \mid a_i)$ indicates the strength of the strong relationship.

For example, segment $a_i$ contains a hyperlink that opens segment $a_j$, so $a_i$ references $a_j$ by means of the hyperlink. It can be seen that there are reference and referenced relationships between segment $a_i$ and segment $a_j$, and $p_{\text{reference}}(a_j \mid a_i)$ can be obtained based on the relationship between them.

In addition, there are also "weak relationships" between segments.

The first "weak relationship" is: $p_{\text{browse duration}}(a_j \mid a_i)$ is obtained from statistics on the browsing duration after jumping from segment $a_i$ to segment $a_j$, where the longer the browsing duration, the stronger the weak relationship, i.e., the larger $p_{\text{browse duration}}(a_j \mid a_i)$.

The second "weak relationship" is: $\sum_{c \in C} p(a_j \mid c) \cdot r_{ac}$ is obtained from jumps to segment $a_j$ from the page set $C$ related to segment $a_i$, where $r_{ac}$ represents the semantic matching degree between $a_j$ and page $c$.

Here, the page set $C$ related to $a_i$ consists of the intermediate pages passed through when jumping from $a_i$ to $a_j$.
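As a small numeric illustration of formula one, the sketch below adds the strong-relationship term, the browsing-duration term, and the sum over intermediate pages. It assumes the three terms have already been estimated elsewhere; the function name `transition_probability` and all numbers are hypothetical. Note that the sum of the three terms is not normalized in this sketch, so it can exceed 1.

```python
from typing import Iterable, Tuple


def transition_probability(p_reference: float, p_browse: float,
                           via_pages: Iterable[Tuple[float, float]]) -> float:
    """Formula one as a plain sum of its three terms.

    via_pages holds one (p_jump, r_ac) pair per intermediate page c in C:
    p_jump is the term for reaching a_j from page c, and r_ac is the
    semantic matching degree between a_j and page c.
    """
    weak_via_pages = sum(p_jump * r_ac for p_jump, r_ac in via_pages)
    return p_reference + p_browse + weak_via_pages


# Hypothetical values: a hyperlink reference (0.5), a long browsing
# duration (0.2), and two intermediate pages.
p = transition_probability(0.5, 0.2, [(0.4, 0.5), (0.1, 1.0)])
```

If two fragments share no intermediate pages, `via_pages` is empty and only the first two terms contribute.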

On the other hand, the sequence relation and the causal relation between the segments are also reflected on the characteristic information of the segments.

Here, the feature information refers to that of the other segment that may be arranged at the next position once the position of one segment has been determined.

Illustratively, the contribution of segment $a_j$ is $w(a_j) = f(\text{feature information})$, where the feature information includes whether bold text is present, whether highlighting is present, the number of times the text references or is referenced, the degree of matching with the search word, the average user dwell time, the number of times the segment is copied or selected under the relevant search word, and the like. Further, min-max normalization is performed on $w(a_j) = f(\text{feature information})$ to obtain formula two:

$\dfrac{w(a_j) - \min(W)}{\max(W) - \min(W)}$

In formula two, $W$ represents the set of contributions of the N2 segments.

Therefore, combining the above two aspects, formula one is multiplied by formula two to obtain the matching value, i.e., formula three:

$p(a_j \mid a_i) \cdot \dfrac{w(a_j) - \min(W)}{\max(W) - \min(W)}$

According to formula three, with one segment arranged at a given position, the matching values of the other segments being arranged at the next position are calculated, and the N3 largest of all the matching values are determined as the partial sequences of the N3 arrangement modes. This is repeated until the ordering of the N2 fragments in the N3 arrangement modes is completed.
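The normalization and the resulting matching value can be sketched as follows. This is a minimal illustration assuming that the contribution values and the next-position probability are already available; the names `min_max_normalize` and `matching_value`, the handling of the all-equal case, and the numbers are assumptions rather than part of the application.

```python
from typing import Dict


def min_max_normalize(w: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize the contributions of all fragments into [0, 1]."""
    lo, hi = min(w.values()), max(w.values())
    if hi == lo:
        # Assumption: when all contributions are equal, treat them as 1.0
        # to avoid division by zero.
        return {name: 1.0 for name in w}
    return {name: (value - lo) / (hi - lo) for name, value in w.items()}


def matching_value(p_next: float, w_norm: float) -> float:
    """Next-position probability multiplied by normalized contribution."""
    return p_next * w_norm


# Hypothetical contribution values for three fragments.
contributions = {"a1": 2.0, "a2": 6.0, "a3": 10.0}
normalized = min_max_normalize(contributions)

# Hypothetical probability that a2 follows a1.
match_a1_a2 = matching_value(0.6, normalized["a2"])
```

One consequence worth noting: under min-max normalization the fragment with the smallest contribution is mapped to 0, so its matching value is 0 regardless of the probability term.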

Here, the other segments exclude any segment that has already been arranged in the current arrangement.

The segment arranged at the first position may be any one of the segments, i.e., each segment is arranged at the first position with a probability of 1/N2.

Illustratively, assume there are four fragments $a_1$, $a_2$, $a_3$ and $a_4$, where N2 = 4 and N3 = 6. First, each of the four fragments is arranged at the first position, and the matching values of the fragments arranged at the second position are calculated in turn, giving twelve matching values, i.e., 4 × 3; the six largest matching values are taken, and the six arrangements they belong to are retained. For the six retained arrangements, the first and second positions have been determined; then, in each arrangement, the matching values of the fragments not yet arranged in that arrangement being placed at the third position are calculated in turn, giving twelve matching values, i.e., 6 × (4 − 2); the six largest matching values are again taken, and the arrangements they belong to are retained. For the six retained arrangements, the first, second and third positions have been determined; then, in each arrangement, the matching value of the fragment not yet arranged being placed at the fourth position is calculated, giving six matching values, i.e., 6 × (4 − 3).

Steps B4 and B5 are repeated as a pair; that is, one execution of step B4 and step B5 completes the ordering of one position other than the first and second positions, so M takes the values 2, 3, …, (N2 − 1) in sequence. When M = N2 − 1, the segment at the last position of the arrangement is being placed; in this case only one position remains in each arrangement, corresponding to one segment, so that segment can be placed directly without calculating a matching value. In the present embodiment, the case M = N2 − 1 is included in order to fully describe the arrangement process of the N2 fragments.

Correspondingly, based on the foregoing example, in each set of permutation combinations, the unarranged segments in the set are directly arranged at the fourth position.
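Substeps B1 to B5 resemble a beam search of width N3, and can be sketched as below. The sketch is not the application's implementation: the matching values come from a hypothetical lookup table `scores` (standing in for formula three), fragments are identified by index, and for simplicity the last position is also scored and sorted even though the text notes that the remaining fragment can be placed directly.

```python
from typing import Callable, List, Tuple


def arrange(n2: int, n3: int,
            match: Callable[[int, int], float]) -> List[Tuple[int, ...]]:
    """Keep the n3 best partial arrangements at every position."""
    # B1: every fragment starts one candidate arrangement.
    beams: List[Tuple[int, ...]] = [(i,) for i in range(n2)]
    for _ in range(n2 - 1):
        scored = []
        for order in beams:
            for j in range(n2):
                if j not in order:
                    # B2/B4: matching value of fragment j following the
                    # last fragment of this partial arrangement.
                    scored.append((match(order[-1], j), order + (j,)))
        # B3/B5: retain the arrangements with the n3 largest values.
        scored.sort(key=lambda item: item[0], reverse=True)
        beams = [order for _, order in scored[:n3]]
    return beams


# Hypothetical matching values: scores[i][j] is the value of j after i.
scores = [
    [0.00, 0.90, 0.20, 0.10],
    [0.30, 0.00, 0.80, 0.15],
    [0.25, 0.35, 0.00, 0.70],
    [0.05, 0.40, 0.45, 0.00],
]
files = arrange(4, 6, lambda i, j: scores[i][j])
```

With N2 = 4 and N3 = 6 the loop scores 12, 12 and 6 candidates in its three rounds, matching the counts in the worked example, and returns six complete arrangements. As in the text, each round ranks candidates by the matching value of the newly placed fragment only, not by a cumulative score.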

In this embodiment, a method for arranging N2 segments according to the order and the causal relationship among the segments is provided, so that the file content formed based on the arrangement and combination of N2 segments is relatively coherent, smooth and fluent; meanwhile, the embodiment provides a plurality of files, so that a better result can be selected from the plurality of files.

In more embodiments, in addition to considering the sequence relationship and the causal relationship between the fragments, the fragments may be further processed with word granularity, which ensures that the finally generated file is more coherent.

In the flow of the search method according to another embodiment of the present application, before step 140, the method further includes:

Step C1: in any file, in the case where a first segment is arranged at a first position, if the probability that a second segment is arranged at a second position satisfies a preset condition, deleting the second segment and the segments arranged after it, where the second position is the position next to the first position, and the N2 segments include the first segment and the second segment.

Wherein the preset condition comprises at least one of the following conditions:

in the case that the first segment is arranged at the first position, the probability that the second segment is arranged at the second position is less than a second threshold;

in the case where the first segment is arranged at the first position, the amplitude of fluctuation of the probability of the second segment being arranged at the second position is larger than the third threshold.

In this embodiment, since the segments in a file come from different first search results, the newly generated file may contain segments that do not fit together, for example because of semantic incoherence or repeated meaning. To further improve the file, its content can be clipped to remove incoherent parts, so that a file of optimal length is obtained.

Optionally, incoherent fragments are cut with the coherence of the file content as the criterion. The probability that a subsequent fragment is arranged after a preceding fragment reflects the coherence between the fragments, so incoherent places in the file can be located based on this probability.

Taking any file as an example, the set of fragments included in the file is $(c_1, c_2, \ldots, c_{N2})$. The probability $p(c_{i+1} \mid c_i)$ that a subsequent fragment is arranged after a preceding fragment can be obtained with reference to the foregoing embodiment. A probability distribution curve is then plotted from all $p(c_{i+1} \mid c_i)$ (see FIG. 2). In FIG. 2, the abscissa represents the N2 positions arranged in order, and the ordinate represents the probability that a segment is arranged at that position.

Correspondingly, the present embodiment provides a clipping manner: an abnormal point is found in the curve shown in FIG. 2 and taken as the clipping point, and all segments at and after that position are clipped. When there are a plurality of abnormal points, the first abnormal point may be used as the clipping point.

Optionally, the determining manner of the abnormal point includes at least one of the following:

taking a position whose probability is smaller than the second threshold as an abnormal point; taking a position whose probability fluctuation amplitude is larger than the third threshold as an abnormal point, where the fluctuation amplitude can be determined by calculating the variance, for example by finding points that fall more than three standard deviations below the mean according to the 3σ rule.

In this embodiment, a probability distribution curve is obtained from the probability of each segment being arranged at its position, the abnormal points in the curve are regarded as positions where the segments do not join coherently, and the file content is deleted from those positions onward, so that the segments finally retained read coherently.
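The clipping rule can be sketched as follows. The sketch interprets the "fluctuation amplitude" condition as the standard-deviation test mentioned in the text, uses the population standard deviation from the Python `statistics` module, and cuts at the first abnormal point. The function names, the parameter `k`, and the sample probabilities are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def first_outlier(probs: Sequence[float], min_prob: float,
                  k: float = 3.0) -> Optional[int]:
    """Index of the first abnormal transition, or None if there is none.

    A transition is abnormal if its probability is below min_prob, or if
    it lies more than k standard deviations below the mean.
    """
    if not probs:
        return None
    mu = mean(probs)
    sigma = pstdev(probs)
    for i, p in enumerate(probs):
        if p < min_prob or (sigma > 0 and p < mu - k * sigma):
            return i
    return None


def clip(fragments: List[str], probs: Sequence[float],
         min_prob: float, k: float = 3.0) -> List[str]:
    """probs[i] is the probability of fragments[i + 1] following fragments[i]."""
    cut = first_outlier(probs, min_prob, k)
    if cut is None:
        return list(fragments)
    # Drop the fragment that follows the abnormal transition and all later ones.
    return list(fragments[:cut + 1])


fragments = ["c1", "c2", "c3", "c4", "c5"]
clipped = clip(fragments, [0.8, 0.7, 0.1, 0.6], min_prob=0.3)
unchanged = clip(fragments, [0.8, 0.7, 0.6, 0.75], min_prob=0.3)
```

A practical caveat: with the population standard deviation, no value among n samples can lie more than (n − 1)/√n standard deviations from the mean, so for files with ten or fewer transitions the 3σ test never fires and the absolute threshold does the work; a smaller `k` may be more useful for short files.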

In further embodiments, the clipping manner may also be: only a single fragment not consistent with the above is discarded, and the remaining fragments can be spliced with other fragments.

In the flow of the search method according to another embodiment of the present application, step 140 includes:

substep D1: and determining at least one file as a second search result according to the characteristic information of the N3 files.

Wherein, the characteristic information of the file comprises: the method comprises the following steps of obtaining relevant information between two adjacent fragments in a file, feature information of each fragment in the file, and matching information between each fragment in the file and target information.

In this embodiment, a better answer needs to be found out of the N3 files for presentation to the user.

In the present embodiment, a method for determining a second search result is provided.

For example, formula four may be used:

$v = b_1 v_1 + b_2 v_2 + b_3 v_3 + b_4 v_4 + b_5 v_5 + b_6 v_6 + b_7 v_7$

Here, $v_1$ represents the area enclosed by the curve and the coordinate axis under the dashed box in FIG. 2. The curve under the dashed box corresponds to the segments remaining after clipping. The larger $v_1$ is, the longer the content of the file and the higher its fluency.

$v_2$ represents the difference between the maximum and minimum probabilities in the file. $v_2$ reflects the fluency of the whole file content: the smaller $v_2$ is, the higher the fluency.

$v_3$ represents the maximum probability in the file.

$v_4$ represents the minimum probability in the file. A $v_4$ that is too small indicates poor fluency.

$v_5$ represents the average semantic matching degree between each segment in the file and the target information.

$v_6$ represents the average of the first scores of the page addresses of the segments in the file.

$v_7$ represents the average of the second scores of the authors of the segments in the file.

$b_1$ to $b_7$ are coefficients, where $b_2$, $b_3$ and $b_4$ are negative numbers.

For the explanation of the probability in this embodiment, reference may be made to the foregoing embodiments. The first score of a page address may be based on the number of times the address is crawled, and the second score of an author may be based on the number of times the author is referenced.

Further, based on formula four, $v_1$ to $v_7$ of each file are weighted and fused to obtain a weighted value $v$ for that file.

In one scheme, the file corresponding to the largest weighted value $v$ may be determined as the second search result.

In another scheme, the files corresponding to the several largest weighted values $v$ may be determined as the second search results.
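The weighted fusion and the two selection schemes can be sketched as below. The sketch assumes that the seven features of each file have already been computed; the coefficient values are placeholders chosen only so that the second, third and fourth coefficients are negative, as stated in the text. The function names `fuse` and `select_files` are hypothetical.

```python
from typing import List, Sequence


def fuse(v: Sequence[float], b: Sequence[float]) -> float:
    """Weighted sum of the seven file features."""
    if len(v) != 7 or len(b) != 7:
        raise ValueError("expected seven features and seven coefficients")
    return sum(bi * vi for bi, vi in zip(b, v))


def select_files(features: List[Sequence[float]], b: Sequence[float],
                 top_k: int = 1) -> List[int]:
    """Indices of the top_k files, ordered by decreasing weighted value."""
    values = [fuse(v, b) for v in features]
    ranking = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return ranking[:top_k]


# Placeholder coefficients (assumption); b2, b3 and b4 are negative.
b = [1.0, -0.5, -0.1, -0.1, 1.0, 0.2, 0.2]

# Hypothetical features (v1 ... v7) for three candidate files.
features = [
    [2.0, 0.6, 0.9, 0.3, 0.8, 3.0, 1.0],
    [2.4, 0.2, 0.8, 0.6, 0.7, 2.0, 2.0],
    [1.0, 0.1, 0.5, 0.4, 0.9, 1.0, 1.0],
]
best_single = select_files(features, b, top_k=1)
best_two = select_files(features, b, top_k=2)
```

With `top_k=1` this corresponds to the first scheme (only the file with the largest weighted value), and with a larger `top_k` to the second scheme.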

In this embodiment, the related information between two adjacent segments in the file includes the probability existing between each two adjacent segments; the characteristic information of each segment in the file comprises a first score of a page address of each segment and a second score of an author of each segment; the matching information between each segment and the target information in the file comprises the semantic matching degree of each segment and the target information.

In this embodiment, the relevance between adjacent segments in the file, the feature information of each segment itself, and the matching information between each segment and the target information are integrated, and finally a more fluent file is selected and presented to the user as a search answer.

In the search method according to another embodiment of the present application, in a case that the second search result is output, the reference links corresponding to the segments in the second search result may also be displayed, so that the user can view the original text content.

Illustratively, referring to FIG. 3, the answer text, i.e., the second search result, is displayed. In addition, the conventional search results (i.e., the first search results), such as links, are also displayed.

The search question in fig. 3 includes target information, and the target information may be all contents of the search question or a keyword in the search question.

In summary, the present application aims to: a method for automatically generating search answers and quickly presenting the search answers to a user is provided. In the searching process, firstly, semantic retrieval and matching are carried out on search words of a user, fragment extraction is carried out on a webpage which is preliminarily searched so as to remove redundant results, then, the probability of the front and back sequences between every two fragments is calculated according to the strength and weakness relation between the fragments, and the contribution degree of the fragments to the results is combined to generate a candidate file. Furthermore, fluency judgment is carried out on the file content, abnormal points are found out and cut, and sequencing is carried out according to semantic matching degree, authority degree and fluency, so that better files are displayed for users, the users can browse the search results conveniently and quickly, the search efficiency and accuracy are guaranteed, and the user experience is improved.

The search method provided in the embodiments of the present application may be executed by a search apparatus. The search apparatus provided in the embodiments of the present application is described below, taking a search apparatus that executes the search method as an example.

Fig. 4 shows a block diagram of a search apparatus according to another embodiment of the present application, the apparatus including:

the obtaining module 10 is configured to obtain, according to the target information, N1 first search results that match the target information, where N1 is a positive integer;

a first determining module 20, configured to determine, according to content of N1 pieces of first search results, N2 pieces of segments corresponding to the target information, where N2 is a positive integer, and N1 ≧ N2;

the arrangement module 30 is configured to arrange and combine N2 segments according to N3 arrangement modes to obtain N3 corresponding files, where N3 is a positive integer;

and a second determining module 40, configured to determine at least one file as a second search result, and output the second search result.
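The four modules above can be read as a four-stage pipeline. The following Python sketch only illustrates that data flow and is not the implementation of the present application: the function name `run_search` and the four callables passed to it are hypothetical placeholders for the obtaining, first determining, arrangement, and second determining modules.

```python
from typing import Callable, List

def run_search(
    target: str,
    retrieve: Callable[[str], List[str]],
    to_segments: Callable[[str, List[str]], List[str]],
    arrange: Callable[[List[str]], List[List[str]]],
    choose: Callable[[List[List[str]]], List[List[str]]],
) -> List[str]:
    """Chain the four stages; every callable is a hypothetical stand-in."""
    first_results = retrieve(target)                # N1 first search results
    segments = to_segments(target, first_results)   # N2 segments
    # The description requires N1 >= N2.
    assert len(segments) <= len(first_results)
    files = arrange(segments)                       # N3 candidate files
    selected = choose(files)                        # at least one file
    # Each selected file is output as the concatenation of its segments.
    return [" ".join(f) for f in selected]
```

Keeping the stages as separate callables mirrors the module split in FIG. 4, so each stage can be replaced independently.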

Optionally, the first determining module 20 includes:

the extracting unit is used for extracting target content matched with the target information from content corresponding to one first search result, wherein the target content is used for generating one fragment to obtain N1 fragments;

the first retaining unit is used for retaining one of the at least two segments according to the feature information of the at least two segments to obtain N2 segments under the condition that the similarity between the at least two segments is greater than a first threshold;
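As an illustration of the retaining step, the sketch below removes near-duplicate segments. It rests on assumptions that are not in the original text: similarity is approximated by Jaccard overlap of lowercase word sets, and the feature information of a segment is collapsed into a single hypothetical `quality` number. The present application does not specify either choice.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    text: str
    quality: float  # hypothetical aggregate of the segment's feature information

def jaccard(a: str, b: str) -> float:
    """Word-set overlap, used here only as a stand-in similarity measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def deduplicate(segments: List[Segment], first_threshold: float) -> List[Segment]:
    """Keep one segment from each group whose similarity exceeds the threshold."""
    kept: List[Segment] = []
    # Visit higher-quality segments first so the retained one is the best of its group.
    for seg in sorted(segments, key=lambda s: s.quality, reverse=True):
        if all(jaccard(seg.text, k.text) <= first_threshold for k in kept):
            kept.append(seg)
    return kept
```

The greedy pass is one simple way to reduce N1 segments to N2 segments; other grouping strategies would fit the description equally well.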

Optionally, the arrangement module 30 comprises:

the arrangement unit is used for respectively arranging the N2 fragments at first positions to obtain N2 first positions;

a first obtaining unit, configured to obtain, based on the N2 first positions, the matching values of the segments of the N2 segments, other than the segment at the first position, arranged at the second position, to obtain N2 × (N2-1) matching values;

a second retaining unit, configured to retain the arrangement modes corresponding to the N3 largest matching values among the N2 × (N2-1) matching values;

a second obtaining unit, configured to obtain, in any one of the N3 arrangement modes, the matching values of the segments of the N2 segments, other than the segments at the first M positions, arranged at the (M+1)-th position, to obtain N3 × (N2-M) matching values;

a third retaining unit, configured to retain the arrangement modes corresponding to the N3 largest matching values among the N3 × (N2-M) matching values;
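The position-by-position retention performed by these units resembles a beam search with beam width N3. The following Python sketch shows that reading under stated assumptions: segments are identified by indices 0 to N2-1, `match(a, b)` is a hypothetical function returning the matching value of placing segment `b` directly after segment `a`, and partial arrangements are ranked by the running sum of their matching values. The original text only states that the N3 largest matching values are retained, so the cumulative ranking is an assumption.

```python
from typing import Callable, List, Tuple

def beam_arrange(
    n2: int,
    match: Callable[[int, int], float],
    n3: int,
) -> List[Tuple[int, ...]]:
    """Return up to n3 orderings of segment indices 0..n2-1."""
    # Every segment is tried at the first position: N2 starting arrangements.
    beams: List[Tuple[float, Tuple[int, ...]]] = [(0.0, (i,)) for i in range(n2)]
    # Fill positions 2 .. N2 one at a time.
    for _ in range(n2 - 1):
        candidates = []
        for score, order in beams:
            for j in range(n2):
                if j not in order:
                    # Matching value of placing j right after the last segment.
                    candidates.append((score + match(order[-1], j), order + (j,)))
        # First round: N2 * (N2-1) candidates; later rounds: N3 * (N2-M).
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:n3]
    return [order for _, order in beams]
```

Retaining only N3 arrangements per position keeps the work at roughly N3 × N2 evaluations per step instead of enumerating all N2! permutations.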

Optionally, the apparatus further comprises:

a deletion module, configured to, in any file, when a first segment is arranged at a first position, delete a second segment and the segments arranged after the second segment if the probability that the second segment is arranged at a second position satisfies a preset condition, wherein the second position is the position next to the first position, and the N2 segments comprise the first segment and the second segment;

Optionally, the second determining module 40 includes:

a determining unit configured to determine at least one file as a second search result based on the feature information of the N3 files;

The search apparatus in the embodiments of the present application may be an electronic device, or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. For example, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a Mobile Internet Device (MID), an Augmented Reality (AR)/Virtual Reality (VR) device, a robot, a wearable device, an Ultra-Mobile Personal Computer (UMPC), a netbook, or a Personal Digital Assistant (PDA); it may also be a server, a Network Attached Storage (NAS) device, a Personal Computer (PC), a television (TV), a teller machine, a self-service machine, or the like. The embodiments of the present application are not particularly limited in this respect.

The search apparatus in the embodiments of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, and the embodiments of the present application are not particularly limited in this respect.

The search apparatus provided in the embodiment of the present application can implement each process implemented by the foregoing method embodiment, and is not described here again to avoid repetition.

Optionally, as shown in fig. 5, an electronic device 100 is further provided in this embodiment of the present application, and includes a processor 101, a memory 102, and a program or an instruction stored in the memory 102 and executable on the processor 101, where the program or the instruction is executed by the processor 101 to implement each step of any one of the above embodiments of the search method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

It should be noted that the electronic device according to the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.

Fig. 6 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.

The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.

Those skilled in the art will appreciate that the electronic device 1000 may further comprise a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 1010 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The electronic device structure shown in FIG. 6 does not limit the electronic device, which may include more or fewer components than those shown, combine some components, or arrange the components differently; details are not described here.

The processor 1010 is configured to obtain, according to target information, N1 first search results that match the target information, where N1 is a positive integer; determining N2 fragments corresponding to the target information according to the content of the N1 first search results, wherein N2 is a positive integer, and N1 ≧ N2; arranging and combining the N2 fragments according to N3 arrangement modes to obtain N3 corresponding files, wherein N3 is a positive integer; and determining at least one file as a second search result, and outputting the second search result.

Optionally, the processor 1010 is further configured to extract target content matched with the target information from content corresponding to one piece of the first search result, where the target content is used to generate one segment, and obtain N1 segments; under the condition that the similarity between at least two segments is greater than a first threshold value, reserving one of the at least two segments according to the characteristic information of the at least two segments to obtain the N2 segments; wherein the feature information of the at least two segments comprises: semantic matching information, source page address information and author information.

Optionally, the processor 1010 is further configured to: arrange the N2 segments at the first position respectively, to obtain N2 first positions; based on the N2 first positions, respectively obtain the matching values of the segments of the N2 segments, other than the segment at the first position, arranged at the second position, to obtain N2 × (N2-1) matching values; retain the arrangement modes corresponding to the N3 largest matching values among the N2 × (N2-1) matching values; in any one of the N3 arrangement modes, obtain the matching values of the segments of the N2 segments, other than the segments at the first M positions, arranged at the (M+1)-th position, to obtain N3 × (N2-M) matching values; and retain the arrangement modes corresponding to the N3 largest matching values among the N3 × (N2-M) matching values; wherein M is a positive integer and, as these steps are repeated, M takes the values 2, 3, ..., (N2-1) in sequence.

Optionally, the processor 1010 is further configured to, in any file, in a case that a first segment is arranged at a first position, delete a second segment and segments arranged after the second segment if a probability that the second segment is arranged at a second position satisfies a preset condition, where the second position is a position next to the first position, and the N2 segments include the first segment and the second segment; wherein the preset condition comprises at least one of the following: the probability that the second segment is arranged at the second position is less than a second threshold with the first segment arranged at the first position; in the case where the first segment is arranged at the first position, the amplitude of fluctuation of the probability of the second segment being arranged at the second position is greater than a third threshold.
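The two preset conditions can be illustrated with a short Python sketch. It is written under assumptions: `prob(a, b)` is a hypothetical function giving the probability that segment `b` follows segment `a`, and the "amplitude of fluctuation" is interpreted as the absolute change from the probability of the previous adjacent pair. The original text does not define how the fluctuation is measured.

```python
from typing import Callable, List, Optional, Sequence

def truncate_file(
    order: Sequence[int],
    prob: Callable[[int, int], float],
    second_threshold: float,
    third_threshold: float,
) -> List[int]:
    """Cut a file at the first adjacent pair that meets a preset condition."""
    kept = list(order[:1])
    prev_p: Optional[float] = None
    for a, b in zip(order, order[1:]):
        p = prob(a, b)
        # Condition 1: the probability is below the second threshold.
        too_low = p < second_threshold
        # Condition 2 (assumed reading): the probability changes by more than
        # the third threshold relative to the previous adjacent pair.
        too_jumpy = prev_p is not None and abs(p - prev_p) > third_threshold
        if too_low or too_jumpy:
            break  # delete b and every segment arranged after it
        kept.append(b)
        prev_p = p
    return kept
```

Because everything after the abnormal point is dropped, the retained prefix contains only adjacent pairs that passed both checks.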

Optionally, the processor 1010 is further configured to determine at least one file as the second search result according to the feature information of the N3 files; wherein the feature information of a file comprises: correlation information between two adjacent segments in the file, feature information of each segment in the file, and matching information between each segment in the file and the target information.
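One possible way to combine the three kinds of feature information is a weighted sum of their averages, sketched below in Python. The class `FileFeatures`, the equal default weights, and the use of simple means are all hypothetical; the original text names the three inputs but does not say how they are combined.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class FileFeatures:
    adjacency: Sequence[float]        # correlation of each adjacent segment pair
    segment_quality: Sequence[float]  # feature score of each segment
    target_match: Sequence[float]     # match of each segment to the target information

def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0

def file_score(f: FileFeatures, w_adj: float = 1.0,
               w_quality: float = 1.0, w_match: float = 1.0) -> float:
    """Weighted sum of the three averaged features (weights are assumptions)."""
    return (w_adj * mean(f.adjacency)
            + w_quality * mean(f.segment_quality)
            + w_match * mean(f.target_match))

def select_files(files: Sequence[FileFeatures], k: int = 1) -> List[int]:
    """Return the indices of the k best-scoring files, best first."""
    ranked = sorted(range(len(files)), key=lambda i: file_score(files[i]), reverse=True)
    return ranked[:max(1, k)]
```

Averaging each feature keeps files of different lengths comparable, which matters when some files have been shortened by the deletion step.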

It should be understood that, in the embodiments of the present application, the input unit 1004 may include a Graphics Processing Unit (GPU) 10041 and a microphone 10042. The graphics processing unit 10041 processes image data of still pictures or video obtained by an image capturing device (such as a camera) in a video capture mode or an image capture mode. The display unit 1006 may include a display panel 10061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1007 includes at least one of a touch panel 10071 and other input devices 10072. The touch panel 10071, also referred to as a touch screen, may include two parts: a touch detection device and a touch controller. The other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 1009 may be used to store software programs and various data, including but not limited to applications and an operating system. The processor 1010 may integrate an application processor, which primarily handles the operating system, user interface, applications, and the like, and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1010.

The memory 1009 may be used to store software programs as well as various data. The memory 1009 may mainly include a first storage area storing programs or instructions and a second storage area storing data, wherein the first storage area may store an operating system and the application programs or instructions required for at least one function (such as a sound playing function or an image playing function). Further, the memory 1009 may include volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), a Static RAM (SRAM), a Dynamic RAM (DRAM), a Synchronous DRAM (SDRAM), a Double Data Rate SDRAM (DDR SDRAM), an Enhanced SDRAM (ESDRAM), a Synchronous Link DRAM (SLDRAM), or a Direct Rambus RAM (DRRAM). The memory 1009 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.

Processor 1010 may include one or more processing units; optionally, the processor 1010 integrates an application processor, which primarily handles operations related to the operating system, user interface, and applications, and a modem processor, which primarily handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into processor 1010.

The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the above search method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the above search method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here.

It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.

Embodiments of the present application provide a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the processes of the foregoing search method embodiments, and achieve the same technical effects, and in order to avoid repetition, details are not described here again.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order depending on the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.

While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method of searching, the method comprising:

according to target information, acquiring N1 first search results matched with the target information, wherein N1 is a positive integer;

determining N2 fragments corresponding to the target information according to the content of the N1 first search results, wherein N2 is a positive integer, and N1 ≧ N2;

arranging and combining the N2 fragments according to N3 arrangement modes to obtain N3 corresponding files, wherein N3 is a positive integer;

and determining at least one file as a second search result, and outputting the second search result.

2. The method of claim 1, wherein the determining N2 segments corresponding to the target information from the content of the N1 first search results comprises:

extracting target content matched with the target information from content corresponding to one first search result, wherein the target content is used for generating one segment, and N1 segments are obtained;

under the condition that the similarity between at least two segments is greater than a first threshold value, reserving one of the at least two segments according to the characteristic information of the at least two segments to obtain the N2 segments;

3. The method of claim 1, wherein said permuting and combining said N2 fragments in N3 permutations comprises:

arranging the N2 fragments at first positions respectively to obtain N2 first positions;

based on the N2 first positions, respectively acquiring matching values of the segments, other than the segment at the first position, of the N2 segments arranged at second positions, to obtain N2 × (N2-1) matching values;

retaining the arrangement modes corresponding to the N3 largest matching values among the N2 × (N2-1) matching values;

in any one of the N3 arrangement modes, obtaining matching values of the segments, other than the segments at the first M positions, of the N2 segments arranged at the (M+1)-th position, to obtain N3 × (N2-M) matching values;

retaining the arrangement modes corresponding to the N3 largest matching values among the N3 × (N2-M) matching values;

4. The method of claim 1, wherein prior to determining at least one file as a second search result, the method further comprises:

in any file, in a case where a first segment is arranged at a first position, if a probability that a second segment is arranged at a second position satisfies a preset condition, deleting the second segment and the segments arranged thereafter, where the second position is a position next to the first position, and the N2 segments include the first segment and the second segment;

wherein the preset condition comprises at least one of the following:

the probability that the second segment is arranged at the second position is less than a second threshold with the first segment arranged at the first position;

in the case where the first segment is arranged at the first position, the amplitude of fluctuation of the probability of the second segment being arranged at the second position is greater than a third threshold.

5. The method of claim 1, wherein determining at least one file as a second search result comprises:

determining at least one file as a second search result according to the characteristic information of the N3 files;

wherein the characteristic information of a file comprises: correlation information between two adjacent segments in the file, feature information of each segment in the file, and matching information between each segment in the file and the target information.

6. A search apparatus, characterized in that the apparatus comprises:

the acquisition module is used for acquiring N1 first search results matched with target information according to the target information, wherein N1 is a positive integer;

a first determining module, configured to determine, according to the content of the N1 pieces of first search results, N2 segments corresponding to the target information, where N2 is a positive integer, and N1 ≧ N2;

the arrangement module is used for arranging and combining the N2 fragments according to N3 arrangement modes to obtain N3 corresponding files, wherein N3 is a positive integer;

and the second determining module is used for determining at least one file as a second search result and outputting the second search result.

7. The apparatus of claim 6, wherein the first determining module comprises:

an extracting unit, configured to extract, from content corresponding to one of the first search results, target content that matches the target information, where the target content is used to generate one segment, and N1 segments are obtained;

a first retaining unit, configured to, when a similarity between at least two segments is greater than a first threshold, retain one of the at least two segments according to feature information of the at least two segments, to obtain the N2 segments;

8. The apparatus of claim 6, wherein the arrangement module comprises:

a permutation unit, configured to permute the N2 fragments at first positions, respectively, to obtain N2 first positions;

a first obtaining unit, configured to obtain, based on the N2 first positions, matching values of the segments, other than the segment at the first position, of the N2 segments arranged at a second position, respectively, so as to obtain N2 × (N2-1) matching values;

a second obtaining unit, configured to obtain, in any one of the N3 arrangement modes, matching values of the segments, other than the segments at the first M positions, of the N2 segments arranged at the (M+1)-th position, to obtain N3 × (N2-M) matching values;

9. The apparatus of claim 6, further comprising:

a deleting module, configured to delete, in any file, a second segment and a subsequent segment if a probability that the second segment is arranged at a second position satisfies a preset condition when a first segment is arranged at the first position, where the second position is a position next to the first position, and the N2 segments include the first segment and the second segment;

wherein the preset condition comprises at least one of the following:

10. The apparatus of claim 6, wherein the second determining module comprises:

11. An electronic device comprising a processor and a memory, the memory storing a program or instructions executable on the processor, the program or instructions when executed by the processor implementing the steps of the search method of any one of claims 1 to 5.

12. A readable storage medium, on which a program or instructions are stored, which when executed by a processor, carry out the steps of the search method according to any one of claims 1 to 5.