CN116089599B - Information query method, system and storage medium

Info

Publication number: CN116089599B
Application number: CN202310367208.6A
Authority: CN
Inventors: 郝亮; 马永亮; 周明
Original assignee: Beijing Lanzhou Technology Co ltd
Current assignee: Beijing Lanzhou Technology Co ltd
Priority date: 2023-04-07
Filing date: 2023-04-07
Publication date: 2023-07-25
Anticipated expiration: 2043-04-07
Also published as: CN116089599A

Abstract

The invention relates to the technical field of information retrieval, and in particular to an information query method, an information query system and a storage medium. The information query method comprises the following steps: determining a plurality of corresponding first target documents according to an acquired search term; extracting, from each first target document, a plurality of key points related to the search term and a plurality of corresponding paragraphs; performing key point integration to obtain a plurality of first key points and a plurality of corresponding first paragraphs; judging whether at least one relevant feature value related to the first target documents or the first key points is smaller than a corresponding preset feature threshold, and/or whether the search round is larger than a preset threshold; and if yes, globally integrating the first key points and the first paragraphs to form a hierarchical summary for display. The user only needs to initiate one search and can then directly read the returned text, without manually clicking links, extracting information, integrating it and searching again, which greatly reduces the interaction cost of querying information.

Description

Information query method, system and storage medium

Technical Field

The present invention relates to the field of information retrieval technologies, and in particular, to an information query method, an information query system, and a storage medium.

Background

The existing information query mode is mainly the search engine, whose interaction mode is as follows: the user inputs search terms to express a search intention, and the search engine returns the results as a list of website links. The user browses the results, extracts the information related to the search intention, and integrates it. To obtain more related information or to go deeper, the user must input new search terms to initiate another search, and must then integrate the information obtained from the several searches.

However, this interaction mode requires the user to take part manually at several points: extracting the required information from the search results, integrating the information, and changing the search terms to search again. This increases the cost to the user of obtaining information with a search engine.

Disclosure of Invention

In order to solve the problem that searching by using an existing search engine increases the cost of acquiring information, the invention provides an information query method, an information query system and a storage medium.

The invention provides an information query method, which comprises the following steps:

acquiring a search term, and determining a plurality of corresponding first target documents according to the search term;

extracting, from each first target document, a plurality of key points related to the search term and a plurality of corresponding paragraphs;

integrating the key points and corresponding paragraphs extracted from the plurality of first target documents to obtain a plurality of first key points and a plurality of corresponding first paragraphs;

judging whether at least one relevant feature value related to the first target documents or the first key points is smaller than a corresponding preset feature threshold, and/or whether the search round is larger than a preset threshold;

if yes, globally integrating the first key points and the first paragraphs to form a hierarchical summary for display.

Preferably, acquiring the search term and determining the plurality of corresponding first target documents according to the search term specifically comprises the following steps:

acquiring a search term input by a user;

searching for the search term through a search engine to return a corresponding document list;

and determining the plurality of first target documents corresponding to the search term according to the ordering of the document list.

Preferably, extracting the plurality of key points related to the search term and the plurality of corresponding paragraphs from each first target document specifically comprises the following steps:

extracting a plurality of key points related to the search term from each first target document through a summarization model;

and extracting a plurality of paragraphs corresponding to the plurality of key points from each first target document through a reading comprehension model.

Preferably, integrating the key points and corresponding paragraphs extracted from the plurality of first target documents to obtain the plurality of first key points and the plurality of corresponding first paragraphs specifically comprises the following steps:

matching the extracted key points one by one through a matching model to obtain the corresponding similarities between the key points;

judging whether at least one similarity is larger than a preset first threshold;

if yes, de-duplicating the key points whose similarity is larger than the preset first threshold and merging their corresponding paragraphs to obtain a plurality of first paragraphs;

judging whether at least one similarity between the key points remaining after de-duplication is larger than a preset second threshold;

if yes, aggregating the key points whose similarity is larger than the preset second threshold to obtain a plurality of first key points.

Preferably, the relevant feature values include a first target document relevance, a first target document length, a number of points, a proportion of points, and a relevance of points to the search term.

Preferably, after determining whether at least one relevant feature value related to the first target document or the first gist is smaller than a corresponding preset feature threshold, and/or whether the search round is larger than a preset threshold, the method further comprises:

and if the related characteristic values related to the first target document and the first gist are both larger than the corresponding preset characteristic threshold value and the search round is smaller than the corresponding preset threshold value, rewriting the first gist into a new search word through a rewriting model to perform the next round of search, and obtaining a plurality of second target documents, a plurality of second gist and a plurality of second paragraphs corresponding to the new search word.

Preferably, after the first gist is rewritten into the new search word by the rewrite model to perform the next round of search to obtain a plurality of second target documents, a plurality of second gist, and a plurality of second paragraphs corresponding to the new search word, the method further includes:

judging whether at least one relevant feature value related to the second target documents or the second key points is smaller than a corresponding preset feature threshold, and/or whether the search round is larger than a preset threshold;

if yes, the first points, the second points, the first paragraphs and the second paragraphs are integrated globally to form a hierarchical abstract for display.

Preferably, globally integrating the first key points, the second key points, the first paragraphs and the second paragraphs to form a hierarchical summary for display specifically comprises the following steps:

integrating the first key points, the second key points, the first paragraphs and the second paragraphs through the matching model to form a plurality of third key points and a plurality of third paragraphs;

globally integrating the third key points and the third paragraphs to organize them into a hierarchical structure;

and generating, through the summarization model, a hierarchical summary according to the hierarchical structure for display.
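The multi-round flow described above (search, extract, integrate, judge, then either rewrite and search again or integrate globally) can be sketched as a simple loop. This is only an illustrative sketch: the function `run_query`, its parameter names, and the representation of a key point as a `(key point, paragraph)` tuple are assumptions, and the search engine and the summarization, reading comprehension, matching and rewriting models are passed in as plain callables rather than implemented.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (key point, corresponding paragraph).
KeyPoint = Tuple[str, str]


def run_query(
    query: str,
    search: Callable[[str], Sequence[str]],
    extract: Callable[[Sequence[str], str], List[KeyPoint]],
    integrate: Callable[[List[KeyPoint]], List[KeyPoint]],
    features_too_low: Callable[[Sequence[str], List[KeyPoint]], bool],
    rewrite: Callable[[List[KeyPoint]], str],
    summarize: Callable[[List[KeyPoint]], str],
    max_rounds: int = 3,
) -> str:
    """Run search rounds until a stop condition holds, then summarize."""
    collected: List[KeyPoint] = []
    current_query = query
    search_round = 1
    while True:
        documents = search(current_query)                   # determine target documents
        key_points = integrate(extract(documents, current_query))
        collected.extend(key_points)
        # Stop when a feature value is below its threshold or the round
        # count exceeds the preset threshold.
        if features_too_low(documents, key_points) or search_round > max_rounds:
            break
        # Otherwise rewrite the integrated key points into a new search term.
        current_query = rewrite(key_points)
        search_round += 1
    # Global integration over every round, then the hierarchical summary.
    return summarize(integrate(collected))
```

Keeping the round limit as an explicit argument guarantees that the loop terminates even if the feature test never signals a stop.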

The invention also provides an information query system for solving the above technical problem. The system is used for implementing the information query method according to any one of the above, and comprises the following modules:

the searching module is used for acquiring search words and determining a plurality of corresponding first target documents according to the search words;

the extraction module is used for extracting a plurality of key points and a plurality of corresponding paragraphs, which are related to the search word, in each first target document;

the key point integration module is used for integrating a plurality of key points and a plurality of corresponding paragraphs extracted from the plurality of first target documents to obtain a plurality of first key points and a plurality of corresponding first paragraphs;

the processing module is used for judging whether at least one relevant feature value related to the first target documents or the first key points is smaller than a corresponding preset feature threshold, and/or whether the search round is larger than a preset threshold; and, when the judgment is true, globally integrating the first key points and the first paragraphs to form a hierarchical summary for display.

The present invention also provides a computer readable storage medium storing a computer program, which when executed implements the information query method according to any one of the above.

Compared with the prior art, the information query method, the information query system and the storage medium provided by the invention have the following advantages:

1. the information query method of the invention comprises the following steps: acquiring search words, and determining a plurality of corresponding first target documents according to the search words; extracting a plurality of key points and a plurality of corresponding paragraphs which are related to the search word in each first target document; the method comprises the steps of integrating a plurality of points and corresponding paragraphs extracted from a plurality of first target documents to obtain a plurality of first points and corresponding first paragraphs; judging whether at least one relevant characteristic value related to the first target document or the first gist is smaller than a corresponding preset characteristic threshold value or not, and/or judging whether the search round is larger than the preset threshold value or not; if yes, carrying out global integration on the first points and the first paragraphs to form a hierarchical abstract for display. Determining a plurality of corresponding first target documents through the search terms to automatically expand the search intention; and when at least one relevant characteristic value related to the first target document or the first gist is smaller than a corresponding preset characteristic threshold value or the search round is larger than a preset threshold value, a plurality of first gist and a plurality of corresponding first paragraphs are integrated globally and displayed to a user in the form of a layering abstract. Therefore, in the interaction process, the user only needs to initiate one search and then directly read the returned text information, and the user does not need to manually click links, extract information, integrate and re-search, so that the interaction cost of inquiring information by the user is greatly saved.

2. The method for acquiring the search word returns a plurality of corresponding first target documents according to the search word, and specifically comprises the following steps: acquiring search words input by a user; searching the search word through a search engine to return a plurality of corresponding document lists; and determining a plurality of first target documents corresponding to the search terms according to the ordering of the document list. Searching the search words through the search engine, and determining a plurality of corresponding first target documents according to the ordering of the document list, so that a plurality of first target documents with the maximum relevance to the search words input by the user can be returned. And further, the correlation between the finally formed layering abstract and the user search word is improved.

3. According to the invention, the extracted multiple key points are matched one by one through the matching model, so that the corresponding similarity among the multiple key points is obtained; judging whether at least one similarity is larger than a preset first threshold value or not; if yes, performing de-duplication and merging paragraphs corresponding to points with similarity greater than a preset first threshold value to obtain a plurality of first paragraphs, so as to avoid duplication and redundancy of paragraph information, improve accuracy and reliability of paragraph information, and reduce redundancy and invalidity of paragraph information. Judging whether at least one similarity exists between the residual key points after the duplicate removal or not, wherein the similarity is larger than a preset second threshold value; if so, the points with similarity greater than the preset second threshold are aggregated to form a first point, i.e. the similar points are integrated together to form a more comprehensive and accurate overview.

4. The relevant feature values, namely the relevance of the first target documents, the length of the first target documents, the number of key points, the proportion of key points, and the relevance of the key points to the search term, reflect how relevant and how rich the extracted key points and the first target documents are with respect to the search term. Whether to perform the next round of search can therefore be decided from any one of these feature values or from the search round.

5. After judging whether at least one relevant feature value related to the first target documents or the first key points is smaller than a corresponding preset feature threshold, and/or whether the search round is larger than the preset threshold, the method further comprises: if the relevant feature values related to the first target documents and the first key points are all larger than the corresponding preset feature thresholds and the search round is smaller than the corresponding preset threshold, rewriting the first key points into a new search term through a rewriting model to perform the next round of search, and obtaining a plurality of second key points and a plurality of second paragraphs corresponding to the new search term. The first key points obtained from the previous integration are thus rewritten into a new search term, and the next round of search with that term obtains more comprehensive and deeper information related to the search term input by the user.

6. Globally integrating the first key points, the second key points, the first paragraphs and the second paragraphs to form a hierarchical summary for display specifically comprises the following steps: integrating the first key points, the second key points, the first paragraphs and the second paragraphs through the matching model to form a plurality of third key points and a plurality of third paragraphs; globally integrating the third key points and the corresponding third paragraphs to organize them into a hierarchical structure; and generating, through the summarization model, a hierarchical summary according to the hierarchical structure for display. The information from multiple rounds of search is thereby integrated into deeper and more comprehensive information about the search term input by the user and displayed as a hierarchical summary, so that the user can clearly see the key content corresponding to the search term.

7. The invention also provides an information query system, which is used for realizing the information query method according to any one of the above, and has the same beneficial effects as the information query method, and the detailed description is omitted herein.

8. The invention also provides a computer readable storage medium, which has the same beneficial effects as the information inquiry method, and is not described herein.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart illustrating steps of an information query method according to a first embodiment of the present invention.

Fig. 2 is a diagram showing an example of a hierarchical summary of an information query method according to a first embodiment of the present invention.

Fig. 3 is a flowchart showing steps of step S3 of the information query method according to the first embodiment of the present invention.

Fig. 4 is a block diagram of an information query system according to a second embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples of implementation in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The terms "vertical," "horizontal," "left," "right," "upper," "lower," "upper left," "upper right," "lower left," "lower right," and the like are used herein for illustrative purposes only.

Referring to fig. 1 and 2, a first embodiment of the present invention provides an information query method, which includes the following steps:

S1: acquiring a search term, and determining a plurality of corresponding first target documents according to the search term;

S2: extracting, from each first target document, a plurality of key points related to the search term and a plurality of corresponding paragraphs;

S3: integrating the key points and corresponding paragraphs extracted from the first target documents to obtain first key points and corresponding first paragraphs;

S4: judging whether at least one relevant feature value related to the first target documents or the first key points is smaller than a corresponding preset feature threshold, and/or whether the search round is larger than a preset threshold;

S5: if yes, globally integrating the first key points and the first paragraphs to form a hierarchical summary for display.

It will be appreciated that a search term is a keyword or phrase that a user enters as text in a search engine or other online search tool in order to find related information or content. The search term is typically the user's description or expression of the desired information. In this embodiment, in step S1, a search engine is used to obtain the search term input by the user. After obtaining the search term, the search engine matches related content in its index and returns a plurality of first target documents related to the search term. In step S2, after the search term and the plurality of first target documents returned by the search engine are obtained, a plurality of key points related to the search term and their corresponding paragraphs are extracted from each first target document, so as to automatically expand the search intention.

It will be appreciated that key point integration means integrating the extracted key points and paragraphs that are repeated or highly similar: repeated portions are removed and the remaining portions are combined to form more comprehensive and more specific key points and corresponding paragraphs. Global integration means integrating all the key points and corresponding paragraphs obtained from key point integration into a whole. In step S3, the repeated or similar key points and their paragraphs are integrated in this way to form first key points and corresponding first paragraphs, while the key points with low similarity and their corresponding paragraphs are also taken as first key points and corresponding first paragraphs. A plurality of first key points and corresponding first paragraphs are thereby obtained, which avoids a confused structure and repeated content in the finally formed hierarchical summary.

It will be appreciated that a relevant feature value is a feature value that is closely related to the search term and reflects the relevance and richness of the extracted first key points or first target documents. There are a plurality of relevant feature values for the first key points or first target documents, and each has its own preset feature threshold. Therefore, in step S4, it is determined whether at least one relevant feature value related to the first target documents or the first key points is smaller than the corresponding preset feature threshold, and/or whether the search round is larger than the preset threshold. That is, each relevant feature value is compared with its corresponding preset feature threshold, and the search round of the current search is compared with its corresponding preset threshold, so as to determine whether the first key points and first paragraphs integrated in this search can reflect the relevance and richness of the search term. In step S5, if at least one relevant feature value is smaller than its corresponding preset feature threshold and/or the search round is larger than its corresponding preset threshold, the first key points and corresponding first paragraphs obtained from the key point integration of this search are globally integrated into a whole, that is, a hierarchical summary is generated and displayed to the user. Therefore, in the interaction process, the user only needs to initiate one search and then directly read the returned text; the user does not need to manually click links, extract information, integrate it and search again, which greatly reduces the interaction cost of querying information.

Further, step S1 specifically includes the following steps:

S11: acquiring a search term input by a user;

S12: searching for the search term through a search engine to return a corresponding document list;

S13: determining a plurality of first target documents corresponding to the search term according to the ordering of the document list.

It will be appreciated that the user inputs a search term into a search engine, and the search engine obtains the search term and searches on it to determine a plurality of documents related to the search term. The documents are ranked from high to low by their relevance to the search term to form a document list, and the top-ranked documents in the list are taken as the plurality of first target documents related to the search term. In this embodiment, preferably, the first three documents in the document list, which have the highest relevance to the search term, are taken as the three first target documents. The more relevant a document is to the search term, the more closely it is connected with the search term and the more content related to the search term it can present, which improves the relevance between the finally formed hierarchical summary and the user's search term. For example, the original search term input by the user is "biopharmaceutical development direction"; the search engine is queried with this term, and the first 3 of the documents found are taken and denoted D1, D2 and D3.
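As a minimal sketch of steps S12 and S13, assuming the search engine returns `(document, relevance)` pairs, the first target documents are simply the top entries of the ranked list. The function name `select_target_documents` and the pair format are illustrative assumptions; the default of three documents follows the preference stated in this embodiment.

```python
from typing import List, Sequence, Tuple


def select_target_documents(
    results: Sequence[Tuple[str, float]], top_k: int = 3
) -> List[str]:
    """Rank results by relevance (high to low) and keep the first top_k."""
    ranked = sorted(results, key=lambda item: item[1], reverse=True)
    return [document for document, _ in ranked[:top_k]]
```

If the search engine already returns its list in ranked order, the sort is redundant and slicing the first `top_k` entries is enough.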

Further, step S2 specifically includes the following steps:

S21: extracting a plurality of key points related to the search term from each first target document through a summarization model;

S22: extracting a plurality of paragraphs corresponding to the plurality of key points from each first target document through a reading comprehension model.

It is understood that in step S21 the key points related to the search term are extracted from each first target document by the summarization model, and in step S22 the paragraph corresponding to each key point is extracted from the first target document by the reading comprehension model. For example, key points related to the search term "biopharmaceutical development direction" are extracted from the three first target documents D1, D2 and D3. Assume that the following key points are extracted: a history of biopharmaceuticals; biopharmaceutical history; market demand for biopharmaceuticals; technical difficulties in biopharmaceuticals; the strong support of national industrial policy will accelerate development investment in, and the industrialization of, the biopharmaceutical industry. Then, for these 5 key points, the corresponding paragraphs are extracted from the corresponding first target documents and denoted P1, P2, P3, P4 and P5.
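The pairing of key points with paragraphs in steps S21 and S22 could be organized as below. This is a sketch under assumptions: `Extracted`, `extract_key_points`, `summarize` and `locate` are hypothetical names, and the two callables merely stand in for the summarization model and the reading comprehension model, whose interfaces the description does not specify.

```python
from typing import Callable, Dict, List, NamedTuple


class Extracted(NamedTuple):
    doc_id: str      # e.g. "D1"
    key_point: str   # key point related to the search term
    paragraph: str   # paragraph of the document that supports the key point


def extract_key_points(
    documents: Dict[str, str],
    query: str,
    summarize: Callable[[str, str], List[str]],
    locate: Callable[[str, str], str],
) -> List[Extracted]:
    """Apply S21 (key points) and S22 (paragraphs) to every target document."""
    extracted: List[Extracted] = []
    for doc_id, text in documents.items():
        for key_point in summarize(text, query):       # S21: stand-in model call
            paragraph = locate(text, key_point)        # S22: stand-in model call
            extracted.append(Extracted(doc_id, key_point, paragraph))
    return extracted
```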

Referring to fig. 3, further, step S3 specifically includes the following steps:

S31: matching the extracted key points one by one through a matching model to obtain the corresponding similarities between the key points;

S32: judging whether at least one similarity is larger than a preset first threshold;

S33: if yes, de-duplicating the key points whose similarity is larger than the preset first threshold and merging their corresponding paragraphs to obtain a plurality of first paragraphs;

S34: judging whether at least one similarity between the key points remaining after de-duplication is larger than a preset second threshold;

S35: if yes, aggregating the key points whose similarity is larger than the preset second threshold to obtain a plurality of first key points.

It will be appreciated that in this embodiment, merging means combining a plurality of elements into a single whole, whereas aggregation means placing a plurality of elements at the same level of a hierarchy without merging them into a whole. The remaining key points are the key points whose similarity is smaller than the preset first threshold.

It will be appreciated that in step S31 the extracted key points are matched one by one by the matching model to obtain the similarities between the key points. In step S32, these similarities are compared with the preset first threshold to determine whether at least one of them is larger than the preset first threshold. In step S33, if at least one similarity is larger than the preset first threshold, the at least two key points corresponding to that similarity are de-duplicated, that is, only one of them is retained and the others are removed, and the paragraphs corresponding to those key points are merged into one paragraph, which is a first paragraph. The paragraphs corresponding to the remaining key points are also taken as first paragraphs, so that a plurality of first paragraphs are obtained. This avoids repeated and redundant paragraph information and improves its accuracy and reliability. In this embodiment, preferably, the preset first threshold may be 0.9. If a similarity is larger than 0.9, the at least two key points corresponding to it are considered to be repeated: only one of them is retained, and their corresponding paragraphs are merged into one paragraph that corresponds to the retained key point.

It can be understood that if all the similarities are smaller than the preset first threshold, step S34 is performed directly.

It will be appreciated that in step S34 it is determined whether at least one similarity between the remaining key points is larger than the preset second threshold. In step S35, if so, the key points corresponding to each such similarity are aggregated, that is, they are placed at the same level of the hierarchy and each is taken as a first key point; the remaining key points whose similarity is smaller than the preset second threshold are also taken as first key points. A plurality of first key points are thereby obtained, forming a more comprehensive and accurate overview. In this embodiment, preferably, the preset second threshold may be 0.75: if at least one similarity between the remaining key points is larger than 0.75, the at least two key points corresponding to that similarity are semantically aggregated.

It will be appreciated that after step S32, the method further comprises:

S36: if all the similarities are smaller than the preset second threshold, directly taking the extracted key points and paragraphs as the first key points and corresponding first paragraphs; no de-duplication or aggregation is needed before proceeding to step S4.
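Steps S31 to S36 can be illustrated with the two-threshold sketch below, using the example values 0.9 and 0.75 from this embodiment. It is not the patented matching model: `integrate_key_points` is a hypothetical name, the greedy grouping order is an assumption, and the default `char_similarity` (a character-level ratio from Python's `difflib`) is only a stand-in for a semantic matching model, which should be supplied through the `similarity` argument.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, Tuple

KeyPoint = Tuple[str, str]  # (key point, corresponding paragraph)


def char_similarity(a: str, b: str) -> float:
    """Stand-in similarity; a real system would use a semantic matching model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def integrate_key_points(
    pairs: Sequence[KeyPoint],
    similarity: Callable[[str, str], float] = char_similarity,
    first_threshold: float = 0.9,
    second_threshold: float = 0.75,
) -> List[List[KeyPoint]]:
    """Return groups of first key points with their first paragraphs."""
    # S31-S33: de-duplicate. A key point that is near-identical to a kept one
    # is dropped, and its paragraph is merged into the kept key point's text.
    kept: List[Tuple[str, List[str]]] = []
    for key_point, paragraph in pairs:
        for kept_point, paragraphs in kept:
            if similarity(key_point, kept_point) > first_threshold:
                if paragraph not in paragraphs:
                    paragraphs.append(paragraph)
                break
        else:
            kept.append((key_point, [paragraph]))

    # S34-S35: aggregate. Similar (but not duplicate) key points are placed in
    # the same group without being merged; dissimilar ones form new groups.
    groups: List[List[KeyPoint]] = []
    for key_point, paragraphs in kept:
        item = (key_point, "\n".join(paragraphs))
        for group in groups:
            if any(similarity(key_point, other) > second_threshold for other, _ in group):
                group.append(item)
                break
        else:
            groups.append([item])
    return groups
```

When every similarity is below the second threshold, each key point ends up alone in its own group with its paragraph unchanged, which corresponds to the pass-through case of step S36.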

Further, the relevant feature values include a first target document relevance, a first target document length, a number of points, a proportion of points, and a relevance of the points to the search term.

It is understood that the relevant feature values used to determine whether to perform the next round of search include the first target document relevance, the first target document length, the number of points, the point proportion, and the relevance of the points to the search term, each of which has a corresponding preset feature threshold. In this embodiment, preferably, the preset feature threshold for the first target document relevance is a strong-relevance threshold, for example 0.7. The preset feature threshold for the first target document length is a long-document threshold, for example 5000 words. The preset feature threshold for the number of points is a number of newly added points, i.e., the number of points newly added after the next round of search, for example 3. The preset feature threshold for the point proportion is a proportion of newly added points, i.e., the proportion of points newly added after the next round of search, for example 20%. The preset feature threshold for the relevance of the points to the search term is a weak-relevance threshold between the points newly added after the next round of search and the original search term, for example 0.3.

It should be noted that, the condition for determining whether to perform the next round of searching further includes a search round, where the determination of the search round is different from the determination of the relevant feature value, and the determination of the search round is to determine whether the search round is greater than a corresponding preset threshold, so as to determine whether to perform the next round of searching. In this embodiment, preferably, the preset threshold corresponding to the search round is 3 times.

Specifically, the judgment conditions in step S4 are: the first target document relevance is smaller than the strong-relevance threshold of 0.7; the first target document length is smaller than the long-document threshold of 5000 words; the number of points is smaller than 3; the point proportion is smaller than 20%; the relevance of the points to the search term is smaller than the weak-relevance threshold of 0.3; the search round is greater than 3. If any one of these conditions is met, further searching stops and step S5 is performed to form the hierarchical abstract; if none of them is met, the next round of search is required.
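The judgment of step S4 can be sketched as a single predicate. The sketch assumes that each feature has already been reduced to one value per search round; how that value is computed across several documents or points is not specified here. The names `Thresholds`, `RoundStats` and `should_stop` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    doc_relevance: float = 0.7     # strong-relevance threshold
    doc_length: int = 5000         # long-document threshold, in words
    new_points: int = 3            # number of newly added points
    new_point_ratio: float = 0.20  # proportion of newly added points
    point_relevance: float = 0.3   # weak-relevance threshold
    max_rounds: int = 3            # search-round threshold


@dataclass(frozen=True)
class RoundStats:
    doc_relevance: float
    doc_length: int
    new_points: int
    new_point_ratio: float
    point_relevance: float
    search_round: int


def should_stop(s: RoundStats, t: Thresholds = Thresholds()) -> bool:
    """Return True when any one of the stop conditions of step S4 holds."""
    return (
        s.doc_relevance < t.doc_relevance
        or s.doc_length < t.doc_length
        or s.new_points < t.new_points
        or s.new_point_ratio < t.new_point_ratio
        or s.point_relevance < t.point_relevance
        or s.search_round > t.max_rounds
    )
```

Keeping the thresholds in one immutable object makes the preferred values of this embodiment easy to replace without touching the predicate.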

It should be noted that the first target document relevance, the first target document length, the number of points, the point proportion, the relevance of the points to the search term, and the search round together indicate whether the current search already covers the content corresponding to the search term, i.e., whether this round of search has obtained sufficiently comprehensive and in-depth information. Once it has, further searches would fail to obtain new content related to the search term and would easily waste resources through too many search rounds.

Further, after step S4, the method further includes:

S6: if the relevant feature values related to the first target document and the first gist are all greater than the corresponding preset feature thresholds and the search round is smaller than the corresponding preset threshold, the first gist is rewritten into a new search word through a rewrite model to perform the next round of search, and a plurality of second target documents, a plurality of second gist and a plurality of second paragraphs corresponding to the new search word are obtained.

It will be appreciated that if the relevant feature values related to the first target document and the first gist are all greater than the corresponding preset feature thresholds, and the search round is smaller than the corresponding preset threshold, the current search has not yet obtained sufficiently deep and comprehensive information on the search term input by the user, and a next round of search is therefore required to obtain deeper and more comprehensive information.

Specifically, if the next round of search used the search word input in the previous round, the information obtained would be the same as that returned by the previous round. Therefore, the key points determined by the previous round of search can be rewritten through the rewrite model, i.e., those key points that need to enter the next round of search are rewritten into new search words suitable for processing by a search engine. Each new search word is close in meaning to its gist, and performing the next round of search with the new search word yields a plurality of second target documents, a plurality of second gist and a plurality of second paragraphs corresponding to the new search word. For example, the gist "national industrial policy strongly supports, and accelerates, the development investment and industrialization progress of the biopharmaceutical industry" is rewritten into the search word "biopharmaceutical policy support".

Further, after step S6, the method further includes:

S7: judging whether at least one relevant feature value related to the second target document or the second gist is smaller than the corresponding preset feature threshold, and/or judging whether the search round is greater than the preset threshold;

S8: if yes, the first points, the second points, the first paragraphs and the second paragraphs are globally integrated to form a hierarchical abstract for display.

It will be appreciated that the relevant characteristic value associated with the second target document or the second gist is the same as the relevant characteristic value associated with the first target document or the first gist. The difference is only that one is for a first target document and a first gist and the other is for a second target document and a second gist.

Specifically, after the next round of search is performed with the new search term, it is again necessary to determine whether at least one relevant feature value related to the second target document or the second gist is smaller than the corresponding preset feature threshold, and/or whether the search round is greater than the preset threshold, so as to determine whether a further round of search is still required, that is, whether the current round has already obtained sufficiently comprehensive and in-depth information on the search term input by the user. If the condition is met, the search stops, and the first points, second points, first paragraphs and second paragraphs determined by the current and previous rounds of search are globally integrated to form a hierarchical abstract for display.
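The round-by-round control flow described above can be sketched as a small loop. All names here (`run_query`, `search_round`, `should_stop`, `rewrite`) are hypothetical, and the three callables stand in for the search and extraction steps, the stop judgment, and the rewrite model respectively. The sketch also assumes that every point of a round is rewritten into a new search term, whereas the description only says that the points needing a further round are rewritten.

```python
from typing import Callable, List, Tuple

Point = str
RoundResult = Tuple[List[Point], List[str]]  # (points, paragraphs) of one round


def run_query(term: str,
              search_round: Callable[[str], RoundResult],
              should_stop: Callable[[RoundResult, int], bool],
              rewrite: Callable[[Point], str]) -> List[RoundResult]:
    """Search round by round until the stop check fires.

    Returns the results of every round, ready for global integration.
    """
    history: List[RoundResult] = []
    pending = [term]  # search terms for the current round
    round_no = 1
    while pending:
        points: List[Point] = []
        paragraphs: List[str] = []
        for q in pending:
            pts, paras = search_round(q)
            points.extend(pts)
            paragraphs.extend(paras)
        result = (points, paragraphs)
        history.append(result)
        if should_stop(result, round_no):
            break
        # Otherwise rewrite the points of this round into new search terms.
        pending = [rewrite(p) for p in points]
        round_no += 1
    return history
```

The returned history keeps each round separate, which preserves the information needed later to arrange points by search round.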

Further, step S8 specifically includes the following steps:

S81: integrating the first points, the second points, the first paragraphs and the second paragraphs through a matching model to form a plurality of third points and a plurality of third paragraphs;

S82: globally integrating the third points and the third paragraphs to organize a hierarchical structure;

S83: generating a hierarchical abstract according to the hierarchical structure through the abstract model for display.

It will be appreciated that step S81 is performed after multiple rounds of search, once the current round determines that no further round is to be performed. In step S81, the first points, the second points, the first paragraphs and the second paragraphs may be matched by the matching model and integrated to form a plurality of third points and third paragraphs. De-duplication and aggregation are performed in sequence, using the same preset first threshold and preset second threshold as in step S3. Points and paragraphs repeated across the rounds of search are thus removed, and the remaining points whose similarity is greater than the preset second threshold are aggregated, forming a more comprehensive and accurate summary.

It will be appreciated that, in step S82, the de-duplicated and aggregated third points and third paragraphs may be globally integrated, organizing them into a hierarchical structure according to the search round and the order in which the search terms were expanded. In step S83, a hierarchical abstract is generated from the hierarchical structure by the abstract model for display. The information from multiple rounds of search is thereby integrated into deeper and more comprehensive information on the search term input by the user and presented as a hierarchical abstract, so that the user can clearly understand the key content corresponding to the search term. For example, suppose the first points and second points determined by the multi-round search total 20. In step S81, the 20 points are de-duplicated; assuming 15 points remain, these 15 points serve as the third points and are aggregated, and the aggregated results are denoted K1 to K15. In step S82, if K1, K2 and K3 were all obtained by the first search, and K8, K11 and K12 were obtained by the search derived from K3, where K8 is aggregated with K2, then K1, K2, K8 and K3 are at the same level, and K11 and K12 are sub-points of K3, i.e., at the next level, thereby forming a hierarchical structure. In step S83, a hierarchical abstract is generated from this hierarchical structure for display, reflecting the hierarchical relationships among the 15 aggregated points.
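The K1 to K15 example can be reproduced with a short sketch of the level assignment in step S82. The data layout is an assumption made for illustration: `parent_of` records which point's rewritten search term produced each point, and `aggregated_with` records aggregation partners. The function name `build_hierarchy` is hypothetical.

```python
from typing import Dict, List, Optional


def build_hierarchy(parent_of: Dict[str, Optional[str]],
                    aggregated_with: Dict[str, str]
                    ) -> Dict[Optional[str], List[str]]:
    """Map each parent (None = top level) to its ordered list of child points.

    parent_of[p] is the point whose rewritten search term produced p
    (None for points found by the first search). aggregated_with[p] names
    the point that p was aggregated with; p is then placed right after
    that point, at its level. The partner is assumed not to be moved itself.
    """
    levels: Dict[Optional[str], List[str]] = {}
    # First place every point that stays under its own parent.
    for point, parent in parent_of.items():
        if point not in aggregated_with:
            levels.setdefault(parent, []).append(point)
    # Then place aggregated points next to their partners.
    for point, partner in aggregated_with.items():
        siblings = levels[parent_of[partner]]
        siblings.insert(siblings.index(partner) + 1, point)
    return levels


# The subset of points named in the example above.
parent_of = {"K1": None, "K2": None, "K3": None,
             "K8": "K3", "K11": "K3", "K12": "K3"}
levels = build_hierarchy(parent_of, {"K8": "K2"})
```

With this input, `levels[None]` holds K1, K2, K8 and K3 at the top level, and `levels["K3"]` holds K11 and K12 as sub-points of K3, matching the example.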

Referring to fig. 4, a second embodiment of the present invention provides an information query system 1, configured to implement an information query method as set forth in any one of the above, including:

the search module 10 is used for acquiring search words and determining a plurality of corresponding first target documents according to the search words;

an extracting module 20, configured to extract a plurality of points and a plurality of corresponding paragraphs related to the search term in each of the first target documents;

the gist integration module 30 is configured to integrate a plurality of gist extracted from the plurality of first target documents and a plurality of corresponding paragraphs to obtain a plurality of first gist and a plurality of corresponding first paragraphs;

a processing module 40, configured to determine whether at least one relevant feature value related to the first target document or the first gist is less than a corresponding preset feature threshold, and/or whether a search round is greater than a preset threshold; and when the judgment is true, integrating the first points and the first paragraphs globally to form a hierarchical abstract for display.

It will be appreciated that the search module 10 may obtain the search term entered by the user and return a corresponding plurality of first target documents based on the search term. The extraction module 20 may extract, in each first target document, a plurality of points and a corresponding plurality of paragraphs related to the search term, so as to automatically expand the search intent. The gist integration module 30 can integrate repeated or similar points and their paragraphs: completely repeated parts are removed and the remaining parts are combined to form more comprehensive and more specific first gist and corresponding first paragraphs, while points with low similarity and their paragraphs are also used as first gist and corresponding first paragraphs. A plurality of first gist and corresponding first paragraphs is thereby obtained, which avoids a confused structure and repeated content in the final hierarchical abstract. The processing module 40 includes a decision unit 410 and a global integration unit 420. The decision unit 410 may determine whether at least one relevant feature value related to the first target document or the first gist is smaller than a corresponding preset feature threshold, and/or whether the search round is greater than a preset threshold, so as to determine whether the first gist and the first paragraphs integrated in this search can reflect the relevance and richness of the search term. After comparison, if at least one relevant feature value related to the first target document or the first gist is smaller than the corresponding preset feature threshold and/or the search round is greater than the corresponding preset threshold, the first gist and the first paragraphs integrated in this search can reflect the relevance and richness of the search term, and no next round of search is performed.
Therefore, the global integration unit 420 can integrate the first points and the corresponding first paragraphs obtained by the point integration of this search into a whole, i.e., generate a hierarchical abstract, and display the hierarchical abstract to the user. In the interaction process, the user thus only needs to initiate one search and can then directly read the returned text information, without manually clicking links, extracting information, integrating it and searching again, which greatly reduces the interaction cost of querying information.

A third embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an information query method provided by the first embodiment of the present invention.

In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood that determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art will also appreciate that the embodiments described in the specification are alternative embodiments and that the acts and modules referred to are not necessarily required for the present invention.

In various embodiments of the present invention, it should be understood that the sequence numbers of the foregoing processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not be construed as limiting the implementation of the embodiments of the present invention.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, with the determination being made based upon the functionality involved. It will be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

5. After judging whether at least one relevant feature value related to the first target document or the first gist is smaller than the corresponding preset feature threshold and/or whether the search round is greater than the preset threshold, the method further comprises: if the relevant feature values related to the first target document and the first gist are all greater than the corresponding preset feature thresholds and the search round is smaller than the corresponding preset threshold, rewriting the first gist into a new search word through a rewrite model to perform the next round of search, and obtaining a plurality of second gist and a plurality of second paragraphs corresponding to the new search word. The first gist obtained from the last integration can thus be rewritten into a new search word through the rewrite model, so that the next round of search is performed with the new search word, which makes it easier to obtain more comprehensive and deeper information related to the search word input by the user.

The information query method, system and storage medium disclosed in the embodiments of the present invention have been described in detail above, and specific examples have been used to illustrate the principles and embodiments of the present invention. The above description of the embodiments is only intended to help in understanding the method and core ideas of the present invention. For those skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention, and any modifications, equivalent substitutions and improvements made within the principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. An information query method is characterized in that: the method comprises the following steps:

Judging whether at least one relevant characteristic value related to the first target document or the first key point is smaller than a corresponding preset characteristic threshold value or not, and/or judging whether the search round is larger than the preset threshold value or not; the relevant characteristic values comprise a first target document relevance, a first target document length, a key point number, a key point proportion and a relevance of the key point and the search word;

if yes, carrying out global integration on the first key points and the first paragraphs to form a hierarchical abstract for display;

2. The information query method of claim 1, wherein: obtaining a search word, and returning a plurality of corresponding first target documents according to the search word, wherein the method specifically comprises the following steps:

acquiring search words input by a user;

3. The information query method of claim 1, wherein: extracting a plurality of key points and a plurality of corresponding paragraphs, which are related to the search word, in each first target document, wherein the method specifically comprises the following steps:

4. The information query method of claim 1, wherein: the method comprises the following steps of integrating a plurality of points and corresponding paragraphs extracted from a plurality of first target documents to obtain a plurality of first points and corresponding first paragraphs, and specifically comprises the following steps:

5. The information query method of claim 1, wherein: after the first gist is rewritten into a new search word through the rewrite model to perform the next round of search to obtain a plurality of second target documents, a plurality of second gist and a plurality of second paragraphs corresponding to the new search word, the method further comprises:

6. The information query method of claim 5, wherein: if yes, performing global integration on the first points, the second points, the first paragraphs and the second paragraphs to form a hierarchical abstract for display, wherein the method specifically comprises the following steps:

7. An information query system for implementing the information query method as claimed in any one of claims 1 to 6, characterized in that: comprising the following steps:

the main point integration module is used for integrating a plurality of main points and a plurality of corresponding paragraphs extracted from the plurality of first target documents to obtain a plurality of first main points and a plurality of corresponding first paragraphs;

the processing module is used for judging whether at least one relevant characteristic value related to the first target document or the first main point is smaller than a corresponding preset characteristic threshold value and/or whether the search round is larger than the preset threshold value; the relevant characteristic values comprise a first target document relevance, a first target document length, a key point number, a key point proportion and a relevance of the key point and the search word; when the judgment is true, integrating a plurality of first points and a plurality of first paragraphs globally to form a hierarchical abstract for display;

and the rewriting module is used for rewriting the first gist into a new search word through a rewriting model to perform the next round of search if the related characteristic values related to the first target document and the first gist are both larger than the corresponding preset characteristic threshold value and the search round is smaller than the corresponding preset threshold value, so as to obtain a plurality of second target documents, a plurality of second gist and a plurality of second paragraphs corresponding to the new search word.

8. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed, implements the information query method of any of claims 1-6.