CN117609428A - Information recall method and related equipment - Google Patents

Information recall method and related equipment

Info

Publication number
CN117609428A
Authority
CN
China
Prior art keywords
search
term
provincial
word
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311640457.4A
Other languages
Chinese (zh)
Inventor
王惠芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202311640457.4A priority Critical patent/CN117609428A/en
Publication of CN117609428A publication Critical patent/CN117609428A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/237 Lexical tools
    • G06F 40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides an information recall method, comprising: acquiring a search sentence; segmenting the search sentence to obtain a plurality of search terms; determining an importance parameter for each search term; constructing at least one search expression based on the importance parameters of the search terms; and performing information recall based on the at least one search expression. Based on the information recall method, the disclosure further provides an information recall device, an electronic device, a storage medium, and a program product.

Description

Information recall method and related equipment
Technical Field
The present disclosure relates to the field of data retrieval technologies, and in particular, to an information recall method, an apparatus, an electronic device, a storage medium, and a program product.
Background
Information recall is typically applied in search scenarios. In a search scenario, a search platform generally generates a search expression based on the search sentence entered by a user, recalls information using the search expression, and ranks the recalled information for presentation to the searcher. It will be appreciated that, in such a scenario, the accuracy of the search expression directly affects the accuracy of the recalled information; therefore, how to generate a search expression from the search sentence entered by the user is one of the key techniques of information recall.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide an information recall method that can determine an accurate search expression based on the search sentence input by a user, thereby improving the accuracy of the search results.
The information recall method according to the embodiments of the disclosure may include: acquiring a search sentence; segmenting the search sentence to obtain a plurality of search terms; determining an importance parameter for each search term; constructing at least one search expression based on the importance parameters of the search terms; and performing information recall based on the at least one search expression.
In an embodiment of the disclosure, determining the importance parameters of the respective search terms includes: determining a term weight for each search term; and determining, based on the term weight of each search term, whether the search term is an omittable search term or a non-omittable search term.
In an embodiment of the disclosure, determining whether each search term is an omittable search term or a non-omittable search term based on its term weight includes: determining a number N of non-omittable search terms based on the number of the plurality of search terms; and selecting the N search terms with the greatest term weights from the plurality of search terms as the non-omittable search terms, with the remaining search terms taken as the omittable search terms.
In an embodiment of the disclosure, constructing at least one search expression based on the importance parameters of the search terms includes: constructing at least one additional search term set based on the omittable search terms among the plurality of search terms, wherein the at least one additional search term set includes an empty set; and, for each additional search term set, combining all the non-omittable search terms among the plurality of search terms with the omittable search terms in that additional search term set to obtain a search expression.
In an embodiment of the present disclosure, the method further includes: determining synonyms of each search term. In this case, constructing at least one search expression based on the importance parameters of the search terms includes: constructing at least one search expression based on the importance parameter of each search term and the synonyms of each search term.
In an embodiment of the disclosure, constructing at least one search expression based on the importance parameter of each search term and the synonyms of each search term includes: constructing at least one key search term set based on all the non-omittable search terms among the plurality of search terms or synonyms thereof; constructing at least one additional search term set based on the omittable search terms among the plurality of search terms or synonyms thereof, wherein the at least one additional search term set includes an empty set; and, for each combination of a key search term set and an additional search term set, combining the non-omittable search terms in the key search term set with the omittable search terms in the additional search term set to obtain a search expression.
In an embodiment of the present disclosure, the method further includes: determining scores of a plurality of search results obtained through information recall based on at least one scoring parameter, wherein the at least one scoring parameter includes at least one of: a corresponding score of a search result from a search engine, a click-through-rate score of the search result over a predetermined period of time, a rank score corresponding to the search result, a closeness distance score between the non-omittable search terms in the search result, and a relevance score between the search result and the search terms; and selecting a predetermined number of search results with the highest scores as the recalled information.
In an embodiment of the present disclosure, determining the scores of the plurality of search results obtained through information recall based on the at least one scoring parameter includes: for each search result, determining a sub-score of the search result for each of the at least one scoring parameter; and, for each search result, performing a weighted summation of the sub-scores of the search result for the respective scoring parameters to obtain the score of the search result.
Based on the information recall method, some embodiments of the present disclosure further provide an information recall device, including:
a search sentence acquisition module, configured to acquire a search sentence;
a word segmentation module, configured to segment the search sentence to obtain a plurality of search terms;
an importance determination module, configured to determine an importance parameter for each search term;
a search expression construction module, configured to construct at least one search expression based on the importance parameters of the search terms; and
an information recall module, configured to perform information recall based on the at least one search expression.
In addition, an embodiment of the disclosure further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the above information recall method when executing the program.
Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above-described information recall method.
Embodiments of the present disclosure also provide a computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the above-described information recall method.
According to the information recall method and device described above, the importance of the search terms contained in the search sentence is evaluated, and the search expressions are constructed according to the importance parameter of each search term. On the one hand, the constructed search expressions contain the more important search terms, which ensures that they accurately reflect the user's search intention and that the search results meet the user's needs. On the other hand, by omitting unimportant search terms from some of the constructed search expressions, the search sentence entered by the user is diverged so that richer related information is returned, stimulating the user's search interest and further improving the search experience.
Drawings
In order to describe the technical solutions of the present disclosure or the related art more clearly, the drawings required by the embodiments or the related-art description are briefly introduced below. It is apparent that the drawings in the following description show only embodiments of the present disclosure, and that those of ordinary skill in the art may derive other drawings from them without inventive effort.
FIG. 1 illustrates an implementation flow of an information recall method according to some embodiments of the present disclosure;
FIG. 2 illustrates an implementation flow of a particular method of constructing at least one search expression based on importance parameters of individual search terms according to some embodiments of the present disclosure;
FIG. 3 shows a flow of implementation of the information recall method according to other embodiments of the present disclosure;
FIG. 4 illustrates an implementation flow of a particular method of constructing at least one search expression based on importance parameters of individual search terms and their synonyms according to some embodiments of the present disclosure;
FIG. 5 shows the internal structure of an information recall device according to an embodiment of the present disclosure; and
FIG. 6 illustrates a more specific hardware architecture of an electronic device according to some embodiments of the present disclosure.
Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same.
It should be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present disclosure have the ordinary meaning understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first", "second", and the like used in the embodiments of the present disclosure do not denote any order, quantity, or importance, but are merely used to distinguish one element from another. Words such as "comprising" or "including" mean that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections, and may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, which may change when the absolute position of the described object changes.
It will be appreciated that, before the technical solutions of the various embodiments of the disclosure are used, the user may be informed, in an appropriate manner, of the type of personal information involved, the scope of use, the usage scenario, and so on, and the user's authorization may be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly remind the user that the requested operation requires obtaining and using the user's personal information. The user can thus choose, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, application, server, or storage medium, that performs the operations of the technical solution.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user, for example, in the form of a pop-up window in which the prompt information is presented as text. In addition, the pop-up window may carry a selection control with which the user chooses to "agree" or "disagree" to provide personal information to the electronic device.
It will be appreciated that the above notification and authorization process is merely illustrative and does not limit the implementations of the present disclosure; other ways of satisfying the relevant legal regulations may also be applied to the implementations of the present disclosure.
As previously described, in a search scenario the construction of the search expression is one of the key techniques of information recall, and its accuracy directly affects the accuracy of the recalled information. To improve the accuracy of information recall and the user experience in search scenarios, embodiments of the present disclosure provide an information recall method.
Fig. 1 shows an implementation flow of an information recall method according to some embodiments of the present disclosure. The information recall method can be executed by a search platform. As shown in fig. 1, the information recall method may include the following steps:
In step 110, a search sentence is obtained.
In embodiments of the present disclosure, the search sentence is generally the sentence input by a user through a user interface provided by a browser or a client application corresponding to the search platform.
Further, the search sentence may be natural-language text or speech, for example, a question or a statement. It should be noted that, in the embodiments of the present disclosure, a search sentence in speech form may be converted into text form by speech recognition before the subsequent steps are performed.
In step 120, the obtained search sentence is segmented to obtain a plurality of search terms.
In the embodiments of the present disclosure, the search sentence may be segmented by various methods, for example, dictionary-based segmentation, statistics-based segmentation, rule-based segmentation, word-tagging-based segmentation, understanding-based segmentation, and the like. It should be noted that the information recall method according to the embodiments of the present disclosure does not limit the specific segmentation method adopted.
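Purely as an illustrative sketch (the disclosure does not prescribe any particular tokenizer), a dictionary- and statistics-based segmenter such as the open-source jieba package could perform this step roughly as follows; the query string and the expected output are assumptions for illustration.

```python
# Illustrative only: the embodiments do not mandate a specific segmentation method.
# jieba is one widely used dictionary/statistics-based Chinese tokenizer.
import jieba

def segment(search_sentence: str) -> list[str]:
    """Split a search sentence into candidate search terms."""
    # lcut returns a plain list of tokens; empty/whitespace tokens are dropped.
    return [t for t in jieba.lcut(search_sentence) if t.strip()]

# Hypothetical query; a sentence like this would typically yield three terms.
print(segment("书房衣帽间二合一"))  # e.g. ['书房', '衣帽间', '二合一']
```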
In step 130, importance parameters for each search term are determined separately.
In some embodiments of the present disclosure, the importance parameter is mainly used to characterize whether a given search term is an omittable search term or a non-omittable search term. Specifically, in the embodiments of the present disclosure, a non-omittable search term is a search term that is indispensable when constructing a search expression, that is, a search term that must be contained in the recalled information; an omittable search term is a search term that may be absent when constructing a search expression, that is, a search term that need not be contained in the recalled information.
In this case, step 130 may include: first, determining a term weight of each of the plurality of search terms; and second, determining, based on the term weight of each search term, whether the search term is an omittable search term or a non-omittable search term.
In the embodiments of the present disclosure, the term weight is an index measuring the importance of each search term within the search sentence.
In some examples, the term weight of each search term is determined mainly by obtaining co-occurrence statistical features of the term from a multi-document dataset, for example the term frequency-inverse document frequency (TF-IDF) or the mutual information of each search term, and then determining the term weight of each search term based on the obtained statistical features.
In other examples, the plurality of search terms may each be input into a trained term weight model to obtain their term weights. Specifically, the term weight model may encode an input search term to obtain a corresponding term vector, and then determine the term weight of the search term from that vector. In practical applications, the term weight model may be implemented by a variety of machine learning models with relatively good performance, for example at least one of a trained Bidirectional Encoder Representations from Transformers (BERT) model, a convolutional neural network (CNN) model, or a long short-term memory (LSTM) model.
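Purely as an illustrative sketch of such a term weight model (the checkpoint name, the sigmoid head, and the absence of training are all assumptions; a real deployment would train the head, or the whole encoder, on labeled term-weight data):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
encoder = AutoModel.from_pretrained("bert-base-chinese")
head = torch.nn.Linear(encoder.config.hidden_size, 1)            # untrained here; trained in practice

def term_weight(search_term: str) -> float:
    """Encode a search term and map its vector to a scalar weight in (0, 1)."""
    inputs = tokenizer(search_term, return_tensors="pt")
    with torch.no_grad():
        cls_vec = encoder(**inputs).last_hidden_state[:, 0]      # [CLS] vector as the term vector
    return torch.sigmoid(head(cls_vec)).item()
```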
It should be noted that the information recall method according to the embodiments of the present disclosure does not limit the specific term weight determination method adopted.
Specifically, in the embodiments of the present disclosure, a decision strategy for non-omittable search terms, related to the term weight, may be predetermined; non-omittable search terms are then determined from the search terms based on this predetermined decision strategy, and the remaining search terms are determined to be omittable search terms. The information recall method according to the embodiments of the present disclosure likewise does not limit the decision strategy adopted. For example, in some embodiments, the decision strategy may include the following two steps.
First, the number N of non-omittable search terms is determined based on the number of search terms contained in the search sentence. For example, in some examples, when the search sentence contains three or more search terms, the number N of non-omittable search terms may be set to 2.
Second, the N search terms with the greatest term weights are selected from the plurality of search terms as the non-omittable search terms, and the remaining search terms are taken as the omittable search terms. That is, in the above example, the search terms may be sorted by term weight from largest to smallest, the two search terms with the greatest term weights selected as the non-omittable search terms, and the remaining search terms determined to be omittable search terms.
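A minimal sketch of this decision strategy follows; the top-2 rule mirrors the example above, while the fallback for fewer than three terms and the numeric weights are assumptions for illustration.

```python
def split_terms(terms: list[str], weights: dict[str, float]) -> tuple[list[str], list[str]]:
    """Split search terms into (non_omittable, omittable) by descending term weight."""
    # Illustrative policy: with three or more terms, the two heaviest are non-omittable;
    # with fewer terms, treat them all as non-omittable (assumption).
    n = 2 if len(terms) >= 3 else len(terms)
    ranked = sorted(terms, key=lambda t: weights.get(t, 0.0), reverse=True)
    return ranked[:n], ranked[n:]

# Example matching the worked example later in this description; weights are invented.
non_omittable, omittable = split_terms(
    ["study", "cloakroom", "two-in-one"],
    {"cloakroom": 0.9, "study": 0.7, "two-in-one": 0.3},
)
print(non_omittable, omittable)  # ['cloakroom', 'study'] ['two-in-one']
```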
It can be seen that the above decision strategy divides the plurality of search terms into two parts: non-omittable search terms and omittable search terms. This classification result is used in the construction of the subsequent search expressions.
At step 140, at least one search expression is constructed based on the importance parameters of the respective search terms.
In an embodiment of the present disclosure, the specific method for constructing at least one search expression based on the importance parameter of each search term in the step 140 may be as shown in fig. 2, and mainly includes the following steps:
at step 210, at least one additional set of search terms is constructed based on the curvable search terms of the plurality of search terms.
In embodiments of the present disclosure, any combination of all of the above-described decompable search terms may be used to arrive at the above-described at least one additional set of search terms. In particular, the at least one additional set of search terms may include an empty set. In practical applications, assuming that there are a total number of provincial search terms, the at least one additional set of search terms may include: empty set, A sets of any one of the provincial search words, A-1A/2 sets of any two provincial search words, … …, and sets of all the A provincial search words.
In step 220, for each additional set of search terms, all of the non-provincial search terms in the plurality of search terms and the provincial search terms in the additional set of search terms are combined to obtain a search expression.
That is, in embodiments of the present disclosure, one search expression may be generated for each additional set of search terms, and each search expression contains all of the non-provincial search terms and a different provincial search term combination or no provincial search terms. It can be seen that at least one search expression can be generated by the above method.
Specifically, in embodiments of the present disclosure, the above-described combinations may include a combinational logic "and". For example, one search expression generated may be: "search word a and search word B and search word C … …".
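The following sketch illustrates steps 210 and 220 under the assumption that a search expression is simply the terms joined with AND; only the combination logic "and" comes from the description, the rest is illustrative.

```python
from itertools import combinations

def additional_term_sets(omittable: list[str]) -> list[tuple[str, ...]]:
    """All subsets of the omittable search terms, including the empty set (step 210)."""
    sets = []
    for k in range(len(omittable) + 1):
        sets.extend(combinations(omittable, k))
    return sets

def build_expressions(non_omittable: list[str], omittable: list[str]) -> list[str]:
    """One search expression per additional set: all non-omittable terms ANDed with
    the omittable terms of that set (step 220)."""
    return [" AND ".join(list(non_omittable) + list(extra))
            for extra in additional_term_sets(omittable)]

print(build_expressions(["study", "cloakroom"], ["two-in-one"]))
# -> ['study AND cloakroom', 'study AND cloakroom AND two-in-one']
```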
In step 150, information recall is performed based on the at least one search expression.
In embodiments of the present disclosure, in step 150, information recall may be performed in one or more knowledge bases (which may also be referred to as databases) based on the at least one search expression. Specifically, during information recall, an inverted-index-based text recall mode can be used, which helps ensure the accuracy of the recalled information.
In some embodiments of the present disclosure, if there are multiple knowledge bases (or databases) to be searched, the search platform may, in order to reduce recall latency, perform information recall in the multiple knowledge bases in parallel. That is, multiple information recall threads are established in parallel, each thread corresponding to one or more knowledge bases; information is recalled from the multiple knowledge bases in parallel based on the at least one search expression, and a preset first number of search results is obtained jointly.
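A minimal sketch of this parallel recall, assuming each knowledge base exposes a search(expression) call returning a result list; the per-knowledge-base API and the simple truncation to the preset number are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_recall(expressions, knowledge_bases, first_number=100):
    """Query every knowledge base with every search expression in parallel threads."""
    def query_one(kb, expr):
        return kb.search(expr)  # hypothetical per-knowledge-base search API

    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(knowledge_bases))) as pool:
        futures = [pool.submit(query_one, kb, expr)
                   for kb in knowledge_bases for expr in expressions]
        for future in futures:
            results.extend(future.result())
    return results[:first_number]  # jointly cap at the preset first number of results
```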
According to the information recall method described above, the importance of the search terms contained in the search sentence is evaluated and the search expressions are constructed according to the importance parameter of each search term. On the one hand, the constructed search expressions contain the more important search terms, which ensures that they accurately reflect the user's search intention and that the search results meet the user's needs. In particular, when the user's search sentence is worded unclearly, the method can still determine the user's search intention more accurately by evaluating the importance parameter of each search term. On the other hand, by omitting unimportant search terms from some of the constructed search expressions, the search sentence entered by the user is moderately diverged so that richer related information is returned, stimulating the user's search interest and further improving the search experience.
The above information recall method is described below with a specific example. Assume that the search sentence obtained in step 110 is "study cloakroom two-in-one". Then, in step 120, three search terms are obtained after segmentation: "study", "cloakroom", and "two-in-one". In step 130, the term weight ranking of the three search terms is determined to be "cloakroom" > "study" > "two-in-one". Further, based on the preset decision strategy, the number of non-omittable search terms is determined to be 2, so that "cloakroom" and "study" are the non-omittable search terms among the three, and "two-in-one" is an omittable search term. Next, in step 140, two additional search term sets can be determined based on these results: the empty set and {"two-in-one"}. Thus, two search expressions are obtained: "study AND cloakroom" and "study AND cloakroom AND two-in-one". Finally, in step 150, information recall is performed in multiple knowledge bases based on the two search expressions constructed above.
It can be seen that, in the above example, on the one hand every constructed search expression contains the two non-omittable search terms "study" and "cloakroom", which ensures the accuracy of the recalled information; on the other hand, one of the constructed search expressions does not contain the user-entered search term "two-in-one", so the user's search sentence is diverged to a certain extent and richer related information is returned, stimulating the user's search interest.
In order to further associate and diverge the search sentence entered by the user, so that richer related information is returned and the problem of inaccurate recall caused by users who cannot accurately describe their search needs is effectively alleviated, other embodiments of the present disclosure provide a further information recall method. The implementation flow of this method is shown in Fig. 3 and mainly includes the following steps:
In step 310, a search sentence is obtained.
In step 320, the obtained search sentence is segmented to obtain a plurality of search terms.
In step 330, importance parameters for each search term are determined. In some embodiments of the present disclosure, the importance parameter is mainly used to characterize whether a given search term is an omittable search term or a non-omittable search term.
It should be noted that the implementation of steps 310 to 330 is the same as that of steps 110 to 130 in the above embodiment, and is therefore not repeated here.
In step 340, synonyms for each search term are determined separately.
In embodiments of the present disclosure, synonyms for each search term may be determined based on a pre-established synonym library. It should be noted that the information recall method according to the embodiments of the present disclosure does not limit the specific method used to determine the synonyms of each search term.
At step 350, at least one search expression is constructed based on the importance parameter of each search term and the synonym of each search term.
In an embodiment of the disclosure, a specific method for constructing at least one search expression based on the importance parameter of each search term and the synonym of each search term in step 350 may be as shown in fig. 4, and mainly includes the following steps:
at step 410, at least one set of key search terms is constructed based on all of the non-provincial search terms or synonyms thereof in the plurality of search terms.
In an embodiment of the present disclosure, each of the at least one set of key search terms includes all of the non-provincial search terms of the plurality of search terms, or includes a portion of the non-provincial search terms of the plurality of search terms and synonyms of the remaining non-provincial search terms, or includes synonyms of all of the non-provincial search terms of the plurality of search terms. That is, each of the keyword search sets described above will include all of the non-provincial search terms themselves or their synonyms.
At step 420, at least one additional set of search terms is constructed based on the provincial search term or its synonyms in the plurality of search terms.
In an embodiment of the present disclosure, the performing method of the step 420 may refer to the step 310 in the foregoing embodiment, and any combination of all the negligible search terms and their synonyms may be performed to obtain the at least one additional search term set. In particular, the at least one additional set of search terms may include an empty set. In practical applications, assuming that there are B of all the provincial search terms and their synonyms, the at least one additional set of search terms may include: empty sets, B sets of any one of the provincial search terms or synonyms thereof, sets of any two of the provincial search terms or synonyms thereof (wherein, in general, one of the provincial search terms and the synonyms thereof may not occur in the same additional search set at the same time), sets of any three of the provincial search terms or synonyms thereof, and the like.
In step 430, for each combination of the set of key search terms and each set of additional search terms, the non-provincial search terms in the set of key search terms and the provincial search terms in the set of additional search terms are combined, respectively, to obtain a search expression.
That is, in embodiments of the present disclosure, the combination of the corresponding set of key search terms and each additional set of search terms may generate one search expression, and each search expression contains all of the non-provincial search terms or their synonyms. Furthermore, each search expression may also contain a different provincial search term or synonym combination thereof. It can be seen that at least one search expression can be generated by the above method.
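A sketch of steps 410 to 430, assuming the synonym library is a simple dict mapping a term to its synonyms and that expressions are AND-joined strings; both assumptions are for illustration only.

```python
from itertools import combinations, product

def key_term_sets(non_omittable, synonyms):
    """Every choice of 'the term itself or one of its synonyms' per non-omittable term (step 410)."""
    options = [[t] + synonyms.get(t, []) for t in non_omittable]
    return [list(choice) for choice in product(*options)]

def additional_term_sets(omittable, synonyms):
    """Subsets of omittable terms, each replaced by itself or one synonym, plus the empty set (step 420)."""
    sets = [()]
    for k in range(1, len(omittable) + 1):
        for chosen in combinations(omittable, k):
            variants = [[t] + synonyms.get(t, []) for t in chosen]
            sets.extend(product(*variants))  # a term and its own synonym never share one set
    return sets

def build_expressions(non_omittable, omittable, synonyms):
    """One expression per (key set, additional set) combination (step 430)."""
    return [" AND ".join(list(key) + list(extra))
            for key in key_term_sets(non_omittable, synonyms)
            for extra in additional_term_sets(omittable, synonyms)]

syn = {"study": ["bookroom"], "two-in-one": ["integral", "integrated"]}
exprs = build_expressions(["study", "cloakroom"], ["two-in-one"], syn)
print(len(exprs))  # 2 key sets x 4 additional sets = 8 expressions, as in the example below
```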
In step 360, information recall is performed based on the at least one search expression.
It should be noted that the implementation method of the step 360 is the same as that of the step 150 in the above embodiment, and thus, the description is not repeated here.
According to the information recall method described above, the search terms contained in the search sentence are expanded to their synonyms when constructing the search expressions. On the basis that the constructed search expressions accurately reflect the user's search intention, the search sentence entered by the user is thereby further associated and diverged, so that richer related information is returned, the user's search interest is further stimulated, and the search experience is improved.
The above information recall method is described in further detail with a specific example. Assume again that the search sentence obtained in step 310 is "study cloakroom two-in-one". Then, in step 320, three search terms are obtained after segmentation: "study", "cloakroom", and "two-in-one". In step 330, the term weight ranking of the three search terms is determined to be "cloakroom" > "study" > "two-in-one". Further, based on the preset decision strategy, the number of non-omittable search terms is determined to be 2, so that "cloakroom" and "study" are the non-omittable search terms and "two-in-one" is an omittable search term. Next, in step 340, it may be determined that "cloakroom" has no synonyms, that synonyms of "study" include "bookroom", and that synonyms of "two-in-one" include "integral" and "integrated". Thus, in step 350, two key search term sets can be determined: {"cloakroom", "study"} and {"cloakroom", "bookroom"}. In addition, four additional search term sets can be determined: the empty set, {"two-in-one"}, {"integral"}, and {"integrated"}. In this way, 8 search expressions are obtained: "study AND cloakroom", "study AND cloakroom AND two-in-one", "study AND cloakroom AND integral", "study AND cloakroom AND integrated", "bookroom AND cloakroom", "bookroom AND cloakroom AND two-in-one", "bookroom AND cloakroom AND integral", and "bookroom AND cloakroom AND integrated". Finally, in step 360, information recall is performed in multiple knowledge bases based on the 8 search expressions constructed above. It can be seen that, compared with the previous example, on the one hand every constructed search expression contains the two non-omittable search terms or their synonyms, namely "study" (or "bookroom") and "cloakroom", which ensures the accuracy of the recalled information; on the other hand, some of the constructed search expressions either do not contain the user-entered search term "two-in-one" or replace it with a synonym, so the search sentence entered by the user is further associated and diverged, richer related information can be returned, and the user's search interest is stimulated more effectively.
It will be appreciated that after step 150 or step 360 above is performed, a plurality of recalled pieces of information, also referred to as search results, is obtained, and that the number of search results returned by information recall is often relatively large. In order to further ensure the accuracy and validity of the information recalled by the search platform, the information recall method may further include:
first, determining scores of the plurality of search results obtained through information recall based on at least one scoring parameter; and
second, selecting a predetermined number of search results with the highest scores as the target information to be recalled.
In embodiments of the present disclosure, the at least one scoring parameter may include at least one of the following various parameters.
1) Corresponding scores for search results from a search engine.
Typically, during information recall, the search engine used by the search platform scores each search result; that is, the search engine returns each search result together with a corresponding score. In embodiments of the present disclosure, this corresponding score from the search engine may be used as one of the scoring parameters. It is generally believed that the higher the score returned by the search engine for a search result, the more likely the result is to be accurate and valid and to be selected by the user.
2) Click rate scores for search results over a predetermined period of time.
In embodiments of the present disclosure, the click-through rate of a search result over a predetermined period of time, for example the posterior click-through rate of an item of information over the last 10 days, may be used as one of the scoring parameters. It is generally considered that the higher the click-through rate of an item of information over a predetermined period of time, the greater the probability of it being selected by the user.
3) Rank score corresponding to the search result.
In embodiments of the present disclosure, the reliability and accuracy of a search result itself may be evaluated based on its information source or other attributes to obtain a rank score corresponding to the search result. For example, information published by an officially authenticated platform is generally considered highly reliable and accurate; therefore, search results from officially authenticated platforms may be given higher rank scores, as one of the scoring parameters, so that more accurate and reliable search results are provided to the user with higher priority. In a specific application, a preset rank scoring policy may be used to determine the rank score corresponding to each search result.
4) Closeness distance score between the non-omittable search terms in the search result.
In a search sentence, the closeness distance between the non-omittable search terms is usually very short, and the order of the non-omittable search terms has a certain influence on the search result. Therefore, each search result may be scored based on the positional relationship between the non-omittable search terms it contains, yielding the closeness distance score that reflects the closeness between the non-omittable search terms in the search result. The higher the closeness between the non-omittable search terms in a search result, i.e. the shorter their positional distance, the higher the relevance between that search result and the search sentence is considered to be. In practice, a trained closeness distance evaluation model may be used to determine the closeness distance score between the non-omittable search terms in a search result.
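The description relies on a trained closeness distance evaluation model; purely as an illustrative stand-in, the score could be approximated from token positions, for example:

```python
def closeness_score(result_tokens: list[str], non_omittable: list[str]) -> float:
    """Toy positional heuristic: 1.0 when the non-omittable terms are adjacent in the
    result text, decreasing as they drift apart. A stand-in for a trained model."""
    positions = []
    for term in non_omittable:
        if term not in result_tokens:
            return 0.0
        positions.append(result_tokens.index(term))  # first occurrence only
    span = max(positions) - min(positions)
    extra_gap = max(0, span - (len(non_omittable) - 1))
    return 1.0 / (1.0 + extra_gap)
```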
5) Relevance scores between search results and search terms.
It will be appreciated that in general the higher the relevance between the search results and the search terms, the greater the probability of being selected by the user. In practical applications, a trained relevance evaluation model may be used to determine a relevance score between a search result and a search term.
When one or more of the above parameters are selected as scoring parameters, the above step may, for each search result, first determine the sub-score of the search result for each scoring parameter, and then normalize the sub-scores and combine them by weighted summation to obtain the score of the search result. Each scoring parameter may correspond to a predetermined weighting coefficient that characterizes the importance of that scoring parameter; the weighting coefficients may be set empirically and may be set and dynamically adjusted according to the actual application scenario.
For example, in one specific example, determining a score for a search result based on the 5 scoring parameters described above may be accomplished by the following computational expression:
cu = (1 + α1*es_score) * (1 + α2*click_score) * (1 + α3*grade_score) * (1 + α4*term_tight_score) * (1 + α5*sim_score)
where cu represents the score of the search result; es_score represents the corresponding score of the search result from the search engine; click_score represents the click-through-rate score of the search result over the predetermined period of time; grade_score represents the rank score corresponding to the search result; term_tight_score represents the closeness distance score between the non-omittable search terms in the search result; and sim_score represents the relevance score between the search result and the search terms. In the above expression, all sub-scores are normalized scores.
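Written as code, the example score can be computed as follows; the numeric sub-scores and weighting coefficients are placeholders, to be tuned per application as noted above.

```python
def combined_score(sub_scores: dict[str, float], alphas: dict[str, float]) -> float:
    """cu = product over the five normalized sub-scores of (1 + alpha_i * sub_score_i)."""
    cu = 1.0
    for name in ("es_score", "click_score", "grade_score", "term_tight_score", "sim_score"):
        cu *= 1.0 + alphas[name] * sub_scores[name]
    return cu

cu = combined_score(
    {"es_score": 0.8, "click_score": 0.4, "grade_score": 0.6,
     "term_tight_score": 0.9, "sim_score": 0.7},
    {"es_score": 0.5, "click_score": 0.3, "grade_score": 0.2,
     "term_tight_score": 0.4, "sim_score": 0.6},
)
print(round(cu, 3))
```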
Based on the above, the recalled information can be comprehensively evaluated along several different dimensions, producing an overall score that reflects the accuracy and reliability of the information, its relevance to the search sentence, the probability that the user will be interested in it, and so on. The portion of the information with the highest scores is then selected as the final recalled target information, which can greatly improve the user's search experience.
On the other hand, for search scenarios that require searching multiple knowledge bases, the method can mix the information recalled from the multiple knowledge bases into a new knowledge base and then, according to a unified standard, recall a predetermined amount of information from that new knowledge base as the final target information. This two-stage recall process effectively balances the differences between the knowledge bases and selects, from among them, information that is more accurate, more reliable, more relevant, and more likely to be selected by the user as the final recalled target information.
Corresponding to the above method, an embodiment of the disclosure also provides an information recall device. Fig. 5 shows the internal structure of an information recall device according to an embodiment of the present disclosure. As shown in Fig. 5, the information recall device may include the following modules:
a search sentence acquisition module 510, configured to acquire a search sentence;
a word segmentation module 520, configured to segment the search sentence to obtain a plurality of search terms;
an importance determination module 530, configured to determine an importance parameter for each search term;
a search expression construction module 540, configured to construct at least one search expression based on the importance parameters of the search terms; and
an information recall module 550, configured to perform information recall based on the at least one search expression.
Specific implementations of the above modules may refer to the foregoing method and drawings and are not repeated here. For convenience of description, the above device is described as being divided into modules by function; of course, when implementing the present disclosure, the functions of the modules may be implemented in the same one or more pieces of software and/or hardware. The device of the foregoing embodiment is used to implement the corresponding information recall method of any of the foregoing embodiments and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
Based on the same inventive concept and corresponding to the method of any of the above embodiments, the present disclosure further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the information recall method of any of the above embodiments when executing the program.
Fig. 6 is a schematic diagram of a more specific hardware structure of an electronic device according to this embodiment. The device may include a processor 2010, a memory 2020, an input/output interface 2030, a communication interface 2040, and a bus 2050, where the processor 2010, the memory 2020, the input/output interface 2030, and the communication interface 2040 are communicatively connected to each other within the device via the bus 2050.
The processor 2010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided by the embodiments of the present disclosure.
The memory 2020 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 2020 may store an operating system and other application programs; when the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 2020 and executed by the processor 2010.
The input/output interface 2030 is used for connecting input/output devices to realize information input and output. The input/output device may be configured in the device as a component, or may be externally connected to the device to provide corresponding functions. Wherein the input devices may include microphones, various types of sensors, etc., and the output devices may include displays, speakers, vibrators, indicator lights, etc.
The communication interface 2040 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
The bus 2050 includes a pathway to transfer information between various components of the device (e.g., the processor 2010, the memory 2020, the input/output interface 2030, and the communication interface 2040).
It is noted that although the above-described device illustrates only the processor 2010, memory 2020, input/output interface 2030, communication interface 2040, and bus 2050, the device may include other components necessary to achieve proper operation in a particular implementation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The electronic device of the foregoing embodiment is configured to implement the corresponding information recall method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same inventive concept, corresponding to any of the above embodiments of the method, the present disclosure further provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the information recall method according to any of the above embodiments.
The computer-readable media of the present embodiments include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The storage medium of the foregoing embodiments stores computer instructions for causing the computer to perform the information recall method of any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Those of ordinary skill in the art will appreciate that the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples. Under the idea of the present disclosure, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present disclosure. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present disclosure, and this also accounts for the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform on which the embodiments of the present disclosure are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalents, improvements, and the like, which are within the spirit and principles of the embodiments of the disclosure, are intended to be included within the scope of the disclosure.

Claims (12)

1. An information recall method, comprising:
acquiring a search sentence;
segmenting the search sentence to obtain a plurality of search terms;
determining an importance parameter for each search term;
constructing at least one search expression based on the importance parameters of the search terms; and
performing information recall based on the at least one search expression.
2. The method of claim 1, wherein determining an importance parameter for each search term comprises:
determining a term weight for each search term; and
determining, based on the term weight of each search term, whether the search term is an omittable search term or a non-omittable search term.
3. The method of claim 2, wherein determining whether each search term is an omittable search term or a non-omittable search term based on the term weight of each search term comprises:
determining a number N of non-omittable search terms based on the number of the plurality of search terms; and
selecting the N search terms with the greatest term weights from the plurality of search terms as the non-omittable search terms, and taking the remaining search terms as the omittable search terms.
4. The method of claim 2, wherein constructing at least one search expression based on the importance parameters of the search terms comprises:
constructing at least one additional search term set based on the omittable search terms among the plurality of search terms, wherein the at least one additional search term set comprises an empty set; and
for each additional search term set, combining all the non-omittable search terms among the plurality of search terms with the omittable search terms in that additional search term set to obtain a search expression.
5. The method of claim 2, further comprising:
determining synonyms of each search term; wherein
constructing at least one search expression based on the importance parameters of the search terms comprises: constructing at least one search expression based on the importance parameter of each search term and the synonyms of each search term.
6. The method of claim 5, wherein constructing at least one search expression based on the importance parameter of each search term and the synonyms of each search term comprises:
constructing at least one key search term set based on all the non-omittable search terms among the plurality of search terms or synonyms thereof;
constructing at least one additional search term set based on the omittable search terms among the plurality of search terms or synonyms thereof, wherein the at least one additional search term set comprises an empty set; and
for each combination of a key search term set and an additional search term set, combining the non-omittable search terms in the key search term set with the omittable search terms in the additional search term set to obtain a search expression.
7. The method of claim 1, further comprising:
determining scores of a plurality of search results obtained through information recall based on at least one scoring parameter, wherein the at least one scoring parameter comprises at least one of: a corresponding score of a search result from a search engine, a click-through-rate score of the search result over a predetermined period of time, a rank score corresponding to the search result, a closeness distance score between the non-omittable search terms in the search result, and a relevance score between the search result and the search terms; and
selecting a predetermined number of search results with the highest scores as the recalled information.
8. The method of claim 7, wherein determining the scores of the plurality of search results obtained through information recall based on the at least one scoring parameter comprises:
for each search result, determining a sub-score of the search result for each of the at least one scoring parameter; and
for each search result, performing a weighted summation of the sub-scores of the search result for the respective scoring parameters to obtain the score of the search result.
9. An information recall device, comprising:
a search sentence acquisition module, configured to acquire a search sentence;
a word segmentation module, configured to segment the search sentence to obtain a plurality of search terms;
an importance determination module, configured to determine an importance parameter for each search term;
a search expression construction module, configured to construct at least one search expression based on the importance parameters of the search terms; and
an information recall module, configured to perform information recall based on the at least one search expression.
10. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the information recall method according to any one of claims 1-8 when the program is executed.
11. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the information recall method of any one of claims 1-8.
12. A computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the information recall method of any one of claims 1-8.
CN202311640457.4A 2023-12-01 2023-12-01 Information recall method and related equipment Pending CN117609428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311640457.4A CN117609428A (en) 2023-12-01 2023-12-01 Information recall method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311640457.4A CN117609428A (en) 2023-12-01 2023-12-01 Information recall method and related equipment

Publications (1)

Publication Number Publication Date
CN117609428A (en) 2024-02-27

Family

ID=89947774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311640457.4A Pending CN117609428A (en) 2023-12-01 2023-12-01 Information recall method and related equipment

Country Status (1)

Country Link
CN (1) CN117609428A (en)

Similar Documents

Publication Publication Date Title
JP7343568B2 (en) Identifying and applying hyperparameters for machine learning
US11782915B2 (en) Searchable index
CN110162695B (en) Information pushing method and equipment
US10387437B2 (en) Query rewriting using session information
JP5597255B2 (en) Ranking search results based on word weights
JP5913736B2 (en) Keyword recommendation
WO2020077824A1 (en) Method, apparatus, and device for locating abnormality, and storage medium
WO2013121181A1 (en) Method of machine learning classes of search queries
US9507853B1 (en) Synonym identification based on search quality
CN110737756B (en) Method, apparatus, device and medium for determining answer to user input data
US10146872B2 (en) Method and system for predicting search results quality in vertical ranking
JP2015500525A (en) Method and apparatus for information retrieval
US20180197531A1 (en) Domain terminology expansion by sensitivity
CN110990533A (en) Method and device for determining standard text corresponding to query text
CN112487283A (en) Method and device for training model, electronic equipment and readable storage medium
CN104615723B (en) The determination method and apparatus of query word weighted value
CN115630144A (en) Document searching method and device and related equipment
CN111144098B (en) Recall method and device for extended question
CN112214663A (en) Method, system, device, storage medium and mobile terminal for obtaining public opinion volume
CN110287284B (en) Semantic matching method, device and equipment
CN116383340A (en) Information searching method, device, electronic equipment and storage medium
US20230143777A1 (en) Semantics-aware hybrid encoder for improved related conversations
US20210026889A1 (en) Accelerated large-scale similarity calculation
CN117609428A (en) Information recall method and related equipment
AU2021289542B2 (en) Refining a search request to a content provider

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination