CN114116956A - Retrieval method and device

Info

Publication number: CN114116956A
Application number: CN202010905669.0A
Authority: CN
Inventors: 李闯
Original assignee: Shanghai Xiaoi Robot Technology Co Ltd
Current assignee: Shanghai Xiaoi Robot Technology Co Ltd
Priority date: 2020-09-01
Filing date: 2020-09-01
Publication date: 2022-03-01

Abstract

The embodiments of the application provide a retrieval method, a retrieval device, an electronic device and a computer-readable storage medium, which solve the problem that retrieval accuracy cannot be improved without reducing the retrieval recall rate. The retrieval method comprises the following steps: acquiring retrieval content input by a user; performing keyword extraction on the retrieval content to obtain at least one keyword; identifying the part of speech of each keyword; and searching a knowledge base for the keywords and their parts of speech according to a preset part-of-speech condition to obtain a retrieval result, wherein the part-of-speech condition comprises a plurality of part-of-speech classifications, and each part-of-speech classification corresponds either to a retrieval priority or to information indicating that it does not participate in matching.

Description

Retrieval method and device

Technical Field

The present application relates to the field of retrieval technologies, and in particular, to a retrieval method, an apparatus, an electronic device, and a computer-readable storage medium.

Background

At present, governments and enterprises hold massive amounts of structured and unstructured knowledge data, and data retrieval and search engines have become necessary tools for users to find knowledge. The quality of a retrieval result is measured mainly by the accuracy rate and the recall rate. The accuracy rate, also called the precision rate, evaluates how accurate the retrieval result is, i.e. the ratio of the number of documents in the retrieval result that are related to the keyword to the total number of documents in the retrieval result. The recall rate, also called the recall ratio, evaluates how comprehensive the retrieval result is, i.e. the ratio of the number of documents in the retrieval result that are related to the keyword to the number of all documents in the knowledge base that are related to the keyword. Current full-text retrieval technology based on keyword matching either matches the word string input by the user directly in the knowledge base, or segments the user input into keywords and then matches those keywords directly in the knowledge base, to obtain a retrieval result containing one or more of the word strings or keywords.

Therefore, in current full-text retrieval technology based on keyword matching, ensuring retrieval accuracy may cause some results related to the keywords to be discarded, which reduces the recall rate, while improving the recall rate may cause some results irrelevant to the keywords to be retrieved, which reduces the accuracy rate. In other words, either the accuracy rate or the recall rate is low, and retrieval accuracy cannot be improved without reducing the retrieval recall rate.
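The trade-off described above can be illustrated with a minimal sketch. It assumes the usual definitions (precision as the share of returned documents that are relevant, recall as the share of all relevant documents that are returned); the function name `precision_recall` and the document ids are hypothetical and do not come from the application.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents returned by the search
    relevant:  ids of all documents in the knowledge base related to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# A strict match returns few documents: nothing irrelevant, but much is missed.
strict = precision_recall(retrieved={"d1"},
                          relevant={"d1", "d2", "d3", "d4"})

# A loose match returns many documents: nothing is missed, but half is noise.
loose = precision_recall(retrieved={"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"},
                         relevant={"d1", "d2", "d3", "d4"})
```

In this illustrative data the strict query scores high on precision and low on recall, and the loose query does the opposite, which is the situation the application sets out to avoid.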

Disclosure of Invention

In view of this, embodiments of the present application provide a retrieval method, an apparatus, an electronic device, and a computer-readable storage medium, which solve the problem that the accuracy of retrieval cannot be improved without reducing the retrieval recall rate.

According to an aspect of the present application, a retrieval method provided by an embodiment of the present application includes: acquiring retrieval content input by a user; performing keyword extraction on the retrieval content to obtain at least one keyword; identifying the part of speech of each keyword; and searching a knowledge base for the keywords and their parts of speech according to a preset part-of-speech condition to obtain a retrieval result, wherein the part-of-speech condition comprises a plurality of part-of-speech classifications, and each part-of-speech classification corresponds either to a retrieval priority or to information indicating that it does not participate in matching.

According to another aspect of the present application, an embodiment of the present application provides a retrieval apparatus including: an acquisition module configured to acquire retrieval content input by a user; an extraction module configured to perform keyword extraction on the retrieval content to obtain at least one keyword; an identification module configured to identify the part of speech of each keyword; and a retrieval module configured to search a knowledge base for the keywords and their parts of speech according to a preset part-of-speech condition to obtain a retrieval result, wherein the part-of-speech condition comprises a plurality of part-of-speech classifications, and each part-of-speech classification corresponds to a retrieval priority and/or to information indicating that it does not participate in matching.

According to another aspect of the present application, an embodiment of the present application provides an electronic device including: a processor; and a memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the retrieval method described above.

According to another aspect of the present application, an embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the retrieval method described above.

According to the retrieval method, the retrieval device, the electronic device and the computer-readable storage medium, at least one keyword is obtained from the retrieval content input by the user, the part of speech of each keyword is identified, and the keywords and their parts of speech are searched in a knowledge base according to a preset part-of-speech condition to obtain a retrieval result. The part-of-speech condition includes a plurality of part-of-speech classifications, each of which corresponds either to a retrieval priority or to information indicating that it does not participate in matching. Because each participating classification has its own retrieval priority, the most important part of speech can be given the highest priority, and retrieval is first performed with the keywords of that part of speech, so the retrieval result is highly accurate. When no keyword has the highest-priority part of speech, or when searching with those keywords returns no result, the search continues with the keywords of the next priority until a result is obtained. In this way, the ratio of files related to the keywords to files unrelated to the keywords in the retrieval result increases, while the ratio of the number of related files in the retrieval result to the number of all related files in the knowledge base is not reduced; that is, retrieval accuracy is improved without reducing the retrieval recall rate.

Drawings

Fig. 1 is a schematic flowchart illustrating a retrieval method according to an embodiment of the present application.

Fig. 2 is a schematic flowchart illustrating a retrieval method according to another embodiment of the present application.

Fig. 3 is a schematic flowchart illustrating a retrieval method according to another embodiment of the present application.

Fig. 4 is a schematic flowchart illustrating a retrieval method according to another embodiment of the present application.

Fig. 5 is a schematic flowchart illustrating a retrieval method according to another embodiment of the present application.

Fig. 6 is a schematic flowchart illustrating a retrieval method according to another embodiment of the present application.

Fig. 7 is a flowchart illustrating a retrieval method according to another embodiment of the present application.

Fig. 8 is a schematic structural diagram of a retrieval apparatus according to an embodiment of the present application.

Fig. 9 is a schematic structural diagram of a retrieval apparatus according to another embodiment of the present application.

Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1 is a schematic flowchart illustrating a retrieval method according to an embodiment of the present application. As shown in Fig. 1, the retrieval method includes the following steps:

Step 101: Acquire the retrieval content input by the user.

When searching, the user inputs retrieval content. For example, a user who wants to find content related to credit cards may input "credit card", "credit card, deposit card", or "credit card deposit card"; a user who wants to find out how to handle a credit card may input "how to handle a credit card?", "transacting credit card", or "transacting white credit card". That is, the user may enter a word string, a phrase, a sentence, punctuation marks and so on in the search box; the input format is not limited.

Step 102: Perform keyword extraction on the retrieval content to obtain at least one keyword.

Since the retrieval content input by the user may be a word string, a phrase, a sentence, or may contain punctuation marks, keywords need to be extracted from it. For example, when the retrieval content is "credit card", the obtained keyword is "credit card". When the retrieval content is "credit card, deposit card" or "credit card and deposit card", the obtained keywords are "credit card" and "deposit card". When the retrieval content is "how to handle a credit card" or "handle a credit card", the obtained keywords are "handle" and "credit card". When the retrieval content is "transacted white credit card", the obtained keywords are "transacted", "white" and "credit card". That is, the retrieval content input by the user is split into individual words.

Step 103: Identify the part of speech of each keyword.

After the keywords are obtained, the part of speech of each keyword is identified. For example, if the keywords are "transacted", "white" and "credit card", the part of speech of "transacted" is identified as a verb, the part of speech of "white" as an adjective, and the part of speech of "credit card" as a noun.
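Steps 102 and 103 can be sketched as follows. This is only an illustration under strong assumptions: the application does not say how keywords are extracted or tagged, so the sketch uses a small hand-written lexicon (`LEXICON`), a greedy longest-match over English words, and simply drops any word that is not in the lexicon. The names `LEXICON`, `MAX_WORDS` and `extract_keywords` are hypothetical.

```python
import re

# Hypothetical toy lexicon: keyword -> part of speech.
LEXICON = {
    "credit card": "noun",
    "deposit card": "noun",
    "handle": "verb",
    "white": "adjective",
    "quickly": "adverb",
}
MAX_WORDS = max(len(term.split()) for term in LEXICON)


def extract_keywords(content):
    """Return a list of (keyword, part_of_speech) pairs found in the input."""
    # Lowercase and keep only alphabetic words, which discards punctuation.
    words = re.findall(r"[a-z]+", content.lower())
    keywords = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so "credit card" wins over "credit".
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in LEXICON:
                keywords.append((candidate, LEXICON[candidate]))
                i += n
                break
        else:
            i += 1  # word is not in the lexicon: skip it
    return keywords


tagged = extract_keywords("How to handle a credit card?")
```

A production system would more likely rely on a word segmenter and part-of-speech tagger for the query language; the lexicon lookup here only stands in for those components.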

Step 104: Search the knowledge base for the keywords and their parts of speech according to a preset part-of-speech condition to obtain a retrieval result, wherein the part-of-speech condition comprises a plurality of part-of-speech classifications, and each part-of-speech classification corresponds either to a retrieval priority or to information indicating that it does not participate in matching.

The knowledge base may be a knowledge cluster comprising knowledge data. The preset part-of-speech condition may be a preset retrieval rule regarding parts of speech. The part-of-speech condition may include a plurality of part-of-speech classifications whose retrieval priorities run from high to low, each classification corresponding to one retrieval priority. For example, the part-of-speech classifications may include nouns, verbs and adjectives, where nouns have the highest retrieval priority, verbs have a lower priority than nouns, and adjectives have a lower priority than verbs. A part-of-speech classification that corresponds to the information indicating non-participation in matching is not involved in the search; for example, the part-of-speech classifications may further include adverbs, and adverbs may be excluded from the search.

In one embodiment, the part-of-speech condition may include some or all of the following part-of-speech classifications, listed with retrieval priority from high to low: a first part of speech, a second part of speech, a third part of speech, and a fourth part of speech. For example, the part-of-speech condition may include only the first part of speech, or the first part of speech and the second part of speech, or the second part of speech and the fourth part of speech. The type and number of part-of-speech classifications included in the part-of-speech condition may be selected according to the specific application scenario and are not specifically limited in the present application.

In one embodiment, the first part of speech includes nouns, the second part of speech includes verbs, the third part of speech includes adjectives, and the fourth part of speech includes adverbs.

In one embodiment, the first part of speech includes nouns and verbs, while the second part of speech includes adjectives and adverbs, the third part of speech includes pronouns, and the fourth part of speech includes conjunctions. The types and the numbers of the parts of speech included in the first part of speech, the second part of speech, the third part of speech and the fourth part of speech can be selected according to a specific application scenario, and the application does not specifically limit the types and the numbers of the parts of speech included in the first part of speech, the second part of speech, the third part of speech and the fourth part of speech.

In an embodiment, the part-of-speech condition further includes some or all of the following part-of-speech classifications, which correspond to the information indicating non-participation in matching: conjunctions, prepositions, and pronouns. It should be understood that the classifications that do not participate in matching may also include linguistic words and the like; they may be selected according to the specific application scenario, and the present application does not specifically limit their type and number.
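One way to picture the part-of-speech condition is as a lookup table from part of speech to retrieval priority, with a marker for classifications that do not participate in matching. The sketch below is an assumption about representation only; the names `NOT_MATCHED`, `POS_CONDITION` and `group_by_priority` are hypothetical, and treating an unlisted part of speech as non-participating is a choice made here, not something the application states.

```python
NOT_MATCHED = None  # marker: this classification does not participate in matching

# Hypothetical condition: a smaller number means a higher retrieval priority.
POS_CONDITION = {
    "noun": 1,
    "verb": 2,
    "adjective": 3,
    "adverb": 4,
    "conjunction": NOT_MATCHED,
    "preposition": NOT_MATCHED,
    "pronoun": NOT_MATCHED,
}


def group_by_priority(tagged_keywords, condition=POS_CONDITION):
    """Group (keyword, pos) pairs into tiers ordered from high to low priority.

    Keywords whose part of speech does not participate in matching (or is not
    listed in the condition at all) are dropped. Empty tiers are omitted.
    """
    groups = {}
    for word, pos in tagged_keywords:
        priority = condition.get(pos, NOT_MATCHED)
        if priority is NOT_MATCHED:
            continue
        groups.setdefault(priority, []).append(word)
    return [groups[p] for p in sorted(groups)]


tagged = [("quickly", "adverb"), ("handle", "verb"),
          ("and", "conjunction"), ("credit card", "noun")]
tiers = group_by_priority(tagged)

# Variant in which the first part of speech covers both nouns and verbs.
merged = {"noun": 1, "verb": 1, "adjective": 2, "adverb": 3}
merged_tiers = group_by_priority(tagged, merged)
```

Because the condition is plain data, changing which parts of speech share a priority, or which are excluded, needs no change to the grouping logic.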

Therefore, according to the retrieval method provided by the embodiment of the application, at least one keyword is obtained from the retrieval content input by the user, the part of speech of each keyword is identified, and the keywords and their parts of speech are searched in the knowledge base according to the preset part-of-speech condition to obtain the retrieval result. The part-of-speech condition includes a plurality of part-of-speech classifications, each of which corresponds either to a retrieval priority or to information indicating that it does not participate in matching. Because each participating classification has its own retrieval priority, the most important part of speech can be given the highest priority, and retrieval is first performed with the keywords of that part of speech, so the retrieval result is highly accurate. When no keyword has the highest-priority part of speech, or when searching with those keywords returns no result, the search continues with the keywords of the next priority until a result is obtained. In this way, the ratio of files related to the keywords to files unrelated to the keywords in the retrieval result increases, while the ratio of the number of related files in the retrieval result to the number of all related files in the knowledge base is not reduced; that is, retrieval accuracy is improved without reducing the retrieval recall rate.

Fig. 2 is a schematic flowchart illustrating a retrieval method according to another embodiment of the present application. As shown in Fig. 2, step 104 of the embodiment shown in Fig. 1 (searching the knowledge base for the keywords and their parts of speech according to the preset part-of-speech condition) includes the following steps:

Step 201: Match the keywords corresponding to one part-of-speech classification with the knowledge content in the knowledge base according to the retrieval priority of the part-of-speech classifications.

The knowledge content can be a knowledge unit, and the knowledge unit is a basic unit forming a knowledge base and is a concept or thing which can independently express the attribute or the relation of a basic thinking object. Knowledge content may also be a combination of multiple knowledge units.

Specifically, the keywords corresponding to the part-of-speech classification with the highest search priority may be matched with the knowledge content in the knowledge base. For example, according to the search content input by the user, three keywords are extracted, the parts of speech corresponding to the three keywords are respectively a first part of speech, a second part of speech and a third part of speech with the search priority from high to low, the first part of speech is a noun, the second part of speech is a verb, the third part of speech is an adjective, and the keywords corresponding to the noun can be matched with the knowledge content in the knowledge base.

The keywords corresponding to the part-of-speech classification of any search priority can be matched with the knowledge content in the knowledge base. For example, according to the search content input by the user, three keywords are extracted, the parts of speech corresponding to the three keywords are respectively a first part of speech, a second part of speech and a third part of speech with the search priority from high to low, the first part of speech is a noun, the second part of speech is a verb, the third part of speech is an adjective, and the keywords corresponding to the verb can be matched with the knowledge content in the knowledge base.

In one embodiment, matching the keywords corresponding to one part-of-speech classification with the knowledge content in the knowledge base according to the retrieval priority of the part-of-speech classifications includes: when the at least one keyword does not include any keyword corresponding to that part-of-speech classification, matching the keywords whose retrieval priority is lower than that classification with the knowledge content in the knowledge base.

For example, the part-of-speech condition includes the following four part-of-speech classifications with retrieval priority from high to low: the first part of speech is a noun, the second part of speech is a verb, the third part of speech is an adjective, and the fourth part of speech is an adverb. From the retrieval content "quick transacting" input by the user, two keywords "quick" and "transacting" are extracted, where the part of speech of the first keyword "quick" is an adverb and the part of speech of the second keyword "transacting" is a verb. That is, the two keywords extracted from the retrieval content do not include a keyword whose part of speech is a noun, so the keyword whose retrieval priority is lower than the first part of speech, i.e. the keyword corresponding to the second part of speech, "transacting", is matched with the knowledge content in the knowledge base.

If the keyword "transacting" is unsuccessfully matched with the knowledge content in the knowledge base, matching continues with keywords whose retrieval priority is lower than the second part of speech, i.e. keywords whose part of speech is an adjective. However, the two keywords extracted from the retrieval content do not include an adjective, so the keyword whose retrieval priority is lower than the third part of speech, i.e. the keyword corresponding to the fourth part of speech (the adverb), "quick", is matched with the knowledge content in the knowledge base.

For another example, the part-of-speech condition includes the following three part-of-speech classifications with retrieval priority from high to low: a first part of speech, a second part of speech and a third part of speech, where the first part of speech is nouns and verbs, the second part of speech is adjectives, and the third part of speech is adverbs. From the retrieval content "transact credit card quickly" input by the user, three keywords "quickly", "transact" and "credit card" are extracted, where the first keyword "quickly" is an adverb, the second keyword "transact" is a verb, and the third keyword "credit card" is a noun. First, the keywords corresponding to the first part of speech, i.e. "credit card" and "transact", are matched with the knowledge content in the knowledge base.

If the keywords "credit card" and "transact" are unsuccessfully matched with the knowledge content in the knowledge base, matching continues with keywords whose retrieval priority is lower than the first part of speech, i.e. keywords whose part of speech is an adjective. However, the three keywords extracted from the retrieval content do not include an adjective, so the keyword whose retrieval priority is lower than the second part of speech, i.e. the keyword corresponding to the third part of speech (the adverb), "quickly", is matched with the knowledge content in the knowledge base.

When the at least one keyword does not include a keyword corresponding to one part-of-speech classification, the keywords whose retrieval priority is lower than that classification are matched with the knowledge content in the knowledge base. The retrieval method is therefore suitable for any kind of retrieval content input by the user: the retrieval is not stopped, nor is an empty result output directly, merely because no keyword belongs to a given part-of-speech classification. This improves retrieval accuracy and the user experience.

Step 202: When the keywords corresponding to the one part-of-speech classification are unsuccessfully matched with the knowledge content in the knowledge base, match the keywords whose retrieval priority is lower than that classification with the knowledge content in the knowledge base.

Specifically, when the keyword corresponding to the part of speech classification with the highest retrieval priority is matched with the knowledge content in the knowledge base and the keyword corresponding to the part of speech classification with the highest retrieval priority is unsuccessfully matched with the knowledge content in the knowledge base, the keyword corresponding to the part of speech classification with the second priority may be continuously matched with the knowledge content in the knowledge base. For example, according to the search content input by the user, three keywords are extracted, the parts of speech corresponding to the three keywords are respectively a first part of speech, a second part of speech and a third part of speech with the search priority from high to low, the first part of speech is a noun, the second part of speech is a verb, the third part of speech is an adjective, and when the keyword corresponding to the noun with the highest search priority is matched with the knowledge content in the knowledge base and the matching is unsuccessful, the keyword corresponding to the verb with the second priority can be continuously used for matching with the knowledge content in the knowledge base.

When matching starts from the keywords corresponding to the part-of-speech classification of any retrieval priority and those keywords are unsuccessfully matched with the knowledge content in the knowledge base, matching may continue with the keywords corresponding to a part-of-speech classification of lower priority than the current one. For example, three keywords are extracted from the retrieval content input by the user, and their parts of speech are respectively a first part of speech, a second part of speech and a third part of speech with retrieval priority from high to low, where the first part of speech is a noun, the second part of speech is a verb, and the third part of speech is an adjective. When matching starts from the keyword corresponding to the verb and that keyword is unsuccessfully matched with the knowledge content in the knowledge base, matching may continue with the keyword corresponding to the adjective, whose retrieval priority is lower than the current one.

In one embodiment, step 104 of the embodiment shown in Fig. 1 (searching the knowledge base for the keywords and their parts of speech according to the preset part-of-speech condition) further includes the following steps:

Step 203: When the keywords corresponding to the one part-of-speech classification are successfully matched with the knowledge content in the knowledge base, add the knowledge content to the retrieval result.

For example, if the keyword "credit card" is successfully matched with the knowledge content "A credit card is a credit certificate issued by a commercial bank or credit card company to a consumer with qualified credit. The cardholder may overdraw within a specified limit." in the knowledge base, that knowledge content may be added to the retrieval result obtained by searching with "credit card".

When the keyword corresponding to one part of speech classification is successfully matched with the knowledge content in the knowledge base, the knowledge content is added into the retrieval result, so that all the successfully matched knowledge content is added into the retrieval result, the comprehensiveness of the retrieval result is ensured, and the retrieval recall rate is improved.

Step 204: Judge whether all keywords set to participate in matching have been matched.

Specifically, the keywords set to participate in matching may be all keywords extracted from the retrieval content input by the user, except those whose part-of-speech classification corresponds to the information indicating non-participation in matching. Step 202 is executed to match the keywords whose retrieval priority is lower than the one part-of-speech classification with the knowledge content in the knowledge base. If the matching is successful, step 203 is executed. If the matching is unsuccessful, it is again judged whether all keywords set to participate in matching have been matched; if some have not yet been matched, step 202 is executed again.

Step 205: Determine that all keywords are unsuccessfully matched with the knowledge content in the knowledge base.

Specifically, if the result of the determination in step 204 is that all the keywords set to participate in matching are matched, it indicates that all the keywords are matched with the knowledge content in the knowledge base and the matching is unsuccessful, and therefore, it is determined that all the keywords are unsuccessfully matched with the knowledge content in the knowledge base.

When the keywords corresponding to one part-of-speech classification are unsuccessfully matched with the knowledge content in the knowledge base, the keywords whose retrieval priority is lower than that classification are matched instead. In this way, knowledge content highly relevant to the higher-priority keywords is matched first wherever possible, and the retrieval is not stopped, nor is an empty result output directly, merely because the keywords of one part-of-speech classification failed to match. This improves retrieval accuracy and the user experience.
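The flow of steps 201 to 205 can be sketched as a loop over priority tiers. This is a hedged illustration, not the claimed implementation: `matches` is a hypothetical stand-in that uses a case-insensitive substring test, knowledge content is reduced to plain strings, and a tier is assumed to succeed when any of its keywords matches a piece of content. The application fixes none of these details.

```python
def matches(keyword, content):
    # Hypothetical matcher: case-insensitive substring test.
    return keyword.lower() in content.lower()


def retrieve(tiers, knowledge_base):
    """Search tier by tier, from highest to lowest retrieval priority.

    tiers: list of keyword lists, already ordered by priority and already
           excluding keywords that do not participate in matching.
    knowledge_base: list of knowledge content strings.
    """
    for keywords in tiers:  # step 201 first, then step 202 for lower tiers
        results = [content for content in knowledge_base
                   if any(matches(k, content) for k in keywords)]
        if results:
            # Step 203: every successfully matched content joins the result.
            return results
        # Step 204: this tier failed; the loop moves on if tiers remain.
    # Step 205: all participating keywords were tried without success.
    return []


KB = [
    "A credit card is a credit certificate issued by a commercial bank.",
    "Handle a deposit card at any branch.",
    "Quickly check your balance online.",
]

first_tier_hit = retrieve([["credit card"], ["handle"], ["quickly"]], KB)
fallback_hit = retrieve([["loan"], ["handle"]], KB)
no_hit = retrieve([["loan"]], KB)
```

Returning as soon as one tier produces results keeps lower-priority keywords from diluting the result, while the fallback to later tiers avoids an empty answer whenever any participating keyword can still match.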

Fig. 3 is a schematic flowchart illustrating a retrieval method according to another embodiment of the present application. As shown in Fig. 3, step 201 of the embodiment shown in Fig. 2 includes the following steps:

Step 301: Match the keywords corresponding to the one part-of-speech classification with a preset field.

The knowledge base comprises at least one item of knowledge content, the knowledge content comprises a plurality of preset fields, and each preset field corresponds to one matching priority. The preset fields may be all or some of the following fields, listed with matching priority from high to low: knowledge data file name, knowledge titles at all levels, knowledge keywords, knowledge corpus, knowledge abstract, knowledge summary and knowledge full text. Ordering the preset fields by importance further improves retrieval efficiency and accuracy.

The knowledge data file name may be the name of a file storing one or more items of knowledge content. The file may be a Word file or a PDF file, and the form of the file is not specifically limited in the present application. The knowledge titles at all levels may be the titles of the knowledge content. For example, if the knowledge content is "Credit card definition: a credit card is a credit certificate issued by a commercial bank or credit card company to a consumer with qualified credit.", then "Credit card definition" may be the title of the knowledge content; a title may be marked by bold or italic type, by a font size larger than that of the other content, or by being displayed on a line of its own. The knowledge keywords may be keywords extracted from the knowledge content. For example, if the knowledge content is "A credit card is a credit certificate issued by a commercial bank or credit card company to a consumer with qualified credit.", the keywords may be "credit card, bank". The knowledge corpus may be language material that explains or annotates the knowledge content. For example, for the same knowledge content, the knowledge corpus may be "credit card definition", "transact credit card", "how to transact credit card", and the like. The knowledge abstract may be a section that condenses the important parts of the knowledge content. The knowledge summary may be a section that outlines the knowledge content. The knowledge full text may be all parts of the knowledge content.

In one embodiment, the plurality of preset fields include the following fields, listed from high to low matching priority: knowledge data file name, knowledge titles at all levels, knowledge keywords, knowledge corpus, knowledge abstract, knowledge summary, and knowledge full text. The matching priority order of the preset fields may follow their degree of importance, and as many preset fields as possible may be provided.

Ordering the preset fields by their degree of importance improves retrieval efficiency and accuracy. Providing as many preset fields as possible enriches the knowledge content of the knowledge base and helps avoid the case in which obscure retrieval content input by a user yields no retrieval result.

It should be understood that the preset field may also be other fields, the type and number of the preset field may be selected according to a specific application scenario, and the type and number of the preset field are not specifically limited in this application.

It should also be understood that the matching priority order of the multiple preset fields may also be selected according to a specific application scenario, and the matching priority order of the preset fields is not specifically limited in the present application.

Step 302: When the matching is successful, determine that the keyword corresponding to the part-of-speech classification is successfully matched with the knowledge content in the knowledge base.

Specifically, when the keyword corresponding to the part-of-speech classification is successfully matched with a preset field, it is determined that the keyword is successfully matched with the knowledge content in the knowledge base. Because the preset fields have different matching priorities, the keyword is determined to be successfully matched as soon as the preset field of any one priority is matched, and the keyword is not matched further against preset fields whose matching priority is lower than that of the matched field. For example, suppose the preset fields include, from high to low matching priority: knowledge data file name, knowledge titles at all levels, knowledge keywords, knowledge corpus, knowledge abstract, knowledge summary, and knowledge full text. When the keyword corresponding to the part-of-speech classification is successfully matched with the knowledge data file name, it is determined that the keyword is successfully matched with the knowledge content in the knowledge base, and the keyword is not matched against the knowledge titles at all levels or any other lower-priority field.

When the keywords corresponding to one part of speech classification are successfully matched with one preset field, the keywords corresponding to the one part of speech classification are judged to be successfully matched with the knowledge content in the knowledge base, and the keywords do not need to be continuously matched with the preset field with the matching priority lower than the one preset field, so that the retrieval range is reduced, and the retrieval efficiency is improved.
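The field-by-field matching described above can be sketched as a loop over the preset fields in descending matching priority that stops at the first hit. This is only an illustrative sketch: the field names, the `match_fields` helper, and the use of substring containment as the matching rule are assumptions, not details given in the application.

```python
from typing import Optional

# Assumed field names, ordered from highest to lowest matching priority.
FIELD_PRIORITY = [
    "file_name",
    "titles",
    "keywords",
    "corpus",
    "abstract",
    "summary",
    "full_text",
]


def match_fields(keyword: str, content: dict, fields=FIELD_PRIORITY) -> Optional[str]:
    """Return the highest-priority field that contains the keyword, or None."""
    for field in fields:
        # Substring containment stands in for whatever matching rule is used.
        if keyword in content.get(field, ""):
            return field  # success: lower-priority fields are not examined
    return None  # every participating field was tried without success


content = {
    "file_name": "bank_faq.pdf",
    "titles": "credit card definition",
    "full_text": "A credit card is a credit certificate issued by a commercial bank.",
}

print(match_fields("credit card", content))  # titles
print(match_fields("bank", content))         # file_name
print(match_fields("mortgage", content))     # None
```

Passing a shorter `fields` list models the case in which only some preset fields are set to participate in matching.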

In one embodiment, in step 201 of the embodiment shown in FIG. 2, matching the keywords corresponding to one part-of-speech classification with the knowledge content in the knowledge base further comprises the following steps:

Step 303: When the matching is unsuccessful, match the keyword corresponding to the part-of-speech classification with a preset field whose matching priority is lower than that of the preset field just tried.

Specifically, when the keyword corresponding to the part-of-speech classification is not successfully matched with a preset field, the keyword is matched with a preset field whose matching priority is lower than that of the preset field just tried. For example, suppose the preset fields include, from high to low matching priority: knowledge data file name, knowledge titles at all levels, knowledge keywords, knowledge corpus, knowledge abstract, knowledge summary, and knowledge full text. When matching with the knowledge titles at all levels is unsuccessful, the keyword is next matched with the knowledge keywords, and so on down the list until a match succeeds.

Step 304: Determine whether all preset fields set to participate in matching have been matched.

Specifically, as described in step 301, the type and number of the preset fields may be selected according to a specific application scenario, and the preset fields set to participate in matching may be preset fields chosen in advance by a user or by a developer of the retrieval function. In step 303, the keyword corresponding to the part-of-speech classification is matched with a preset field of lower matching priority. If that matching is successful, step 302 is executed. If it is unsuccessful, step 304 is executed to determine whether all preset fields set to participate in matching have been matched; if any participating preset field has not yet been matched, step 303 is executed again.

Step 305: Determine that the keyword corresponding to the part-of-speech classification is unsuccessfully matched with the knowledge content in the knowledge base.

Specifically, if step 304 is executed and the determination result is that all preset fields set to participate in matching have been matched, the keyword corresponding to the part-of-speech classification has failed to match every participating field. It is therefore determined that the keyword is unsuccessfully matched with the knowledge content in the knowledge base.

Fig. 4 is a schematic flow chart of a retrieval method according to another embodiment of the present application. The step 104 of obtaining the retrieval result in the embodiment shown in fig. 1 may include the following steps:

Step 401: Match the keywords and the parts of speech thereof in a knowledge base according to a preset part-of-speech condition, and output a matching result, wherein the matching result comprises at least one matching entry.

The preset part-of-speech condition may be a preset retrieval rule regarding parts of speech. The part-of-speech condition may include a plurality of part-of-speech classifications ordered from high to low retrieval priority, each classification corresponding to one retrieval priority. For example, the part-of-speech classifications may include nouns, verbs, and adjectives, where nouns have the highest retrieval priority, verbs have a lower priority than nouns, and adjectives have a lower priority than verbs.

The matching item can be a knowledge unit, and the knowledge unit is a basic unit forming a knowledge base and is a concept or thing which can independently express the attribute or the relation of the basic thinking object. The matching entry may also be a combination of multiple knowledge units. If the matching item is to be displayed, only part of the content of the matching item may be displayed, or the whole content of the matching item may be displayed.

Specifically, the keywords corresponding to one part-of-speech classification may be matched with the knowledge content in the knowledge base according to the retrieval priority of that classification. When those keywords are unsuccessfully matched with the knowledge content in the knowledge base, the keywords of a part-of-speech classification having a lower retrieval priority may be matched with the knowledge content in the knowledge base. When the keywords corresponding to a part-of-speech classification are successfully matched with the knowledge content in the knowledge base, the matching result may be output.
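The priority fallback described above can be sketched as follows. The sketch is illustrative only: the `retrieve` function, the representation of the knowledge base as a list of strings, and the rule that an item matches when it contains any keyword of the current classification are assumptions. A part of speech that is absent from the priority list simply does not participate in matching.

```python
from typing import List, Tuple

# Assumed part-of-speech classifications, highest retrieval priority first.
POS_PRIORITY = ["noun", "verb", "adjective"]


def retrieve(keywords: List[Tuple[str, str]], knowledge_base: List[str],
             pos_priority=POS_PRIORITY) -> List[str]:
    """Match (word, part_of_speech) pairs against the knowledge base by priority."""
    for pos in pos_priority:
        words = [word for word, tag in keywords if tag == pos]
        if not words:
            continue  # no keyword of this classification: try the next one
        # Assumed rule: an item matches if it contains any keyword of this class.
        hits = [item for item in knowledge_base if any(w in item for w in words)]
        if hits:
            return hits  # success: lower-priority classifications are not used
    return []  # no participating classification produced a match


kb = [
    "how to apply for a credit card",
    "loan interest rates",
    "quickly open an account",
]
query = [("mortgage", "noun"), ("apply", "verb"), ("quickly", "adverb")]

# The noun finds nothing, so the verb is tried; the adverb never participates.
print(retrieve(query, kb))  # ['how to apply for a credit card']
```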

Step 402: Sort the at least one matching entry in the matching result and output a retrieval result.

Specifically, the matching entries may be sorted according to the time order in which they were successfully matched, or according to attributes of the matching entries. For example, when sorting by size, a smaller matching entry is placed before a larger one: a matching entry of 5 megabytes is placed before a matching entry of 8 megabytes, which in turn is placed before a matching entry of 10 megabytes. The sorting rule for the at least one matching entry in the matching result may be selected according to a specific application scenario and is not specifically limited in the present application.

By sequencing at least one matching item in the matching result, the matching items with larger association degree with the keywords can be arranged in front, and are preferentially output or output together and then preferentially displayed to the user, so that the retrieval accuracy is improved, and the use experience of the user is improved.

Fig. 5 is a schematic flowchart illustrating a retrieval method according to another embodiment of the present application. The matching result output in step 401 in the embodiment shown in fig. 4 may be implemented by the implementation in the embodiment shown in fig. 2 or fig. 3, that is, the matching result output in step 401 may be the retrieval result in step 203 in the embodiment shown in fig. 2 or fig. 3.

Fig. 6 is a schematic flowchart illustrating a retrieval method according to another embodiment of the present application. Step 402 in the embodiment shown in fig. 4 or fig. 5 comprises the following steps:

Step 601: Calculate the total score of the at least one matching entry in the matching result.

Specifically, the score may be calculated by a preset scoring rule, for example according to the number of keywords contained in each matching entry. Suppose the matching result includes 5 matching entries: the first contains 12 keywords and scores 12; the second contains 8 keywords and scores 8; the third contains 7 keywords and scores 7; the fourth contains 5 keywords and scores 5; and the fifth contains 3 keywords and scores 3.

It should be understood that the calculation rule of the score and the size of the score may be selected according to the specific application scenario, and the calculation rule of the score and the size of the score are not specifically limited in the present application.

Step 602: the at least one matching item is ranked according to its score.

The at least one matching item is sorted according to the score of the at least one matching item, where the high-score arrangement is before, the low-score arrangement is after, or the low-score arrangement is before, and the high-score arrangement is after, and the specific sorting rule may be selected according to the calculation rule of the score and the specific application scenario, and the application does not specifically limit the sorting rule.
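As a small illustration of steps 601 and 602 with the keyword-count rule, the sketch below scores each matching entry by the number of keyword occurrences it contains and sorts by that score. The helper names and the sample entries are hypothetical, and counting substring occurrences is only one possible reading of "number of keywords".

```python
def keyword_count_score(entry: str, keywords) -> int:
    """Score an entry by the total number of keyword occurrences it contains."""
    return sum(entry.count(keyword) for keyword in keywords)


def rank(entries, keywords, descending=True):
    """Sort entries by score; descending=False puts low scores first."""
    return sorted(entries,
                  key=lambda entry: keyword_count_score(entry, keywords),
                  reverse=descending)


entries = [
    "credit card fees at the bank",                   # 2 occurrences
    "credit card: apply for a credit card at a bank", # 3 occurrences
    "bank hours",                                     # 1 occurrence
]
keywords = ["credit card", "bank"]

for entry in rank(entries, keywords):
    print(keyword_count_score(entry, keywords), entry)
```

The `descending` flag reflects the statement above that either ordering may be chosen depending on the scoring rule and the application scenario.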

Fig. 7 is a flowchart illustrating a retrieval method according to another embodiment of the present application. Step 601 in the embodiment shown in fig. 6 comprises the following steps:

Step 701: Determine a plurality of scores of a plurality of scoring elements corresponding to the matching entry, wherein the scoring elements include the part of speech corresponding to the keyword.

A scoring element is a factor used to evaluate the score of a matching entry. For example, the scoring elements may include the part of speech corresponding to the keyword, the total number of occurrences of the keywords in the matching entry, and the size of the matching entry. Based on these scoring elements, the scoring rules are exemplified as follows. Part of speech: a matching entry scores 5 points for containing a noun keyword, 3 points for a verb keyword, 2 points for an adjective keyword, and 1 point for an adverb keyword. Total number of keyword occurrences in the matching entry: more than 20 scores 10 points; more than 15 and at most 20 scores 8 points; more than 10 and at most 15 scores 6 points; more than 5 and at most 10 scores 4 points; 5 or fewer scores 2 points. Size of the matching entry: 5 megabytes or less scores 10 points; more than 5 and at most 10 megabytes scores 8 points; more than 10 and at most 15 megabytes scores 6 points; more than 15 and at most 20 megabytes scores 4 points; more than 20 megabytes scores 2 points.

Based on the above scoring rules of the scoring elements, the score of each scoring element of the matching result is obtained by statistics, for example, as follows: the matching result comprises 3 matching entries, the first matching entry comprises a keyword of a noun part of speech and a keyword of a verb part of speech, the score of the scoring element of the part of speech corresponding to the keyword is 5 points plus 3 points, namely 8 points, the total number of the keywords appearing in the matching entry is 12, the score of the scoring element of the total number of the keywords appearing in the matching entry is 6 points, the size of the matching entry is 25 megabytes, and the score of the scoring element of the size of the matching entry is 2 points; the second matching entry comprises a keyword of a noun part of speech and a keyword of an adjective part of speech, the score of the scoring element of the part of speech corresponding to the keyword is 5 points plus 2 points, namely 7 points, the total number of the keywords appearing in the matching entry is 18, the score of the scoring element of the total number of the keywords appearing in the matching entry is 8 points, the size of the matching entry is 8 megabytes, and the score of the scoring element of the size of the matching entry is 8 points; the third matching entry comprises a keyword of the part of speech of a noun and a keyword of the part of speech of an adverb, the score of the scoring element of the part of speech corresponding to the keyword is 5 points plus 1 point, namely 6 points, the total number of the keywords appearing in the matching entry is 6, the score of the scoring element of the total number of the keywords appearing in the matching entry is 4 points, the size of the matching entry is 8 megabytes, and the score of the scoring element of the size of the matching entry is 8 points.
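The example scoring rules can be written as three small functions, sketched below. The function names are hypothetical. The size tiers are read here as 5 MB or less, then up to 10, 15, and 20 MB, then above 20 MB, which is an interpretation chosen because it agrees with the worked figures above. Awarding each part of speech at most once per entry is likewise an assumption.

```python
# Points per part of speech, taken from the example rule.
POS_POINTS = {"noun": 5, "verb": 3, "adjective": 2, "adverb": 1}


def pos_score(parts_of_speech) -> int:
    """Sum the points of each distinct part of speech present in the entry."""
    return sum(POS_POINTS.get(pos, 0) for pos in set(parts_of_speech))


def count_score(total_occurrences: int) -> int:
    """Tiered score for the total number of keyword occurrences."""
    if total_occurrences > 20:
        return 10
    if total_occurrences > 15:
        return 8
    if total_occurrences > 10:
        return 6
    if total_occurrences > 5:
        return 4
    return 2


def size_score(size_mb: float) -> int:
    """Tiered score for entry size in megabytes (smaller entries score higher)."""
    if size_mb <= 5:
        return 10
    if size_mb <= 10:
        return 8
    if size_mb <= 15:
        return 6
    if size_mb <= 20:
        return 4
    return 2


# The three matching entries from the example: (parts of speech, count, size in MB).
examples = [
    (["noun", "verb"], 12, 25),
    (["noun", "adjective"], 18, 8),
    (["noun", "adverb"], 6, 8),
]
for parts, count, size in examples:
    print(pos_score(parts), count_score(count), size_score(size))
# 8 6 2
# 7 8 8
# 6 4 8
```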

Step 702: Calculate the total score of the matching entry by weighted summation, according to the preset weights corresponding to the scoring elements and the scores of the scoring elements corresponding to the matching entry.

Specifically, each scoring element corresponds to a preset weight, and the total score of the matched items can be calculated in a weighted summation mode.

Continuing the example above, suppose the weight of the part of speech corresponding to the keyword is 50%, the weight of the total number of keyword occurrences in the matching entry is 30%, and the weight of the size of the matching entry is 20%. The total scores of the matching entries are then as follows. First matching entry: 8 × 50% + 6 × 30% + 2 × 20% = 6.2 points. Second matching entry: 7 × 50% + 8 × 30% + 8 × 20% = 7.5 points. Third matching entry: 6 × 50% + 4 × 30% + 8 × 20% = 5.8 points.
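The weighted summation for this example can be checked with a short sketch. The dictionary keys and the `total_score` helper are hypothetical names; the weights and element scores are the ones given in the example.

```python
# Preset weights of the three scoring elements, from the example.
WEIGHTS = {"pos": 0.5, "count": 0.3, "size": 0.2}


def total_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the scoring-element scores of one matching entry."""
    return sum(weights[name] * scores[name] for name in weights)


element_scores = {
    "first": {"pos": 8, "count": 6, "size": 2},
    "second": {"pos": 7, "count": 8, "size": 8},
    "third": {"pos": 6, "count": 4, "size": 8},
}

# Round to one decimal place to hide floating-point noise.
totals = {name: round(total_score(s), 1) for name, s in element_scores.items()}
ranked = sorted(totals, key=totals.get, reverse=True)

print(totals)  # {'first': 6.2, 'second': 7.5, 'third': 5.8}
print(ranked)  # ['second', 'first', 'third']
```

With high scores placed first, the second matching entry would be output ahead of the first and the third.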

In one embodiment, the higher the retrieval priority of a part-of-speech classification, the higher its score. For example, the part-of-speech condition may include some or all of the following part-of-speech classifications, listed from high to low retrieval priority: a first part of speech, a second part of speech, and a third part of speech. In scoring, a matching entry that includes a keyword of the first part of speech receives a first score, which is the highest; a matching entry that includes a keyword of the second part of speech receives a second score lower than the first score; and a matching entry that includes a keyword of the third part of speech receives a third score lower than the second score. In one embodiment, the first part of speech is a noun, the second part of speech is a verb, and the third part of speech is an adjective. A matching entry that includes a noun keyword then receives the highest, first score; a matching entry that includes a verb keyword receives the second score, lower than the first; and a matching entry that includes an adjective keyword receives the third score, lower than the second.

Fig. 8 is a schematic structural diagram of a retrieval device according to an embodiment of the present application. As shown in fig. 8, the retrieval device 80 includes: an obtaining module 801 configured to obtain retrieval content input by a user; an extraction module 802 configured to perform keyword extraction processing on the retrieval content to obtain at least one keyword; an identification module 803 configured to identify a part of speech of each keyword; and a retrieval module 804 configured to retrieve the keywords and the parts of speech thereof in the knowledge base according to a preset part-of-speech condition to obtain a retrieval result, wherein the part-of-speech condition includes a plurality of part-of-speech classifications, and each part-of-speech classification corresponds to a retrieval priority and/or to information indicating that it does not participate in matching.

In an embodiment, the retrieval module 804 is further configured to: matching keywords corresponding to one part of speech classification with knowledge contents in a knowledge base according to the retrieval priority of the part of speech classification, and matching keywords with retrieval priority lower than the one part of speech classification with the knowledge contents in the knowledge base when the keywords corresponding to the one part of speech classification are unsuccessfully matched with the knowledge contents in the knowledge base; and when the keyword corresponding to the part of speech classification is successfully matched with the knowledge content in the knowledge base, adding the knowledge content into the retrieval result.

In an embodiment, the retrieval module 804 is further configured to: and when the keywords corresponding to the part of speech classification are not included in at least one keyword, matching the keywords with the retrieval priority lower than the part of speech classification with the knowledge content in the knowledge base.

In an embodiment, the retrieval module 804 is further configured to: matching the keywords corresponding to the part of speech classification with a preset field; and when the matching is successful, judging that the keywords corresponding to the part of speech classification are successfully matched with the knowledge content in the knowledge base, wherein the knowledge base comprises at least one item of knowledge content, the knowledge content comprises a plurality of preset fields, and each preset field corresponds to a matching priority.

In an embodiment, the retrieval module 804 is further configured to: and when the matching is unsuccessful, matching the keywords corresponding to the part of speech classification with a preset field with the matching priority lower than that of the preset field.

In an embodiment, the retrieval module 804 is further configured to: and matching the keywords and the parts of speech thereof in a knowledge base according to a preset part of speech condition, and outputting a matching result, wherein the matching result comprises at least one matching item.

Fig. 9 is a schematic structural diagram of a retrieval device according to another embodiment of the present application. The retrieval device 80 further includes a sorting module 901. The sorting module 901 is configured to sort at least one matching entry in the matching result and output a retrieval result.

In one embodiment, the sorting module 901 includes a calculation scoring unit 9011 and an entry sorting unit 9012. The calculation scoring unit 9011 is configured to calculate a score for at least one matching entry in the matching result. The entry sorting unit 9012 is configured to sort the at least one matching entry according to its score.

In an embodiment, the calculation scoring unit 9011 is further configured to: counting a plurality of scores of a plurality of scoring elements respectively corresponding to the matched items, wherein the scoring elements comprise parts of speech corresponding to the keywords; and calculating the total score of the matching item in a weighting and summing mode according to the preset weight corresponding to the scoring elements and the scores of the scoring elements corresponding to the matching item.

In an embodiment, the calculation scoring unit 9011 is further configured such that the higher the retrieval priority of a part-of-speech classification, the higher the score of that part-of-speech classification.

Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic apparatus 100 includes: one or more processors 1001 and memory 1002; and computer program instructions stored in the memory 1002 which, when executed by the processor 1001, cause the processor 1001 to perform a retrieval method as in any of the embodiments described above.

The processor 1001 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.

The memory 1002 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM), cache memory, and the like. Non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by the processor 1001 to implement the steps in the retrieval method of the various embodiments of the present application described above and/or other desired functions. Information such as common keywords, common retrieval content, and common knowledge data may also be stored in the computer-readable storage medium.

In one example, the electronic device 100 may further include: an input device 1003 and an output device 1004, which are interconnected by a bus system and/or other form of connection mechanism (not shown in fig. 10).

For example, when the electronic device 100 is a stand-alone device, the input device 1003 may be a communication network connector for receiving a collected input signal from an external removable device. The input device 1003 may also include, for example, a keyboard, a mouse, a microphone, and the like.

The output device 1004 may output various information to the outside, and may include, for example, a display, a speaker, a printer, and a communication network and a remote output apparatus connected thereto.

Of course, for the sake of simplicity, only some of the components of the electronic device 100 that are related to the present application are shown in fig. 10, and components such as a bus and an input/output interface are omitted. In addition, the electronic device 100 may include any other suitable components depending on the particular application.

In addition to the above-described methods and apparatuses, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps of the retrieval method of any of the above-described embodiments.

The computer program product may include program code for carrying out operations of embodiments of the present application, written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps in the retrieval method of the various embodiments of the present application.

A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.

The block diagrams of the devices and apparatuses referred to in this application are given only as illustrative examples and are not intended to require or imply that the devices and apparatuses must be connected, arranged, or configured in the manner shown in the block diagrams. These devices and apparatuses may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".

It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modifications, equivalents and the like that are within the spirit and principle of the present application should be included in the scope of the present application.

Claims

1. A retrieval method, comprising:

acquiring retrieval content input by a user;

performing keyword extraction processing on the retrieval content to obtain at least one keyword;

identifying a part of speech of each of the keywords; and

retrieving the keywords and the parts of speech thereof in a knowledge base according to a preset part-of-speech condition to obtain a retrieval result;

wherein the part-of-speech condition comprises a plurality of part-of-speech classifications, and each part-of-speech classification corresponds to one retrieval priority or to information indicating that it does not participate in matching.

2. The retrieval method of claim 1, wherein the part-of-speech condition comprises a plurality of part-of-speech classifications, each of the part-of-speech classifications corresponding to a retrieval priority;

wherein the retrieving the keywords and the parts of speech thereof in the knowledge base according to the preset part-of-speech condition comprises:

matching the keywords corresponding to one part-of-speech classification with knowledge content in the knowledge base according to the retrieval priority of the part-of-speech classification; and

when the keyword corresponding to the part-of-speech classification is unsuccessfully matched with the knowledge content in the knowledge base, matching a keyword of a part-of-speech classification having a lower retrieval priority with the knowledge content in the knowledge base.

3. The method of claim 2, wherein the retrieving the keywords and their parts of speech in the knowledge base according to the predetermined part of speech condition further comprises:

and when the keyword corresponding to the part of speech classification is successfully matched with the knowledge content in the knowledge base, adding the knowledge content into the retrieval result.

4. The method of claim 2, wherein matching keywords corresponding to one of the part-of-speech classifications with knowledge content in the knowledge base according to the retrieval priority of the part-of-speech classification comprises:

and when the keywords corresponding to the part of speech classification are not included in the at least one keyword, matching the keywords with the retrieval priority lower than the part of speech classification with the knowledge content in the knowledge base.

5. The retrieval method of claim 2, wherein the knowledge base comprises at least one knowledge content, the knowledge content comprises a plurality of preset fields, and each preset field corresponds to a matching priority;

wherein the matching of the keywords corresponding to one part-of-speech classification with the knowledge content in the knowledge base comprises:

matching the keyword corresponding to the part-of-speech classification with one of the preset fields; and

when the matching succeeds, determining that the keyword corresponding to the part-of-speech classification is successfully matched with the knowledge content in the knowledge base.

6. The retrieval method of claim 5, wherein the matching of the keywords corresponding to one part-of-speech classification with the knowledge content in the knowledge base further comprises:

when the matching fails, matching the keyword corresponding to the part-of-speech classification with a preset field whose matching priority is lower than that of the preset field just tried.
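By way of non-limiting illustration, the field-by-field matching of claims 5 and 6 might look like the sketch below. The field names `title`, `question` and `answer`, their order, the substring test, and the function name `match_fields` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Optional

# Hypothetical preset fields, listed in descending matching priority.
FIELD_PRIORITY: List[str] = ["title", "question", "answer"]


def match_fields(
    keyword: str,
    entry: Dict[str, str],
    field_priority: List[str] = FIELD_PRIORITY,
) -> Optional[str]:
    """Return the first (highest-priority) field containing the keyword."""
    for field in field_priority:
        # Assumed matching rule: plain substring containment.
        if keyword in entry.get(field, ""):
            # Claim 5: a hit in one preset field counts as a successful
            # match with the knowledge content.
            return field
        # Claim 6: no hit, so try the field with the next lower priority.
    return None


entry = {
    "title": "Router setup",
    "question": "How do I reset it?",
    "answer": "Hold the button.",
}
print(match_fields("Router", entry))
print(match_fields("reset", entry))
print(match_fields("modem", entry))
```

Returning the name of the matched field, rather than a bare boolean, is a design choice of this sketch: it would let a later ranking step use the matched field as one of its scoring elements.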

7. The retrieval method of claim 1, wherein the part-of-speech condition comprises a plurality of part-of-speech classifications, each of the part-of-speech classifications corresponding to a retrieval priority;

wherein the retrieving of the keywords and their parts of speech in the knowledge base according to the preset part-of-speech condition to obtain the retrieval result comprises:

matching the keywords and their parts of speech in the knowledge base according to the preset part-of-speech condition, and outputting a matching result, wherein the matching result comprises at least one matching item; and

sorting the at least one matching item in the matching result, and outputting the retrieval result;

wherein the sorting of the at least one matching item in the matching result comprises:

calculating a total score of each of the at least one matching item in the matching result; and

sorting the at least one matching item according to the total score of each matching item.

8. The retrieval method of claim 7, wherein the calculating of the total score of each of the at least one matching item in the matching result comprises:

collecting, for the matching item, the respective scores of a plurality of scoring elements, wherein the scoring elements comprise the part of speech corresponding to the keyword; and

calculating the total score of the matching item as a weighted sum, according to the preset weights corresponding to the scoring elements and the scores of the scoring elements for that matching item.
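By way of non-limiting illustration, the weighted summation and sorting of claims 7 and 8 might be sketched as follows. Only the part-of-speech scoring element comes from the claims; the other scoring elements (`field`, `frequency`), all weights and scores, and the function names are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical preset weights, one per scoring element. Only "pos" (the part
# of speech of the matched keyword) is named in the claims.
WEIGHTS: Dict[str, float] = {"pos": 0.5, "field": 0.25, "frequency": 0.25}


def total_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the scoring-element scores of one matching item."""
    # A scoring element missing from the item contributes a score of 0.
    return sum(w * scores.get(name, 0.0) for name, w in weights.items())


def rank(matches: List[dict], weights: Dict[str, float] = WEIGHTS) -> List[dict]:
    """Sort matching items by total score, highest first."""
    return sorted(
        matches,
        key=lambda m: total_score(m["scores"], weights),
        reverse=True,
    )


matches = [
    {"id": "A", "scores": {"pos": 1.0, "field": 0.5, "frequency": 0.0}},
    {"id": "B", "scores": {"pos": 0.5, "field": 1.0, "frequency": 1.0}},
]
for item in rank(matches):
    print(item["id"], total_score(item["scores"], WEIGHTS))
```

With these assumed values, item B totals 0.75 and item A totals 0.625, so B is listed first even though A has the higher part-of-speech score.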

9. A retrieval apparatus, comprising:

an acquisition module configured to acquire retrieval content input by a user;

an extraction module configured to perform keyword extraction processing on the retrieval content to obtain at least one keyword;

an identification module configured to identify a part of speech of each of the keywords; and

a retrieval module configured to retrieve the keywords and their parts of speech in a knowledge base according to a preset part-of-speech condition to obtain a retrieval result, wherein the part-of-speech condition comprises a plurality of part-of-speech classifications, and each part-of-speech classification corresponds to a retrieval priority and/or to information indicating that the classification does not participate in matching.

10. An electronic device, comprising:

a processor; and

a memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the retrieval method of any one of claims 1 to 8.

11. A computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the retrieval method of any one of claims 1 to 8.