CN114281965A - Information retrieval method, device, electronic equipment and storage medium


Info

Publication number
CN114281965A
Authority
CN
China
Legal status
Pending
Application number
CN202111394897.7A
Other languages
Chinese (zh)
Inventor
朱嘉琪
卢佳俊
柴春光
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111394897.7A
Publication of CN114281965A
Pending legal status (current)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides an information retrieval method and apparatus, an electronic device, and a storage medium, and relates to artificial intelligence fields such as natural language processing and deep learning. The specific scheme is as follows: determining a target search term corresponding to a search statement; retrieving a preset knowledge graph based on the target search term to obtain candidate triples associated with the target search term; determining the weight of each candidate triple according to the matching degree between the candidate triple and the target search term and the knowledge type corresponding to the candidate triple; updating the target search term according to the weight of each candidate triple; and determining a target retrieval result based on the updated target search term. By reasoning over and updating the search terms based on the knowledge types of the candidate triples and their matching degree with the target search term, an accurate and reliable target retrieval result can be determined, which reduces the cost of information retrieval and improves its accuracy and reliability.

Description

Information retrieval method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, in particular to artificial intelligence fields such as natural language processing and knowledge graphs, and specifically to an information retrieval method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of computer technology, fields such as deep learning, natural language processing, and knowledge graphs have advanced quickly, and knowledge reasoning is applied ever more widely. In search, for example, when the database contains no result that matches a search term, knowledge reasoning is generally needed to obtain an accurate result.
In related retrieval technologies, the reliability and accuracy of knowledge reasoning depend mainly on a structured knowledge base, which is complex and costly to obtain. How to improve the accuracy and reliability of retrieval is therefore a problem that currently needs to be solved.
Disclosure of Invention
The disclosure provides an information retrieval method, an information retrieval device, an electronic device and a storage medium.
In one aspect of the present disclosure, an information retrieval method is provided, including:
determining a target search term corresponding to a search statement;
retrieving a preset knowledge graph based on the target search term to obtain candidate triples associated with the target search term;
determining the weight of each candidate triple according to the matching degree between the candidate triple and the target search term and the knowledge type corresponding to the candidate triple;
updating the target search term according to the weight of each candidate triple;
and determining a target retrieval result based on the updated target search term.
In another aspect of the present disclosure, an information retrieval apparatus is provided, including:
a first determining module, configured to determine a target search term corresponding to a search statement;
an acquisition module, configured to retrieve a preset knowledge graph based on the target search term to obtain candidate triples associated with the target search term;
a second determining module, configured to determine the weight of each candidate triple according to the matching degree between the candidate triple and the target search term and the knowledge type corresponding to the candidate triple;
an updating module, configured to update the target search term according to the weight of each candidate triple;
and a third determining module, configured to determine a target retrieval result based on the updated target search term.
In another aspect of the present disclosure, an electronic device is provided, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed, enable the at least one processor to perform the information retrieval method according to the embodiments of the above aspect.
In another aspect of the present disclosure, a non-transitory computer-readable storage medium storing a computer program is provided, the computer program being configured to cause a computer to perform an information retrieval method according to an embodiment of the above-described aspect.
In another aspect of the present disclosure, a computer program product is provided, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the information retrieval method according to the embodiment of the above aspect.
According to the information retrieval method, apparatus, electronic device, and storage medium of the present disclosure, the target search term corresponding to the search statement is first determined; the preset knowledge graph is then retrieved based on the target search term to obtain candidate triples associated with it; the weight of each candidate triple is determined according to its matching degree with the target search term and its corresponding knowledge type; the target search term is then updated according to these weights; and the target retrieval result is determined based on the updated target search term. During retrieval, the target search term can thus be searched in the knowledge graph to obtain associated candidate triples and then updated by reasoning over the knowledge types of the candidate triples and their matching degree with the target search term, so that an accurate and reliable target retrieval result can be determined, the cost of information retrieval is reduced, and its accuracy and reliability are improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:
Fig. 1 is a schematic flowchart of an information retrieval method according to an embodiment of the present disclosure;
Fig. 1A is a schematic diagram of candidate triples according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of an information retrieval method according to another embodiment of the present disclosure;
Fig. 2A is a schematic diagram of a preset knowledge graph library according to an embodiment of the present disclosure;
Fig. 2B is a schematic diagram of a preset knowledge graph according to an embodiment of the present disclosure;
Fig. 2C is a schematic diagram of an information retrieval process according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of an information retrieval method according to another embodiment of the present disclosure;
Fig. 3A is a schematic diagram of a preset knowledge graph according to an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of an information retrieval method according to another embodiment of the present disclosure;
Fig. 4A is a schematic diagram of an information retrieval process according to an embodiment of the present disclosure;
Fig. 4B is a schematic diagram of an information retrieval framework according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an information retrieval apparatus according to another embodiment of the present disclosure;
Fig. 6 is a block diagram of an electronic device for implementing an information retrieval method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning); it involves technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning, deep learning, big data processing, knowledge graph technology, and the like.
Natural language processing is the processing, understanding, and use of human languages (such as Chinese and English) by computers. It is an interdisciplinary field between computer science and linguistics and is also commonly called computational linguistics. Natural language is a fundamental trait that distinguishes humans from other animals, and without language human thought could hardly be discussed at all. Natural language processing therefore embodies the highest goal of artificial intelligence: only when a computer can process natural language can a machine achieve real intelligence.
A knowledge graph is essentially a semantic network, and is a graph-based data structure, consisting of nodes and edges. In the knowledge graph, each node represents an entity existing in the real world, and each edge is a relationship between the entities. Generally, a knowledge graph is a relationship network obtained by connecting all kinds of information together, and provides the ability to analyze problems from the perspective of relationships.
An information retrieval method, an apparatus, an electronic device, and a storage medium of the embodiments of the present disclosure are described below with reference to the accompanying drawings.
The information retrieval method of the embodiment of the disclosure can be executed by the information retrieval device provided by the embodiment of the disclosure, and the device can be configured in electronic equipment.
Fig. 1 is a schematic flow chart of an information retrieval method according to an embodiment of the present disclosure.
As shown in fig. 1, the information retrieval method may include the following steps:
Step 101, determining a target search term corresponding to a search statement.
There may be one or more target search terms; this disclosure does not limit this.
The search statement may be text or speech; a spoken statement may first be converted into a text search statement, for example by speech recognition.
It can be understood that, after the search statement is obtained, it may be processed by word segmentation, entity recognition, and the like to determine the corresponding target search terms.
For example, if the search statement is "Wei X from Happy Boys" (Happy Boys being a singing contest), the target search terms determined by processing the statement may be "Wei X" and "Happy Boys".
It should be noted that the above examples are merely illustrative, and are not intended to limit the search term, the target search term, the manner of determining the target search term, and the like in the embodiments of the present disclosure.
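As a rough illustration of the word-segmentation part of step 101, the sketch below uses forward maximum matching, a classic dictionary-based segmentation technique, followed by stop-word filtering. The vocabulary, the stop-word list, and the function names `forward_max_match` and `extract_terms` are hypothetical; the disclosure does not specify a segmentation algorithm, and a real system would also run entity recognition. The input 都江堰是谁修建的 is the Chinese form of the search statement "Who built Dujiangyan?" used in the examples of this description.

```python
from __future__ import annotations

# Hypothetical vocabulary and stop words; the disclosure specifies neither.
VOCAB = {"都江堰", "修建", "是", "谁", "的"}
STOP_WORDS = {"是", "谁", "的"}


def forward_max_match(text: str, vocab: set[str]) -> list[str]:
    """Greedy left-to-right segmentation that always takes the longest vocabulary word."""
    max_len = max(len(word) for word in vocab)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_terms(statement: str) -> list[str]:
    """Keep only content-bearing tokens as candidate target search terms."""
    return [t for t in forward_max_match(statement, VOCAB) if t not in STOP_WORDS]


print(extract_terms("都江堰是谁修建的"))  # ['都江堰', '修建']
```

The two remaining tokens correspond to "Dujiangyan" and "build", the target search terms this description uses for that statement.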
Step 102, retrieving a preset knowledge graph based on the target search term to obtain candidate triples associated with the target search term.
The preset knowledge graph may be a knowledge graph related to the target search term; the present disclosure is not limited thereto.
A candidate triple may take the form [knowledge type; entity; text]. There may be various knowledge types, such as time, place, person, and event; the present disclosure is not limited thereto.
It can be understood that one or more candidate triples associated with the target search term may be obtained; this disclosure does not limit this.
For example, if the search statement is "Who built Dujiangyan?" and the target search terms are "Dujiangyan" and "build", retrieving the preset knowledge graph yields the candidate triples associated with "Dujiangyan" or "build" shown in Fig. 1A. As can be seen from Fig. 1A, the candidate triples are: [256 BC; King Zhaoxiang of Qin; Text 1], [256 BC; Yuzui (Fish Mouth) water-diversion dike; Text 1], [256 BC; Feishayan (Flying Sand Weir) spillway; Text 1], [256 BC; Li Bing; Text 2], [256 BC; Chengdu; Text 2], and [256 BC; Li Bing; Text 2].
Text 1 may be: "In the 51st year of King Zhaoxiang of Qin (256 BC), Li Bing, governor of the Shu Commandery of Qin, and his son drew on their predecessors' experience in water management, led the local people, and presided over the construction of the famous Dujiangyan water-conservancy project. The overall plan of Dujiangyan divides the Min River into two streams and leads one of them into the Chengdu Plain, which both diverts floods to reduce disasters and brings water to the fields, turning harm into benefit. The main works include the Yuzui (Fish Mouth) water-diversion dike, the Feishayan spillway, and the Baopingkou (Bottle-Neck) water inlet." Text 2 may be: "The Dujiangyan water-conservancy project was built in 256 BC, during the Warring States period, by Li Bing, governor of the Shu Commandery of the State of Qin, together with his son. It is located on the Min River west of Dujiangyan City, on the western edge of the Chengdu Plain in Sichuan, 56 km from Chengdu."
It should be noted that the above examples are only illustrative, and should not be taken as limitations on target search terms, candidate triples, and the like in the embodiments of the present disclosure.
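The [knowledge type; entity; text] format and the lookup in step 102 can be sketched as follows. This is a minimal sketch, assuming that a triple counts as "associated" with a target search term when the term appears in the triple's entity or text; the disclosure does not define the association criterion, and the class `CandidateTriple`, the function `retrieve_candidates`, and the shortened English texts are illustrative stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTriple:
    knowledge_type: str  # e.g. a time such as "256 BC"
    entity: str
    text: str  # the source passage the entity was extracted from


# Shortened, illustrative stand-ins for the example texts.
TEXT_1 = "In the 51st year of King Zhaoxiang of Qin (256 BC), Li Bing built Dujiangyan; its works include the Yuzui dike."
TEXT_2 = "Dujiangyan was built by Li Bing in 256 BC, 56 km from Chengdu."
TEXT_3 = "In 2007 Wei X took part in Happy Boys."

PRESET_GRAPH = [
    CandidateTriple("256 BC", "King Zhaoxiang of Qin", TEXT_1),
    CandidateTriple("256 BC", "Yuzui dike", TEXT_1),
    CandidateTriple("256 BC", "Li Bing", TEXT_2),
    CandidateTriple("2007", "Happy Boys", TEXT_3),
]


def retrieve_candidates(terms: list[str], graph: list[CandidateTriple]) -> list[CandidateTriple]:
    """Return triples whose entity or text mentions any target search term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return [
        triple for triple in graph
        if any(t in triple.entity.lower() or t in triple.text.lower() for t in lowered)
    ]


hits = retrieve_candidates(["Dujiangyan", "build"], PRESET_GRAPH)
print([h.entity for h in hits])  # ['King Zhaoxiang of Qin', 'Yuzui dike', 'Li Bing']
```

Substring matching is deliberately crude here; an inverted index or embedding search would be the more realistic choice for a large graph.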
Step 103, determining the weight of each candidate triple according to the matching degree between the candidate triple and the target search term and the knowledge type corresponding to the candidate triple.
It can be understood that the higher the matching degree between a candidate triple and the target search term, the greater the weight of that candidate triple.
For example, suppose the target search terms are "Wei X" and "Happy Boys" (a singing contest), and the candidate triples obtained are candidate triple 1: [2007; Happy Boys; "In 2007, took part in Happy Boys, earned a place in the national finals, and made an official debut"] and candidate triple 2: [2007; Happy Boys; "Wei X entered the Chengdu division of Happy Boys, won a top-three placing in that division, advanced to the national finals, and won third place there"].
The matching degrees between candidate triples 1 and 2 and the target search terms are then determined by semantic matching, similarity calculation, or the like; for example, semantic matching may give candidate triple 1 a matching degree of 80% and candidate triple 2 a matching degree of 90%. Both candidate triples have the knowledge type "2007", so their knowledge types are the same and their weights depend only on the matching degree: for example, the weight of candidate triple 1 may be 0.45 and that of candidate triple 2 may be 0.55.
It should be noted that the above examples are only illustrative, and should not be taken as limitations on the number, content, matching degree, knowledge type, and weight of each candidate triple in the embodiments of the present disclosure.
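One simple way to combine the two signals of step 103 is to multiply each triple's matching degree by a per-knowledge-type factor and normalize the products so that the weights sum to 1. This is an assumption, not the disclosure's formula: the function `triple_weights` and its optional `type_factor` mapping are hypothetical, and the matching degrees are taken as given (the disclosure mentions semantic matching or similarity calculation). When the knowledge types are equal, this reduces to normalizing the matching degrees, which gives about 0.47 and 0.53 for the 80% and 90% example rather than the illustrative 0.45 and 0.55.

```python
from __future__ import annotations


def triple_weights(
    matches: list[float],
    knowledge_types: list[str],
    type_factor: dict[str, float] | None = None,
) -> list[float]:
    """Weight = matching degree x type factor, normalized so the weights sum to 1."""
    factors = type_factor or {}  # hypothetical per-type boosts; missing types count as 1.0
    scores = [m * factors.get(k, 1.0) for m, k in zip(matches, knowledge_types)]
    if not scores:
        return []
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)  # no signal: split the weight evenly
    return [s / total for s in scores]


# Candidate triples 1 and 2 from the example share the knowledge type "2007".
w1, w2 = triple_weights([0.8, 0.9], ["2007", "2007"])
print(round(w1, 2), round(w2, 2))  # 0.47 0.53
```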
Step 104, updating the target search term according to the weight of each candidate triple.
Optionally, the knowledge types and entities of the candidate triples whose weights exceed a threshold may be used, together with the original target search terms, as the updated target search terms. Alternatively, the candidate triples may be sorted by weight, and the knowledge types and entities of the top preset number of candidate triples, together with the original target search terms, used as the updated target search terms. Or the knowledge type and entity of the candidate triple with the largest weight, together with the original target search terms, may be used as the updated target search terms. The present disclosure is not limited thereto.
For example, if the weights of candidate triples 1, 2, and 3 are 0.8, 0.05, and 0.15 respectively, candidate triple 1 has the largest weight, and the target search term may be updated according to it: the knowledge type and entity contained in candidate triple 1 may be taken as new target search terms and, together with the original target search terms, used as the updated target search terms. The present disclosure does not limit this.
Alternatively, the knowledge types contained in different candidate triples may be used as new target search terms. For example, if candidate triples 1 and 2 both have a weight of 0.5, the knowledge type of candidate triple 1 is "Hunan", the knowledge type of candidate triple 2 is "2007", and the original target search term is "Happy Boys", the updated target search terms may be "2007", "Hunan", and "Happy Boys".
It should be noted that the above example is only illustrative and should not be taken as a limitation on the number or weights of the candidate triples or on the updated target search terms in the embodiments of the present disclosure.
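The three update strategies of step 104 (weight above a threshold, top-k by weight, and the single highest-weight triple) can be sketched as one function. The name `update_terms`, its parameters, the default threshold and k, and the placeholder triple contents are hypothetical; in this sketch each selected triple contributes its knowledge type and entity, appended after the original terms without duplicates.

```python
from __future__ import annotations


def update_terms(
    original: list[str],
    weighted: list[tuple[str, str, float]],  # (knowledge_type, entity, weight)
    strategy: str = "max",
    threshold: float = 0.5,
    k: int = 2,
) -> list[str]:
    """Append the knowledge type and entity of the selected triples to the original terms."""
    ranked = sorted(weighted, key=lambda t: t[2], reverse=True)
    if strategy == "threshold":
        chosen = [t for t in ranked if t[2] > threshold]
    elif strategy == "topk":
        chosen = ranked[:k]
    elif strategy == "max":
        chosen = ranked[:1]
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    updated = list(original)  # original terms stay first
    for knowledge_type, entity, _ in chosen:
        for term in (knowledge_type, entity):
            if term not in updated:
                updated.append(term)
    return updated


# Weights 0.8 / 0.05 / 0.15 as in the example; triples 2 and 3 are placeholders.
triples = [("2007", "Happy Boys", 0.8), ("type 2", "entity 2", 0.05), ("type 3", "entity 3", 0.15)]
print(update_terms(["Wei X", "Happy Boys"], triples))  # ['Wei X', 'Happy Boys', '2007']
```

Keeping the original terms at the front and skipping duplicates is one reasonable reading of "together with the original target search term".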
It can be understood that, in the embodiments of the present disclosure, the target search term is searched in the preset knowledge graph to obtain associated candidate triples, and the target search term is then updated according to the weight of each candidate triple, which makes the updated target search term more accurate and reliable. In other words, updating the target search term does not rely on a large-scale knowledge base: the search terms are updated by reasoning over the knowledge types of the candidate triples and their matching degree with the target search term, so no large-scale knowledge base needs to be built, which avoids the complex work and cost of constructing one.
Step 105, determining a target retrieval result based on the updated target search term.
For example, if the knowledge type of the candidate triple with the largest weight is the time "2007", and the target search terms determined from the search statement are "Wei X" and "Happy Boys", the updated target search terms may be "2007" and "Happy Boys". "2007 Happy Boys" can then be searched in the preset knowledge graph to determine the target retrieval result.
It should be noted that the above examples are merely illustrative, and should not be taken as limitations on target search terms, target search results, and the like after updating in the embodiments of the present disclosure.
In the embodiments of the present disclosure, the target search term corresponding to the search statement is first determined; the preset knowledge graph is then searched based on the target search term to obtain the associated candidate triples; the weight of each candidate triple is determined according to its matching degree with the target search term and its corresponding knowledge type; the target search term is updated according to these weights; and the target retrieval result is determined based on the updated target search term. During retrieval, the target search term can thus be searched in the knowledge graph to obtain associated candidate triples and then updated by reasoning over the knowledge types of the candidate triples and their matching degree with the target search term, so that an accurate and reliable target retrieval result can be determined, the cost of information retrieval is reduced, and its accuracy and reliability are improved.
In the above embodiment, during retrieval, the target search term is searched in the knowledge graph to obtain associated candidate triples, the weight of each candidate triple is determined according to its matching degree with the target search term and its knowledge type, the target search term is updated based on these weights, and the updated target search term is then used for retrieval, making the retrieval more accurate and reliable. In an actual implementation, a preset knowledge graph library may also be searched according to the main entity and the candidate answer entities determined from the search statement to generate the preset knowledge graph. This process is described in detail below with reference to Fig. 2.
Fig. 2 is a schematic flow chart of an information retrieval method provided in an embodiment of the present disclosure, and as shown in fig. 2, the information retrieval method may include the following steps:
Step 201, preprocessing the obtained search statement to determine the main entity information, target entity information, and core keyword corresponding to the search statement.
The main entity information, target entity information, and core keyword can be determined by preprocessing the search statement with, for example, SP (subject-predicate) analysis, LAT (lexical answer type, i.e., target type) identification, constraint identification, and concept identification.
It can be understood that SP analysis of the search statement determines the corresponding main entity and core keyword; LAT identification determines the target type of the search statement, the type of the main entity, and the like; constraint identification determines entities, core keywords, and the like; and concept identification determines the main entity information and the like. The present disclosure is not limited thereto.
Optionally, the main entity information may include at least one of the following: a main entity identifier and the type of the main entity.
The main entity identifier may be information, a tag, or the like that uniquely identifies the main entity. For example, "apple" has several meanings: when it refers to the fruit, its identifier may be 11222; when it refers to the electronics product, its main entity identifier may be 22113. The present disclosure is not limited thereto.
In addition, main entities may be divided into concept-class entities and non-concept-class entities, which is not limited in this disclosure.
It can be understood that a concept-class entity denotes a category whose members are specific objects. For example, the concept-class entity "emperor" may refer to Emperor Taizong of Tang, Emperor Taizu of Song, and so on, whereas "Happy Boys" does not denote such a category and is therefore a non-concept-class entity. The present disclosure is not limited thereto.
Optionally, the target entity information may include at least one of the following: the target entity identification and the type of the target entity.
The target entity identifier may also be a unique identifier for characterizing the entity, and the type of the target entity may also be divided into a concept entity and a non-concept entity, which are not described herein again.
The core keyword may be a word representing an action, such as a verb predicate, or another kind of word. For example, in the search statement "Who built Dujiangyan?", the core keyword is "build". The present disclosure is not limited thereto.
Step 202, determining a target search term corresponding to the search statement according to the main entity information, the target entity information and the core keyword.
Optionally, in the case that the type in the target entity information is a concept class, the target search term may be determined according to the core keyword and the identifier in the main entity information.
For example, if the search statement is "Under which emperor was Dujiangyan built?", the target entity information is "emperor", which is a concept-class entity, the core keyword is "build", and the main entity is "Dujiangyan", then the target search terms may be determined as "Dujiangyan" and "build".
It should be noted that the above examples are merely illustrative, and should not be taken as limitations on the search statement, the main entity information, the target entity information, the core keyword, and the like in the embodiments of the present disclosure.
Optionally, when the type in the target entity information is the non-concept class, the target search term is determined according to the identifier in the main entity information and the identifier in the target entity information.
For example, if the search statement is "Wei X from Happy Boys", where "Happy Boys" in the target entity information is a non-concept-class entity, the target search terms may be determined from the main entity identifier "Wei X" and the target entity identifier "Happy Boys" as "Wei X" and "Happy Boys".
It should be noted that the above examples are merely illustrative, and should not be taken as limitations on the search statement, the main entity information, the target entity information, the core keyword, and the like in the embodiments of the present disclosure.
Therefore, in the embodiment of the disclosure, the target search term can be determined in different modes according to the type in the target entity information, so that the determined target search term can be more accurate and reliable.
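The two cases of step 202 amount to a small branching rule. In the sketch below, the `EntityInfo` class, its field names, the string value "concept", and the function `build_target_terms` are hypothetical representations of the main and target entity information, and readable names stand in for the unique entity identifiers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EntityInfo:
    identifier: str  # readable names stand in for unique entity IDs here
    entity_type: str  # "concept" or "non-concept"


def build_target_terms(main: EntityInfo, target: EntityInfo, core_keyword: str) -> list[str]:
    """Concept-class target: main entity + core keyword; otherwise main + target entity."""
    if target.entity_type == "concept":
        return [main.identifier, core_keyword]
    return [main.identifier, target.identifier]


# "Under which emperor was Dujiangyan built?": the target "emperor" is a concept class.
print(build_target_terms(EntityInfo("Dujiangyan", "non-concept"), EntityInfo("emperor", "concept"), "build"))
# "Wei X from Happy Boys": the target "Happy Boys" is not a concept class.
print(build_target_terms(EntityInfo("Wei X", "non-concept"), EntityInfo("Happy Boys", "non-concept"), ""))
```

The first call prints `['Dujiangyan', 'build']` and the second `['Wei X', 'Happy Boys']`, matching the two examples in step 202.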
Step 203, determining the main entity and the candidate answer entities corresponding to the search statement.
There may be one or more candidate answer entities for the search statement; this disclosure does not limit this.
It is understood that the candidate answer entity may be determined according to the target entity information, or may be determined in other manners, which is not limited in this disclosure.
For example, if the search statement is "Under which emperor was Dujiangyan built?", the main entity is "Dujiangyan" and the target entity is "emperor", and the candidate answer entities may be entities corresponding to "emperor", such as King Zhaoxiang of Qin, Qin Shi Huang, and King Huiwen of Qin; this disclosure does not limit this.
Step 204, searching a preset knowledge graph library for each path from the main entity to each candidate answer entity.
It can be understood that the preset knowledge graph library may be generated by analyzing the content of an encyclopedic knowledge base, for example through entity recognition and semantic analysis, to determine the entities, knowledge types, texts, and so on that it contains, and then building the library from the resulting [entity, knowledge type, text] triples. The present disclosure is not limited thereto.
The preset knowledge graph library may store a large number of knowledge graphs, which may include various entities, the relationships between them, and the like; this disclosure does not limit this.
It can be understood that one or more paths from the main entity to a candidate answer entity may be found in the preset knowledge graph library; the disclosure is not limited thereto.
For example, in the preset knowledge graph library shown in Fig. 2A, if the main entity is B and the candidate answer entity is F, it can be determined from Fig. 2A that the paths from B to F are BCF, BDCF, and BCEF.
It should be noted that only part of the preset knowledge graph library is shown for illustration, which should not be taken as a limitation on the preset knowledge graph library, main entity, candidate answer entities, and the like in the embodiments of the present disclosure.
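Step 204 is essentially an all-simple-paths search. The sketch below is a depth-first search with a hop limit; the adjacency list is a hypothetical stand-in because the edges of Fig. 2A are not given in the text, and the `max_hops` limit is an assumption rather than something the disclosure specifies. With `max_hops=3` this graph yields exactly the three paths BCF, BCEF, and BDCF; without the limit it would also return BDCEF, which illustrates why bounding path length is a practical design choice on a large library.

```python
from __future__ import annotations

# Hypothetical directed adjacency list standing in for part of Fig. 2A.
GRAPH = {"B": ["C", "D"], "C": ["F", "E"], "D": ["C"], "E": ["F"]}


def find_paths(graph: dict[str, list[str]], start: str, goal: str, max_hops: int) -> list[list[str]]:
    """Enumerate simple paths from start to goal using at most max_hops edges (DFS)."""
    paths: list[list[str]] = []

    def dfs(node: str, path: list[str]) -> None:
        if node == goal:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_hops:  # number of edges used so far
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no revisited nodes)
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(start, [start])
    return paths


print(["".join(p) for p in find_paths(GRAPH, "B", "F", max_hops=3)])  # ['BCF', 'BCEF', 'BDCF']
```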
Step 205: generate the preset knowledge graph from the paths.
Each path consists of nodes and edges, and different paths may share nodes. The paths can be fused on their common nodes and edges to generate the preset knowledge graph. The present disclosure is not limited thereto.
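As a minimal sketch of step 205, assuming a path is a list of node names and an edge is a pair of consecutive nodes, fusing paths amounts to taking the union of their nodes and edges so that shared nodes and edges appear only once. The routes below mirror the Dujiangyan example that follows; the edge texts (text 1, text 2, text 3) are omitted for brevity.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def fuse_paths(paths: List[List[str]]) -> Tuple[Set[str], Set[Edge]]:
    """Merge paths into one graph: shared nodes and edges are stored once."""
    nodes: Set[str] = set()
    edges: Set[Edge] = set()
    for path in paths:
        nodes.update(path)
        edges.update(zip(path, path[1:]))  # consecutive node pairs
    return nodes, edges


route_1 = ["Dujiangyan", "King Zhaoxiang of Qin"]
route_2 = ["Dujiangyan", "Li Bing", "King Zhaoxiang of Qin"]
nodes, edges = fuse_paths([route_1, route_2])
print(sorted(nodes))  # ['Dujiangyan', 'King Zhaoxiang of Qin', 'Li Bing']
print(len(edges))     # 3
```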
For example, the search statement is "During which emperor's reign was Dujiangyan built?", the main entity is "Dujiangyan", and the candidate answer entity is "King Zhaoxiang of Qin"; the paths from "Dujiangyan" to "King Zhaoxiang of Qin" can then be searched for in the preset knowledge graph library. Suppose two routes, route 1 and route 2, lead from Dujiangyan to King Zhaoxiang of Qin. In route 1, Dujiangyan and King Zhaoxiang of Qin are linked by text 1: "In the fifty-first year of King Zhao of Qin (256 BC), Li Bing, governor of Shu Commandery of Qin, and his son drew on the water-management experience of their predecessors, led the local people, and presided over the construction of the famous Dujiangyan water-conservancy project. The overall plan of Dujiangyan divides the flow of the Min River into two streams and leads one of them into the Chengdu Plain, which both diverts floods to reduce disasters and channels water to the fields, turning harm into benefit." In route 2, Dujiangyan and Li Bing are linked by text 2: "Built in 256 BC, during the Warring States period, by Li Bing, governor of Shu Commandery of the State of Qin, and his son, the Dujiangyan irrigation project is located on the Min River west of Dujiangyan City, on the western edge of the Chengdu Plain in Sichuan, 56 km from Chengdu"; and Li Bing and King Zhaoxiang of Qin are linked by text 3: "From 256 BC to 251 BC he served as governor of Shu, and at the outlet of the Min River in present-day Dujiangyan City, Sichuan Province, he presided over the construction of Dujiangyan, an early irrigation project in China." Because route 1 and route 2 share the nodes "Dujiangyan" and "King Zhaoxiang of Qin", the preset knowledge graph shown in Fig. 2B can be generated from the two routes.
It should be noted that the preset knowledge graph library and the preset knowledge graph are only partially and schematically shown, and shall not be taken as limitations on the preset knowledge graph library, the preset knowledge graph, and the like in the embodiments of the present disclosure.
Step 206: retrieve the preset knowledge graph based on the target search terms to obtain candidate triples associated with the target search terms.
Step 207: determine the weight of each candidate triple according to its matching degree with the target search terms and its corresponding knowledge type.
Step 208: update the target search terms according to the weight of each candidate triple.
Step 209: determine the target search result based on the updated target search terms.
For the specific content and implementation of steps 206 to 209, refer to the descriptions in the other embodiments of the present disclosure; they are not repeated here.
It is understood that the information retrieval method provided by the present disclosure may be applied to any information retrieval scenario, such as a question and answer system, a browser retrieval process, and the like, which is not limited by the present disclosure.
The following describes the information retrieval process provided by the present disclosure in detail by taking fig. 2C as an example.
For example, in a question-answering scenario, suppose the search statement, that is, the question, is "During which emperor's reign was Dujiangyan built?". The procedure for processing this statement to obtain the target search result, that is, the answer, may be as shown in Fig. 2C.
As Fig. 2C shows, the search statement "During which emperor's reign was Dujiangyan built?" may first be preprocessed, for example by SP analysis, LAT recognition, limit recognition, and concept recognition, to determine that the main entity is "Dujiangyan", the core verb is "build", and the target entity "emperor" is a concept entity. The preset knowledge graph is then searched to obtain the candidate triples associated with "Dujiangyan" and "build": [256 BC; King Zhaoxiang of Qin; text 1], [256 BC; Fish Mouth diversion dike; text 1], [256 BC; Flying Sand Weir spillway; text 1], [256 BC; Li Bing; text 2], [256 BC; Chengdu; text 2], and [256 BC; Li Bing; text 2].
Here, text 1 is "In the fifty-first year of King Zhao of Qin (256 BC), Li Bing, governor of Shu Commandery of Qin, and his son drew on the water-management experience of their predecessors, led the local people, and presided over the construction of the famous Dujiangyan water-conservancy project. The overall plan of Dujiangyan divides the flow of the Min River into two streams and leads one of them into the Chengdu Plain, which both diverts floods to reduce disasters and channels water to the fields, turning harm into benefit. The main works comprise the Fish Mouth diversion dike, the Flying Sand Weir spillway, and the Baopingkou (Bottle-Neck) water inlet." Text 2 is "Built in 256 BC, during the Warring States period, by Li Bing, governor of Shu Commandery of the State of Qin, and his son, the Dujiangyan irrigation project is located on the Min River west of Dujiangyan City, on the western edge of the Chengdu Plain in Sichuan, 56 km from Chengdu."
Next, a concept-knowledge check can be performed on the concept entity "emperor", giving "emperor - King Zhaoxiang of Qin", and a time-knowledge check can be performed, giving "256 BC - reign of King Zhaoxiang of Qin". Inference correlation calculations can then be carried out, such as "Li Bing -> King Zhaoxiang of Qin", "Li Bing -> Qin Shi Huang", and so on, and finally the target search result "King Zhaoxiang of Qin" is determined.
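The time-knowledge check can be sketched as a simple interval test: keep only the candidate emperors whose reign contains the year of the event. In this sketch, BC years are written as negative integers (256 BC is -256), and the reign spans are supplied by hand for illustration; a real system would read them from the knowledge graph, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative reign spans (BC years as negative integers), supplied for
# this example rather than read from a knowledge graph.
REIGNS: Dict[str, Tuple[int, int]] = {
    "King Huiwen of Qin": (-337, -311),
    "King Zhaoxiang of Qin": (-306, -251),
    "Qin Shi Huang": (-246, -210),
}


def time_check(candidates: List[str], event_year: int) -> List[str]:
    """Keep only candidates whose reign includes the event year."""
    return [name for name in candidates
            if REIGNS[name][0] <= event_year <= REIGNS[name][1]]


print(time_check(list(REIGNS), -256))  # ['King Zhaoxiang of Qin']
```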
It should be noted that the above examples are merely illustrative, and are not intended to limit the information retrieval method, procedure, and the like in the embodiments of the present disclosure.
In the embodiments of the present disclosure, the acquired search statement may be preprocessed to determine its main entity information, target entity information, and core keywords, from which the target search terms corresponding to the search statement are determined. The main entity and candidate answer entities corresponding to the search statement are then determined, every path from the main entity to each candidate answer entity is searched for in the preset knowledge graph library, and the preset knowledge graph is generated from these paths. The preset knowledge graph is retrieved based on the target search terms to obtain the associated candidate triples, the weight of each candidate triple is determined according to its matching degree with the target search terms and its knowledge type, the target search terms are updated according to these weights, and the target search result is determined based on the updated target search terms. Accurate target search terms can thus be determined from the main entity information, target entity information, and core keywords of the search statement; combining them with the generated preset knowledge graph yields the associated candidate triples, and updating the target search terms based on those triples makes the final target search result more accurate and reliable, further improving the accuracy and reliability of information retrieval.
As described above, accurate target search terms can be determined from the main entity information, target entity information, and core keywords of the search statement, the associated candidate triples can be obtained from the generated preset knowledge graph, and the target search terms can be updated based on those triples so that the target search result is more accurate and reliable. In one possible implementation, the weight of a candidate triple may be determined from its initial weight and its association weight in the preset knowledge graph. This process is described below with reference to Fig. 3.
Fig. 3 is a schematic flow chart of an information retrieval method provided in an embodiment of the present disclosure, and as shown in fig. 3, the information retrieval method may include the following steps:
Step 301: determine the target search terms corresponding to the search statement.
Step 302: retrieve the preset knowledge graph based on the target search terms to obtain candidate triples associated with the target search terms.
For the specific content and implementation of steps 301 and 302, refer to the descriptions in the other embodiments of the present disclosure; they are not repeated here.
Step 303: determine the number of paths in the preset knowledge graph that contain a given candidate triple, and the text source of that candidate triple.
For example, in the preset knowledge graph shown in Fig. 3A, suppose the candidate triple is "CD". As Fig. 3A shows, 4 paths in the preset knowledge graph contain "CD".
It should be noted that the above example is only illustrative and shall not be taken as a limitation on the preset knowledge graph, the candidate triples, and the like in the embodiments of the present disclosure.
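Counting the paths that contain a candidate triple can be sketched as counting the paths in which the triple's two nodes appear as consecutive nodes. Fig. 3A is not reproduced here, so the path list below is hypothetical, chosen so that 4 paths contain "CD" as in the example; the function name is likewise illustrative.

```python
from typing import List


def count_paths_with_edge(paths: List[List[str]], head: str, tail: str) -> int:
    """Number of paths in which `head` is immediately followed by `tail`."""
    return sum(
        1 for path in paths
        if any(a == head and b == tail for a, b in zip(path, path[1:]))
    )


# Hypothetical paths standing in for Fig. 3A.
paths = [list("ACDF"), list("BCDF"), list("ACDE"), list("BCDE"), list("ABEF")]
print(count_paths_with_edge(paths, "C", "D"))  # 4
```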
In addition, each candidate triple contains a corresponding text, from which the text source of the candidate triple can be determined.
The text source may take many forms, such as an article, a book, or an encyclopedic knowledge base, which is not limited in the present disclosure.
For example, if the text in a candidate triple is "Built in 256 BC, during the Warring States period, by Li Bing, governor of Shu Commandery of the State of Qin, and his son, the Dujiangyan irrigation project is located on the Min River west of Dujiangyan City, on the western edge of the Chengdu Plain in Sichuan, 56 km from Chengdu", the text can be looked up to determine its source, which may be, for example, an encyclopedic knowledge base.
It should be noted that the above example is only illustrative and shall not be taken as a limitation on the text, the text source, and the like of the candidate triples in the embodiments of the present disclosure.
In addition, a candidate triple may have one or more text sources, which is not limited in the present disclosure.
Step 304: determine the initial weight of the candidate triple according to the number of paths containing it, its text sources, and its matching degree with the target search terms.
It will be appreciated that the initial weight of a candidate triple is greater when more paths contain it, when it has more text sources, or when its matching degree with the target search terms is higher. The present disclosure is not limited thereto.
For example, suppose candidate triple 1 is contained in 3 paths, has text sources text 1 and text 2, and has a matching degree of 0.8 with the target search terms, while candidate triple 2 is contained in 1 path, has text source text 3, and has a matching degree of 0.5. The initial weight of candidate triple 1 may then be set somewhat larger, such as 0.5 or 0.6, and that of candidate triple 2 somewhat smaller, such as 0.3 or 0.2.
It should be noted that the above example is only an illustrative example, and cannot be taken as a limitation on the way of determining the initial weight of any candidate triple in the embodiment of the present disclosure.
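The disclosure states only that the initial weight grows with the path count, the number of text sources, and the matching degree; it gives no formula. The sketch below therefore assumes a hypothetical linear score: path and source counts are normalized by the largest value among the candidates, and the coefficients `a`, `b`, `c` are illustrative choices that sum to 1. It uses the two triples from the example above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    name: str
    path_count: int  # paths in the preset knowledge graph containing the triple
    sources: int     # number of distinct text sources for the triple's text
    match: float     # matching degree with the target search terms, in [0, 1]


def initial_weights(cands: List[Candidate], a: float = 0.3, b: float = 0.2,
                    c: float = 0.5) -> Dict[str, float]:
    """Hypothetical linear score that grows with each of the three signals.
    Counts are normalized by the largest value among the candidates, so with
    a + b + c = 1 every weight stays in [0, 1]."""
    max_paths = max(x.path_count for x in cands) or 1
    max_sources = max(x.sources for x in cands) or 1
    return {
        x.name: a * x.path_count / max_paths
        + b * x.sources / max_sources
        + c * x.match
        for x in cands
    }


weights = initial_weights([
    Candidate("triple 1", path_count=3, sources=2, match=0.8),
    Candidate("triple 2", path_count=1, sources=1, match=0.5),
])
print({k: round(v, 2) for k, v in weights.items()})
# {'triple 1': 0.9, 'triple 2': 0.45}
```

The absolute values differ from the illustrative 0.5/0.3 in the text; only the ordering (triple 1 above triple 2) is what the disclosure requires.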
Step 305: determine the association weight of the candidate triple according to its association relationship with the remaining candidate triples that have the same knowledge type.
For example, suppose the knowledge type of candidate triple 1 takes the value "2007" and, among the remaining candidate triples, candidate triple 2 takes "2007" and candidate triple 3 takes "February 2007". Since "February 2007" implies that the year of candidate triple 1 is "2007", the association weight of candidate triple 1 may be determined to be 0.3, 0.4, or the like.
Alternatively, suppose candidate triple 1 takes "2007", candidate triple 2 takes "2007", candidate triple 3 takes "February 2007", and candidate triple 4 takes "June 21, 2007". Since both "February 2007" and "June 21, 2007" imply the year "2007" of candidate triple 1, the association weight of candidate triple 1 may be somewhat larger, such as 0.7 or 0.65.
Alternatively, if candidate triple 1 takes "2007", candidate triple 2 takes "2006", and candidate triple 3 takes "February 2008", the association weight of candidate triple 1 may be 0 or a small value such as 0.01 or 0.005.
It should be noted that the above example is only an illustrative example, and cannot be taken as a limitation on a manner of determining an association weight corresponding to any candidate triple in the embodiment of the present disclosure.
Optionally, if the knowledge-type value of a first candidate triple is a subset of the knowledge-type value of the candidate triple under consideration, the initial weight of the first candidate triple is included in the association weight of that candidate triple.
For example, suppose the knowledge-type value of the candidate triple is "2007" and that of the first candidate triple is "June 2007". Since "2007" contains "June 2007", that is, "June 2007" is a subset of "2007", the initial weight of the first candidate triple is included in the association weight of the candidate triple. If the initial weight of the first candidate triple is 0.2 and the initial weight of the candidate triple is 0.3, the association weight of the candidate triple is 0.2, and adding its own initial weight gives a weight of 0.2 + 0.3 = 0.5.
It should be noted that the above example is only an illustrative example, and cannot be taken as a limitation on a manner of determining an association weight corresponding to any candidate triple in the embodiment of the present disclosure.
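A minimal sketch of the subset rule, assuming time values are stored as tuples of varying precision ((year,), (year, month), or (year, month, day)), so that one value lies inside another exactly when the more general tuple is a prefix of the more specific one. The association weight of x is the sum of the initial weights of the other triples whose value lies inside x's value, and the total weight adds x's own initial weight. The `alpha` factor is a hypothetical knob for the proportional combination mentioned in step 306; `alpha=1.0` gives plain addition. The numbers reproduce the 0.2 + 0.3 = 0.5 example.

```python
from typing import Dict, Tuple

Date = Tuple[int, ...]  # (year,), (year, month) or (year, month, day)


def is_subset(specific: Date, general: Date) -> bool:
    """True if the time span `specific` lies inside `general`, e.g. (2007, 6)
    lies inside (2007,) because (2007,) is a prefix of it."""
    return specific[:len(general)] == general


def total_weights(values: Dict[str, Date], init: Dict[str, float],
                  alpha: float = 1.0) -> Dict[str, float]:
    """w(x) = w0(x) + alpha * (sum of w0(y) over the other triples y whose
    value is a subset of x's value)."""
    result = {}
    for x, vx in values.items():
        assoc = sum(init[y] for y, vy in values.items()
                    if y != x and is_subset(vy, vx))
        result[x] = init[x] + alpha * assoc
    return result


values = {"t1": (2007,), "t2": (2007, 6), "t3": (2006,)}
init = {"t1": 0.3, "t2": 0.2, "t3": 0.25}
print(total_weights(values, init))  # {'t1': 0.5, 't2': 0.2, 't3': 0.25}
```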
Step 306: determine the weight of the candidate triple according to its initial weight and association weight.
It is understood that the initial weight and the association weight of a candidate triple may be added, with the sum taken as its weight, or they may be combined in a given proportion, with the result taken as its weight. The present disclosure is not limited thereto.
Optionally, the weight of a candidate triple x may be expressed by the following formula (1):
$$w(x) = w_{a}(x) + w_{0}(x) \tag{1}$$
where $w_{a}(x)$ denotes the association weight of x and $w_{0}(x)$ denotes the initial weight of x.
For example, suppose the knowledge-type value of candidate triple 1 is "2007", that of candidate triple 2 is "June 2007", and that of candidate triple 3 is "2006". Since "June 2007" is a subset of "2007", by formula (1) the association weight of candidate triple 1 is the initial weight of candidate triple 2, and the weight of candidate triple 1 is the sum of the initial weight of candidate triple 2 and the initial weight of candidate triple 1. The present disclosure is not limited thereto.
Therefore, in the embodiments of the present disclosure, when the weight of a candidate triple is determined, its initial weight may first be determined according to the number of paths containing it in the preset knowledge graph, its text sources, and its matching degree with the target search terms, and its association weight may then be determined according to its knowledge type. This makes the weight of each candidate triple more accurate and reliable and lays the groundwork for more reliable and accurate information retrieval.
Step 307: combine the entities in the candidate triples whose weight is greater than a threshold with the target search terms to generate new target search terms.
The threshold may be a preset value, such as 0.6, 0.75, etc., which is not limited in this disclosure.
For example, if the threshold is set to 0.8, a candidate triple has a weight of 0.85, which exceeds the threshold, the entity in that triple is "Min River", and the target search terms are "Dujiangyan" and "build", then the new target search terms may be "Min River", "Dujiangyan", and "build".
It should be noted that the above examples are only illustrative, and should not be taken as limitations on the threshold, the entity in any candidate triple, the target search term, and the like in the embodiments of the present disclosure.
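Step 307 can be sketched as a filter-and-append over (entity, weight) pairs. The function name and the choice to skip entities that are already search terms are assumptions for illustration; the threshold of 0.8 and the "Min River" entity follow the example above, while "Chengdu" with weight 0.4 is a made-up below-threshold entry.

```python
from typing import List, Tuple


def expand_terms(terms: List[str], weighted_entities: List[Tuple[str, float]],
                 threshold: float = 0.8) -> List[str]:
    """Append entities from triples whose weight exceeds the threshold,
    skipping entities already present, to form the new target search terms."""
    new_terms = list(terms)
    for entity, weight in weighted_entities:
        if weight > threshold and entity not in new_terms:
            new_terms.append(entity)
    return new_terms


print(expand_terms(["Dujiangyan", "build"], [("Min River", 0.85), ("Chengdu", 0.4)]))
# ['Dujiangyan', 'build', 'Min River']
```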
Step 308: determine the target search result based on the updated target search terms.
For the specific content and implementation of step 308, refer to the descriptions in the other embodiments of the present disclosure; they are not repeated here.
In the embodiments of the present disclosure, the target search terms corresponding to a search statement may be determined, and the preset knowledge graph may be retrieved based on them to obtain the associated candidate triples. For each candidate triple, the number of paths containing it in the preset knowledge graph and its text sources are determined; its initial weight is determined from that path count, its text sources, and its matching degree with the target search terms; its association weight is determined from its association relationship with the remaining candidate triples of the same knowledge type; and its weight is determined from the initial weight and the association weight. The entities in candidate triples whose weight exceeds the threshold are then combined with the target search terms to generate new target search terms, and the target search result is determined based on the updated target search terms. Because both the preset knowledge graph and the knowledge types of the candidate triples are fully taken into account when weighting, the weights are more accurate and reliable, the target search terms updated from them are more accurate and reliable, and the accuracy and reliability of information retrieval are further improved.
It can be understood that, after the target search result is determined, its confidence can be determined from the historical data corresponding to the target search terms, and the target search result is displayed when the confidence meets a threshold condition. This process is described in detail below with reference to Fig. 4.
Fig. 4 is a schematic flowchart of an information retrieval method provided in an embodiment of the present disclosure, and as shown in fig. 4, the information retrieval method may include the following steps:
Step 401: determine the target search terms corresponding to the search statement.
Step 402: retrieve the preset knowledge graph based on the target search terms to obtain candidate triples associated with the target search terms.
Step 403: determine the weight of each candidate triple according to its matching degree with the target search terms and its corresponding knowledge type.
Step 404: update the target search terms according to the weight of each candidate triple.
For the specific content and implementation of steps 401 to 404, refer to the descriptions in the other embodiments of the present disclosure; they are not repeated here.
Step 405: determine the target search result based on the updated target search terms.
Optionally, an entity key-value pair library may be searched based on the updated target search terms; if the library contains a target entity corresponding to the updated target search terms, that target entity is taken as the target search result.
The entity key-value pair library may be set up in advance and may include multiple entity key-value pairs, for example "first emperor of the Qin dynasty - Ying Zheng" and "first artificial intelligence program to defeat a world Go champion - AlphaGo"; this is not limited by the present disclosure.
For example, suppose the search statement is "Kuai Nan (Happy Boys) champion of X" and the target search term is likewise "Kuai Nan champion of X". If the updated target search term determined by the processing above is "2007 Kuai Nan champion", the entity corresponding to "2007 Kuai Nan champion" can be looked up in the entity key-value pair library; if the library contains the pair "2007 Kuai Nan champion - Chen XX", "Chen XX" can be determined as the target search result.
It should be noted that the above examples are only illustrative, and should not be taken as limitations on entity key value pairs, target search terms, and the like in the embodiments of the present disclosure.
Alternatively, if the entity key-value pair library does not contain a target entity corresponding to the updated target search terms, the preset knowledge graph is retrieved based on the updated target search terms to determine the corresponding target search result.
For example, suppose the search statement is "During which emperor's reign was Dujiangyan built?", the updated target search terms are "Dujiangyan built 256 BC", and the entity key-value pair library contains no target entity corresponding to them. The preset knowledge graph can then be retrieved based on "Dujiangyan built 256 BC". If the preset knowledge graph is as shown in Fig. 2A, retrieving it yields "King Zhaoxiang of Qin", and the time-knowledge check, that is, the reign of King Zhaoxiang of Qin, confirms that 256 BC was the fifty-first year of his reign, so "King Zhaoxiang of Qin" can be determined as the target search result.
It should be noted that the above examples are merely illustrative, and should not be taken as limitations on target search terms, target search results, and the like after updating in the embodiments of the present disclosure.
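The lookup order of step 405 (key-value library first, knowledge graph as fallback) can be sketched as below. The exact-string key match and the `search_graph` callable are assumptions for illustration; here a plain dict's `get` method stands in for the knowledge-graph retrieval, and the entries mirror the examples above.

```python
from typing import Callable, Dict, Optional


def resolve(term: str, kv_store: Dict[str, str],
            search_graph: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the entity from the key-value library if present,
    otherwise fall back to retrieving the knowledge graph."""
    if term in kv_store:
        return kv_store[term]
    return search_graph(term)


kv = {"first emperor of the Qin dynasty": "Ying Zheng"}
graph_answers = {"Dujiangyan built 256 BC": "King Zhaoxiang of Qin"}
print(resolve("first emperor of the Qin dynasty", kv, graph_answers.get))  # Ying Zheng
print(resolve("Dujiangyan built 256 BC", kv, graph_answers.get))  # King Zhaoxiang of Qin
```

Checking the key-value library first lets precomputed answers return without any graph traversal; the graph is consulted only on a miss.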
Therefore, in the embodiments of the present disclosure, after the updated target search terms are determined, the entity key-value pair library or the preset knowledge graph is searched to obtain the target search result, making the result more accurate and reliable.
Step 406: acquire the historical search result presentation page data corresponding to the updated target search terms.
The historical search result presentation page data may be data on the historical search results corresponding to the target search terms, for example the number of times the target search terms have been searched, which is not limited in the present disclosure.
The historical search result presentation page data may be in table, text, or chart form, among others; this is not limited in the present disclosure.
Step 407: determine the frequency with which the target search result occurs in the historical search result presentation pages.
This frequency can be determined by searching the historical search result presentation pages. The present disclosure is not limited thereto.
Step 408: determine the confidence of the target search result according to its frequency of occurrence and the search result operation data in the historical search result presentation page data.
The search result operation data may be data on user operations performed on search results, such as clicks and browsing, which is not limited in the present disclosure.
For example, suppose the search statement is "During which emperor's reign was Dujiangyan built?" and the target search result is determined to be "King Zhaoxiang of Qin". If "King Zhaoxiang of Qin" occurs 10000 times in the historical search result presentation page data with 9900 corresponding click and browse operations, while "Qin Shi Huang" occurs 111 times with 2 clicks and 1 browse, the confidence of the target search result may be determined to be high, such as 0.9 or 0.88, which is not limited by the present disclosure.
It should be noted that the above examples are only illustrative, and should not be taken as limitations on the frequency of occurrence of target search results, operation data of search results, confidence level, and the like in the embodiments of the present disclosure.
Step 409: display the target search result in the search result presentation page when the confidence is greater than a threshold.
The threshold may be a preset value, such as 0.9, 0.88, etc., which is not limited in this disclosure.
It is understood that, in the case that the confidence is greater than the threshold, the reliability of the target search result may be considered to be higher, and the target search result may be presented in the search result presentation page. In the case where the confidence is less than or equal to the threshold, the target search result may be considered less reliable. The present disclosure is not limited thereto.
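Steps 406 to 409 do not fix a confidence formula. As one hypothetical choice, the sketch below multiplies the result's share of appearances among all results for the query by its engagement rate (clicks plus browses per appearance), then compares the score with a threshold of 0.9. The counts are those from the step 408 example; the function name, the formula, and the threshold are assumptions.

```python
from typing import Dict, Tuple

# result -> (occurrences in historical result pages, click + browse operations)
History = Dict[str, Tuple[int, int]]


def confidence(result: str, history: History) -> float:
    """Hypothetical score: the result's share of appearances among all
    results for this query, times its engagement rate."""
    total = sum(occ for occ, _ in history.values())
    occ, ops = history.get(result, (0, 0))
    if total == 0 or occ == 0:
        return 0.0
    share = occ / total
    engagement = ops / occ
    return share * engagement


history = {"King Zhaoxiang of Qin": (10000, 9900), "Qin Shi Huang": (111, 3)}
conf = confidence("King Zhaoxiang of Qin", history)
threshold = 0.9
print(round(conf, 3), conf > threshold)  # 0.979 True
```

A result that is both shown often and frequently clicked scores high, while a rarely shown, rarely clicked alternative such as "Qin Shi Huang" here scores near zero and would not be displayed.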
The information retrieval process provided by the present disclosure is described below in conjunction with fig. 4A.
As Fig. 4A shows, the search statement "During which emperor's reign was Dujiangyan built?" may be preprocessed, for example by SP analysis, LAT recognition, limit recognition, and concept recognition, to determine that the main entity is "Dujiangyan", the core verb is "build", and the target entity "emperor" is a concept entity. The preset knowledge graph is then searched to obtain the candidate triples associated with "Dujiangyan" and "build". A concept-knowledge check on the concept entity "emperor" gives "emperor - King Zhaoxiang of Qin", a time-knowledge check gives "256 BC - reign of King Zhaoxiang of Qin", and inference correlation calculations such as "Li Bing -> King Zhaoxiang of Qin" and "Li Bing -> Qin Shi Huang" are then carried out. Next, the historical search result presentation page data can be acquired, and the confidence of "King Zhaoxiang of Qin" determined from its frequency of occurrence in the historical search result presentation pages and the search result operation data. When this confidence is greater than the threshold, the target search result "King Zhaoxiang of Qin" is displayed in the search result presentation page.
It should be noted that the above examples are merely illustrative, and should not be taken as limitations on information retrieval processes and the like in the embodiments of the present disclosure.
It is understood that the information retrieval method provided by the present disclosure may be applied to any information retrieval framework, and the information retrieval framework shown in fig. 4B is taken as an example for description below.
As shown in Fig. 4B, the search statement is first preprocessed by SP analysis, LAT recognition, limit recognition, concept recognition, and the like to identify the main entity information, target entity information, core keywords, and target search terms. The entity retrieval module then looks up the input main entity information in an entity library and outputs the main entity identifier. The text extraction module can process the search statement with a pre-trained model to determine the main entity and the candidate answer entities corresponding to it. The text retrieval module can perform retrieval according to the entity information. The knowledge inference module may include a graph construction unit, a knowledge tuple extraction unit, a tuple aggregation scoring unit, and a multi-hop/implicit knowledge inference unit. The graph construction unit can generate the preset knowledge graph from the main entity, the candidate answer entities, and the preset knowledge graph library. The knowledge tuple extraction unit can retrieve the preset knowledge graph according to the target search terms to obtain the associated candidate triples. The tuple aggregation scoring unit can determine the weight of each candidate triple from its initial weight and association weight. The multi-hop/implicit knowledge inference unit can update the target search terms based on these weights to determine the target search result, that is, the answer.
It should be noted that the above examples are merely illustrative, and should not be taken as limitations on the information retrieval framework and the like in the embodiments of the present disclosure.
In the embodiments of the present disclosure, a target search term corresponding to a search statement may be determined, and a preset knowledge graph may be retrieved based on the target search term to obtain candidate triples associated with it. The weight of each candidate triple is then determined according to its matching degree with the target search term and its corresponding knowledge type, the target search term is updated according to these weights, and a target search result is determined from the updated target search term. Next, historical search result display page data corresponding to the updated target search term may be acquired, and the frequency with which the target search result occurs in the historical search result display pages may be determined. The confidence of the target search result is determined according to this occurrence frequency and the search result operation data in the historical search result display page data, and the target search result is displayed in the search result display page when the confidence is greater than a threshold. Thus, after the target search result is determined, its confidence can be assessed against historical search data, and the result is displayed only when the confidence exceeds the threshold, which improves the reliability and accuracy of the displayed target search result.
In order to implement the above embodiments, the present disclosure also provides an information retrieval apparatus.
Fig. 5 is a schematic structural diagram of an information retrieval device according to an embodiment of the present disclosure.
As shown in fig. 5, the information retrieval apparatus 500 includes: a first determination module 510, an acquisition module 520, a second determination module 530, an update module 540, and a third determination module 550.
The first determining module 510 is configured to determine a target search term corresponding to a search statement.
The acquisition module 520 is configured to retrieve a preset knowledge graph based on the target search term to obtain candidate triples associated with the target search term.
The second determining module 530 is configured to determine the weight of each candidate triple according to the matching degree between the candidate triple and the target search term and the knowledge type corresponding to the candidate triple.
The updating module 540 is configured to update the target search term according to the weight of each candidate triple.
The third determining module 550 is configured to determine a target search result based on the updated target search term.
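The five modules above form a linear pipeline. The following Python sketch wires them together on a toy graph to show the data flow. It is a minimal illustration under stated assumptions, not the disclosed implementation: the `Triple` type, the per-knowledge-type priors in `TYPE_PRIOR`, the field-overlap matching degree, the 0.5 threshold, and the `(entity, attribute)` key lookup are all hypothetical choices, and the entity names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    knowledge_type: str  # hypothetical labels such as "relation" or "attribute"


# Hypothetical priors per knowledge type; the disclosure gives no values.
TYPE_PRIOR = {"relation": 0.8, "attribute": 1.0}


def retrieve_candidates(graph, term):
    """Acquisition module: keep triples sharing at least one string with the term."""
    return [t for t in graph if {t.subject, t.predicate, t.obj} & term]


def match_degree(triple, term):
    """Assumed matching degree: fraction of the triple's fields found in the term."""
    fields = (triple.subject, triple.predicate, triple.obj)
    return sum(f in term for f in fields) / len(fields)


def weigh(candidates, term):
    """Second determining module: matching degree scaled by a knowledge-type prior."""
    return {t: match_degree(t, term) * TYPE_PRIOR.get(t.knowledge_type, 0.5)
            for t in candidates}


def update_term(term, weights, threshold):
    """Updating module: add the entities of triples whose weight exceeds the threshold."""
    updated = set(term)
    for triple, weight in weights.items():
        if weight > threshold:
            updated.update({triple.subject, triple.obj})
    return frozenset(updated)


def determine_result(term, kv_library):
    """Third determining module (simplified): first (entity, attribute) key covered by the term."""
    for (entity, attribute), value in kv_library.items():
        if entity in term and attribute in term:
            return value
    return None


def retrieve(graph, term, kv_library, threshold=0.5):
    weights = weigh(retrieve_candidates(graph, term), term)
    return determine_result(update_term(term, weights, threshold), kv_library)


# Toy placeholder data for "who is the spouse of the director of Film X?"
GRAPH = [
    Triple("Film X", "director", "Person Y", "relation"),
    Triple("Film X", "release year", "Year N", "attribute"),
]
KV_LIBRARY = {("Person Y", "spouse"): "Person Z"}
TERM = frozenset({"Film X", "director", "spouse"})

print(retrieve(GRAPH, TERM, KV_LIBRARY))  # Person Z
```

With the original term alone no key matches; the `director` triple scores 2/3 × 0.8 ≈ 0.53, exceeds the threshold, and contributes "Person Y" to the updated term, which is what enables the second hop.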
Optionally, the first determining module includes:
a first determining unit, configured to preprocess the acquired search statement to determine the main entity information, target entity information, and core keyword corresponding to the search statement;
a second determining unit, configured to determine a target search term corresponding to the search statement according to the main entity information, the target entity information, and the core keyword.
Optionally, the main entity information includes at least one of the following: a main entity identifier and a type of the main entity;
the target entity information includes at least one of the following: a target entity identifier and a type of the target entity.
Optionally, the second determining unit is specifically configured to:
when the type in the target entity information is a concept class, determine the target search term according to the core keyword and the identifier in the main entity information;
or, when the type in the target entity information is a non-concept class, determine the target search term according to the identifier in the main entity information and the identifier in the target entity information.
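The rule for building the target search term can be sketched in Python as follows. This is illustrative only: the `EntityInfo` type, the `"concept"` type label, and the representation of a target search term as a tuple of strings are assumptions, not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CONCEPT_CLASS = "concept"  # hypothetical label for concept-class target types


@dataclass
class EntityInfo:
    identifier: Optional[str] = None   # entity identifier, if recognized
    entity_type: Optional[str] = None  # entity type, if recognized


def determine_target_term(main: EntityInfo, target: EntityInfo,
                          core_keyword: str) -> Tuple[str, ...]:
    """Build the target search term from main/target entity info and the core keyword."""
    if target.entity_type == CONCEPT_CLASS:
        # Concept-class target: pair the main entity with the core keyword.
        parts = (main.identifier, core_keyword)
    else:
        # Non-concept target: pair the main entity with the target entity.
        parts = (main.identifier, target.identifier)
    # Entity information may contain only some fields, so drop empty parts.
    return tuple(p for p in parts if p)


# "Which city is the capital of Country A?": the target is the concept "city".
print(determine_target_term(EntityInfo("Country A", "country"),
                            EntityInfo(None, CONCEPT_CLASS), "capital"))
# ('Country A', 'capital')
```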
Optionally, the first determining module is further configured to:
determine a main entity and candidate answer entities corresponding to the search statement;
search a preset knowledge graph library for each path from the main entity to each candidate answer entity;
generate the preset knowledge graph based on these paths.
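Generating the preset knowledge graph from paths can be sketched as a bounded depth-first search over the knowledge graph library. The sketch assumes the library is a list of directed `(head, relation, tail)` tuples and introduces a hypothetical `max_hops` limit; neither detail comes from the disclosure. It also counts, for each triple, the number of paths that contain it, which is one input the second determining module uses.

```python
from collections import Counter, defaultdict


def build_preset_graph(library, main_entity, candidate_answers, max_hops=2):
    """Collect every simple path (at most max_hops edges) from main_entity to a
    candidate answer entity; the triples on those paths form the preset graph."""
    adjacency = defaultdict(list)
    for triple in library:
        adjacency[triple[0]].append(triple)  # index triples by head entity
    targets = set(candidate_answers)
    paths = []

    def dfs(node, path, visited):
        if path and node in targets:
            paths.append(tuple(path))  # a path reaching a candidate answer
        if len(path) == max_hops:
            return
        for triple in adjacency[node]:
            tail = triple[2]
            if tail in visited:  # keep paths simple (no revisited entities)
                continue
            path.append(triple)
            visited.add(tail)
            dfs(tail, path, visited)
            path.pop()
            visited.remove(tail)

    dfs(main_entity, [], {main_entity})
    # Number of paths containing each triple; the keys are the graph's triples.
    path_counts = Counter(t for p in paths for t in p)
    return paths, path_counts


LIBRARY = [
    ("A", "directed", "F1"),
    ("F1", "starring", "B"),
    ("A", "spouse", "B"),
    ("B", "born_in", "C"),
]
paths, path_counts = build_preset_graph(LIBRARY, "A", ["B"])
print(len(paths), sorted(path_counts))
```

The triple `("B", "born_in", "C")` is reachable but lies on no path to a candidate answer, so it is left out of the preset graph.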
Optionally, the second determining module includes:
a third determining unit, configured to determine the number of paths in the preset knowledge graph that contain any candidate triple, and the text source of that candidate triple;
a fourth determining unit, configured to determine an initial weight of the candidate triple according to the number of paths containing it, its text source, and its matching degree with the target search term;
a fifth determining unit, configured to determine an association weight of the candidate triple according to its association relationship with the remaining candidate triples that have the same knowledge type;
a sixth determining unit, configured to determine the weight of the candidate triple according to the initial weight and the association weight.
Optionally, the fifth determining unit is specifically configured to:
if the value of the knowledge type in a first candidate triple is a subset of the value of the same knowledge type in a given candidate triple, determine that the association weight of the given candidate triple includes the initial weight of the first candidate triple.
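The weighting performed by the third to sixth determining units can be sketched as follows. The disclosure names the inputs (path count, text source, matching degree) and the subset rule for the association weight, but not the formulas; the log-scaled path count, the `SOURCE_RELIABILITY` table, modelling a knowledge-type value as a set of strings, and summing the initial and association weights are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    subject: str
    knowledge_type: str  # e.g. "occupation"
    value: frozenset     # value of the knowledge type, modelled as a set of strings
    text_source: str     # where the triple was extracted from


# Hypothetical reliability scores per text source.
SOURCE_RELIABILITY = {"encyclopedia": 1.0, "web": 0.6}


def initial_weight(path_count, text_source, match):
    """Assumed form: more supporting paths and a more reliable source raise the weight."""
    return match * (1.0 + math.log1p(path_count)) * SOURCE_RELIABILITY.get(text_source, 0.5)


def candidate_weights(candidates, path_counts, matches):
    init = {c: initial_weight(path_counts[c], c.text_source, matches[c]) for c in candidates}
    weights = {}
    for c in candidates:
        # Association weight: initial weights of other same-type candidates
        # whose value is a subset of this candidate's value.
        association = sum(
            init[other]
            for other in candidates
            if other != c
            and other.knowledge_type == c.knowledge_type
            and other.value <= c.value
        )
        weights[c] = init[c] + association  # assumed combination rule
    return weights


narrow = Candidate("X", "occupation", frozenset({"director"}), "web")
broad = Candidate("X", "occupation", frozenset({"director", "screenwriter"}), "encyclopedia")
other = Candidate("X", "birthplace", frozenset({"City C"}), "web")
cands = [narrow, broad, other]
weights = candidate_weights(cands, {narrow: 1, broad: 2, other: 1}, {c: 1.0 for c in cands})
```

Here `broad` absorbs the initial weight of `narrow`, because {"director"} is a subset of {"director", "screenwriter"}, while `narrow` and `other` keep only their initial weights.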
Optionally, the update module is specifically configured to:
combine the entities in each candidate triple whose weight is greater than a threshold with the target search term to generate a new target search term.
Optionally, the third determining module is specifically configured to:
search an entity key-value pair library based on the updated target search term, and, when the library contains a target entity corresponding to the updated target search term, take that target entity as the target search result;
when the entity key-value pair library does not contain a target entity corresponding to the updated target search term, retrieve the preset knowledge graph based on the updated target search term to determine the corresponding target search result.
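The two-stage lookup of the third determining module can be sketched as follows: the entity key-value pair library is consulted first, and the preset knowledge graph only on a miss. For brevity the sketch assumes the updated target search term reduces to an `(entity, attribute)` pair, that the library is a dictionary keyed by such pairs, and that the graph is a list of `(head, relation, tail)` tuples; these representations are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def determine_target_result(
    updated_term: Tuple[str, str],
    kv_library: Dict[Tuple[str, str], str],
    graph: List[Tuple[str, str, str]],
) -> Optional[str]:
    entity, attribute = updated_term
    # Stage 1: exact hit in the entity key-value pair library.
    if updated_term in kv_library:
        return kv_library[updated_term]
    # Stage 2: fall back to retrieving the preset knowledge graph.
    for head, relation, tail in graph:
        if head == entity and relation == attribute:
            return tail
    return None  # no result in either source


KV_LIBRARY = {("Person Y", "spouse"): "Person Z"}
GRAPH = [("Person Y", "birth place", "City C")]
print(determine_target_result(("Person Y", "spouse"), KV_LIBRARY, GRAPH))       # Person Z
print(determine_target_result(("Person Y", "birth place"), KV_LIBRARY, GRAPH))  # City C
```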
Optionally, the first determining module is further configured to:
acquire historical search result display page data corresponding to the updated target search term;
determine the frequency with which the target search result occurs in the historical search result display pages;
determine the confidence of the target search result according to the occurrence frequency and the search result operation data in the historical search result display page data;
display the target search result in a search result display page when the confidence is greater than a threshold.
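The confidence check can be sketched as below. The disclosure specifies only the inputs (how often the result appears on historical display pages, plus operation data on those pages); treating the operation data as clicks, blending the appearance rate and click-through rate with a weight `alpha`, and the default threshold of 0.6 are hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HistoricalPage:
    shown_results: List[str]                                  # results displayed on the page
    clicked_results: List[str] = field(default_factory=list)  # operation data (assumed: clicks)


def result_confidence(result: str, pages: List[HistoricalPage], alpha: float = 0.5) -> float:
    if not pages:
        return 0.0
    shown = sum(result in p.shown_results for p in pages)
    appearance_rate = shown / len(pages)  # occurrence frequency across pages
    clicks = sum(result in p.clicked_results for p in pages)
    click_rate = clicks / shown if shown else 0.0
    return alpha * appearance_rate + (1 - alpha) * click_rate


def should_display(result: str, pages: List[HistoricalPage], threshold: float = 0.6) -> bool:
    """Display the result only when its confidence exceeds the threshold."""
    return result_confidence(result, pages) > threshold


PAGES = [
    HistoricalPage(["Person Z", "Person W"], ["Person Z"]),
    HistoricalPage(["Person Z"], ["Person Z"]),
    HistoricalPage(["Person Z", "Person W"]),
    HistoricalPage(["Person W"], ["Person W"]),
]
print(round(result_confidence("Person Z", PAGES), 3), should_display("Person Z", PAGES))
# 0.708 True
```

"Person Z" appears on 3 of 4 pages and is clicked on 2 of those 3, giving 0.5 × 0.75 + 0.5 × 2/3 ≈ 0.708, which clears the assumed threshold.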
For the functions and implementation principles of the modules in the embodiments of the present disclosure, reference may be made to the method embodiments; they are not repeated here.
The information retrieval apparatus of the embodiments of the present disclosure may determine a target search term corresponding to a search statement, retrieve a preset knowledge graph based on the target search term to obtain candidate triples associated with it, determine the weight of each candidate triple according to its matching degree with the target search term and its corresponding knowledge type, update the target search term according to these weights, and determine a target search result based on the updated target search term. In this way, during retrieval the target search term is looked up in the knowledge graph to obtain associated candidate triples and is then updated according to the knowledge types of those triples and their matching degree with the term, so that an accurate and reliable target search result can be determined, the cost of information retrieval is reduced, and its accuracy and reliability are improved.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 executes the methods and processes described above, such as the information retrieval method. For example, in some embodiments, the information retrieval method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the information retrieval method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the information retrieval method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, partly on the machine as a stand-alone software package and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
According to the technical solutions of the embodiments of the present disclosure, a target search term corresponding to a search statement may be determined, a preset knowledge graph may be retrieved based on the target search term to obtain candidate triples associated with it, the weight of each candidate triple may be determined according to its matching degree with the target search term and its corresponding knowledge type, the target search term may be updated according to these weights, and a target search result may be determined based on the updated target search term. In this way, during retrieval the target search term is looked up in the knowledge graph to obtain associated candidate triples, and is then inferred and updated according to the knowledge types of those triples and their matching degree with the term, so that an accurate and reliable target search result can be determined, the cost of information retrieval is reduced, and its accuracy and reliability are improved.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. An information retrieval method, comprising:
determining a target search term corresponding to a search statement;
retrieving a preset knowledge graph based on the target search term to obtain candidate triples associated with the target search term;
determining a weight of each candidate triple according to a matching degree between the candidate triple and the target search term and a knowledge type corresponding to the candidate triple;
updating the target search term according to the weight of each candidate triple;
and determining a target search result based on the updated target search term.
2. The method of claim 1, wherein the determining a target search term corresponding to a search statement comprises:
preprocessing the acquired search statement to determine main entity information, target entity information, and a core keyword corresponding to the search statement;
and determining the target search term corresponding to the search statement according to the main entity information, the target entity information, and the core keyword.
3. The method of claim 2, wherein the main entity information comprises at least one of: a main entity identifier and a type of the main entity;
the target entity information comprises at least one of the following: the target entity identification and the type of the target entity.
4. The method of claim 3, wherein the determining the target search term corresponding to the search statement according to the main entity information, the target entity information and the core keyword comprises:
under the condition that the type in the target entity information is a concept class, determining the target search term according to the core keyword and the identifier in the main entity information;
or, under the condition that the type in the target entity information is a non-concept class, determining the target search term according to the identifier in the main entity information and the identifier in the target entity information.
5. The method of claim 1, further comprising, before the retrieving a preset knowledge graph based on the target search term:
determining a main entity and candidate answer entities corresponding to the search statement;
searching a preset knowledge graph library for each path from the main entity to each candidate answer entity;
and generating the preset knowledge graph based on the paths.
6. The method of claim 1, wherein the determining the weight of each candidate triple according to the matching degree of each candidate triple with the target search term and the knowledge type corresponding to each candidate triple comprises:
determining the number of paths in the preset knowledge graph that contain any candidate triple, and a text source of said any candidate triple;
determining an initial weight of said any candidate triple according to the number of paths containing it, its text source, and its matching degree with the target search term;
determining an association weight corresponding to said any candidate triple according to its association relationship with the remaining candidate triples having the same knowledge type as said any candidate triple;
and determining the weight of said any candidate triple according to the initial weight and the association weight.
7. The method of claim 6, wherein said determining the association weight corresponding to said any candidate triple according to the association relationship with said any candidate triple among the remaining candidate triples having the same knowledge type as said any candidate triple comprises:
determining, if the value of the knowledge type in a first candidate triple is a subset of the value of the knowledge type in said any candidate triple, that the association weight corresponding to said any candidate triple comprises the initial weight corresponding to the first candidate triple.
8. The method of claim 6, wherein the updating the target search term according to the weight of each candidate triple comprises:
combining the entities in any candidate triple whose weight is greater than a threshold with the target search term to generate a new target search term.
9. The method of any one of claims 1-8, wherein determining a target search result based on the updated target search term comprises:
searching an entity key-value pair library based on the updated target search term, and confirming the target entity as the target search result when the entity key-value pair library comprises a target entity corresponding to the updated target search term;
and when the entity key-value pair library does not contain a target entity corresponding to the updated target search term, retrieving the preset knowledge graph based on the updated target search term to determine a target search result corresponding to the updated target search term.
10. The method of any one of claims 1-8, further comprising, after the determining a target search result based on the updated target search term:
acquiring historical search result display page data corresponding to the updated target search term;
determining the occurrence frequency of the target search result in the historical search result display page;
determining the confidence of the target search result according to the occurrence frequency and search result operation data in the historical search result display page data;
and displaying the target search result in a search result display page when the confidence is greater than a threshold.
11. An information retrieval apparatus comprising:
a first determining module, configured to determine a target search term corresponding to a search statement;
an acquisition module, configured to retrieve a preset knowledge graph based on the target search term to acquire candidate triples associated with the target search term;
a second determining module, configured to determine a weight of each candidate triple according to a matching degree between the candidate triple and the target search term and a knowledge type corresponding to the candidate triple;
an updating module, configured to update the target search term according to the weight of each candidate triple;
and a third determining module, configured to determine a target search result based on the updated target search term.
12. The apparatus of claim 11, wherein the first determining module comprises:
a first determining unit, configured to preprocess the acquired search statement to determine main entity information, target entity information, and a core keyword corresponding to the search statement;
and a second determining unit, configured to determine the target search term corresponding to the search statement according to the main entity information, the target entity information, and the core keyword.
13. The apparatus of claim 12, wherein the main entity information comprises at least one of: a main entity identifier and a type of the main entity;
the target entity information comprises at least one of the following: the target entity identification and the type of the target entity.
14. The apparatus of claim 13, wherein the second determining unit is specifically configured to:
under the condition that the type in the target entity information is a concept class, determining the target search term according to the core keyword and the identifier in the main entity information;
or, under the condition that the type in the target entity information is a non-concept class, determining the target search term according to the identifier in the main entity information and the identifier in the target entity information.
15. The apparatus of claim 11, wherein the first determining module is further configured to:
determining a main entity and candidate answer entities corresponding to the search statement;
searching a preset knowledge graph library for each path from the main entity to each candidate answer entity;
and generating the preset knowledge graph based on the paths.
16. The apparatus of claim 11, wherein the second determining module comprises:
a third determining unit, configured to determine the number of paths in the preset knowledge graph that contain any candidate triple, and a text source of said any candidate triple;
a fourth determining unit, configured to determine an initial weight of said any candidate triple according to the number of paths containing it, its text source, and its matching degree with the target search term;
a fifth determining unit, configured to determine an association weight corresponding to said any candidate triple according to its association relationship with the remaining candidate triples having the same knowledge type as said any candidate triple;
and a sixth determining unit, configured to determine the weight of said any candidate triple according to the initial weight and the association weight.
17. The apparatus of claim 16, wherein the fifth determining unit is specifically configured to:
determining, if the value of the knowledge type in a first candidate triple is a subset of the value of the knowledge type in said any candidate triple, that the association weight corresponding to said any candidate triple comprises the initial weight corresponding to the first candidate triple.
18. The apparatus of claim 16, wherein the update module is specifically configured to:
combining the entities in any candidate triple whose weight is greater than a threshold with the target search term to generate a new target search term.
19. The apparatus of any of claims 11-18, wherein the third determining module is specifically configured to:
searching an entity key-value pair library based on the updated target search term, and confirming the target entity as the target search result when the entity key-value pair library comprises a target entity corresponding to the updated target search term;
and when the entity key-value pair library does not contain a target entity corresponding to the updated target search term, retrieving the preset knowledge graph based on the updated target search term to determine a target search result corresponding to the updated target search term.
20. The apparatus of any of claims 11-18, wherein the first determining module is further configured to:
acquiring historical search result display page data corresponding to the updated target search term;
determining the occurrence frequency of the target search result in the historical search result display page;
determining the confidence of the target search result according to the occurrence frequency and search result operation data in the historical search result display page data;
and displaying the target search result in a search result display page when the confidence is greater than a threshold.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
CN202111394897.7A 2021-11-23 2021-11-23 Information retrieval method, device, electronic equipment and storage medium Pending CN114281965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111394897.7A CN114281965A (en) 2021-11-23 2021-11-23 Information retrieval method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111394897.7A CN114281965A (en) 2021-11-23 2021-11-23 Information retrieval method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114281965A 2022-04-05

Family

ID=80869770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111394897.7A Pending CN114281965A (en) 2021-11-23 2021-11-23 Information retrieval method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114281965A (en)

Cited By (5)

Publication number | Priority date | Publication date | Assignee | Title
CN116719954A (en) * | 2023-08-04 | 2023-09-08 | 中国人民解放军海军潜艇学院 | Information retrieval method, electronic equipment and storage medium
CN116719954B (en) * | 2023-08-04 | 2023-10-17 | 中国人民解放军海军潜艇学院 | Information retrieval method, electronic equipment and storage medium
CN117556061A (en) * | 2023-11-20 | 2024-02-13 | 曾昭涵 | Text output method and device, electronic equipment and storage medium
CN117556061B (en) * | 2023-11-20 | 2024-05-24 | 曾昭涵 | Text output method and device, electronic equipment and storage medium
CN117573842B (en) * | 2024-01-12 | 2024-04-30 | 阿里云计算有限公司 | Document retrieval method and automatic question-answering method

Similar Documents

Publication Publication Date Title
KR102564144B1 (en) Method, apparatus, device and medium for determining text relevance
CN107368468B (en) Operation and maintenance knowledge map generation method and system
JP7253593B2 (en) Training method and device for semantic analysis model, electronic device and storage medium
US10496749B2 (en) Unified semantics-focused language processing and zero base knowledge building system
EP3910492A2 (en) Event extraction method and apparatus, and storage medium
WO2015093540A1 (en) Phrase pair gathering device and computer program therefor
CN108875040A (en) Dictionary update method and computer readable storage medium
US11494420B2 (en) Method and apparatus for generating information
CN114281965A (en) Information retrieval method, device, electronic equipment and storage medium
CN112579727A (en) Document content extraction method and device, electronic equipment and storage medium
JP2023519049A (en) Method and apparatus for obtaining POI status information
CN114595686A (en) Knowledge extraction method, and training method and device of knowledge extraction model
CN113656587A (en) Text classification method and device, electronic equipment and storage medium
CN113011155A (en) Method, apparatus, device, storage medium and program product for text matching
CN111966792A (en) Text processing method and device, electronic equipment and readable storage medium
US20230052623A1 (en) Word mining method and apparatus, electronic device and readable storage medium
EP3992814A2 (en) Method and apparatus for generating user interest profile, electronic device and storage medium
CN113807102B (en) Method, device, equipment and computer storage medium for establishing semantic representation model
CN114239583B (en) Method, device, equipment and medium for training entity chain finger model and entity chain finger
CN113868508B (en) Writing material query method and device, electronic equipment and storage medium
CN115719066A (en) Search text understanding method, device, equipment and medium based on artificial intelligence
CN112182235A (en) Method and device for constructing knowledge graph, computer equipment and storage medium
CN116069914B (en) Training data generation method, model training method and device
CN116628004B (en) Information query method, device, electronic equipment and storage medium
CN114064847A (en) Text detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination