CN117520487A - Knowledge question-answering path searching method and related device - Google Patents

Knowledge question-answering path searching method and related device

Publication number
CN117520487A
Authority
CN
China
Prior art keywords
entity
path
vocabulary
names
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210915024.4A
Other languages
Chinese (zh)
Inventor
曾立
路金成
游齐恒
刘时正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202210915024.4A
Publication of CN117520487A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Abstract

The application discloses a knowledge question-answering path searching method and a related device. The method comprises the following steps: a search device acquires a user's question and a knowledge graph; the search device acquires at least two entity names according to a target vocabulary; the search device calculates a score for each of the at least two entity names according to the knowledge graph and the at least two entity names, where the score of each entity name represents the degree to which that entity name matches the target vocabulary; the search device determines a starting point candidate set from the knowledge graph according to the scores of the at least two entity names, where the starting point corresponds to the target vocabulary; and the search device determines at least one answer path corresponding to the question according to the starting point candidate set, the question, and the knowledge graph. Determining the starting point candidate set in this way avoids the exclusion, under the SSI approach, of entities whose names have different prefixes but identical suffixes, so a more comprehensive set of answer paths can be obtained.

Description

Knowledge question-answering path searching method and related device
Technical Field
The application relates to the field of information technology, in particular to a knowledge question-answering path searching method and a related device.
Background
Advances in information technology have driven internet technology from links between Web pages toward connections between data (person-to-person, person-to-object, person-to-knowledge, knowledge-to-knowledge); the internet is evolving toward the semantic network envisioned by Berners-Lee, the father of the Web. The semantic network is essentially a network of knowledge that a user can query by entering a natural language question, with the query results (that is, knowledge that has been processed and inferred) returned in graphical form. This query process is knowledge question answering, and the knowledge question-answering engine is the foundation of, and the bridge to, intelligent semantic retrieval.
A knowledge question-answering engine differs from a conventional internet search engine. For a question entered by the user, a search engine uses an inverted index and text matching over a massive existing text corpus to find all web pages containing the user's keywords, then ranks those pages and displays them to the user. In contrast, a knowledge question-answering engine returns a fully matched path based on an existing knowledge graph, so the user obtains the answer to the question directly.
As shown in FIG. 1a, to support knowledge question answering, the engine first preprocesses the natural language question entered by the user (for example, by word segmentation), and then queries the knowledge graph for a starting point according to the preprocessing result, that is, it finds the entity in the graph closest to the preprocessing result. After these steps, the query is mapped onto the graph structure of the knowledge graph and the paths matching the question are searched. The complete search path is the answer the user requires.
The entity closest to the preprocessing result can be found from the knowledge graph using a state index set (SSI), a string matching technique based on a prefix tree. Specifically, the entity names in the knowledge graph are mapped to digital sequences and a prefix tree is built, in which each leaf node stores a digital sequence and the names of all corresponding entities; the string of the preprocessing result is then matched against the nodes of the prefix tree, and the entity corresponding to a node that satisfies the matching condition is determined as the query starting point. Because this method relies on prefix-tree matching, it can only select strings that match along the prefix tree and misses other reasonable candidates. For example, two strings whose prefixes differ but whose suffixes are identical are not matched under SSI. Moreover, because SSI converts each string entirely into numbers, the semantic information carried by the text is lost and is not used during matching.
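The prefix-tree limitation described above can be illustrated with a minimal Python sketch. It is not the SSI implementation: it skips the mapping of names to digital sequences and uses a plain character-level prefix tree. The function names `build_trie` and `prefix_matches` and the sample entity names are hypothetical.

```python
def build_trie(names):
    """Build a character-level prefix tree; the key None marks a complete name."""
    root = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[None] = name  # store the full entity name at the terminal node
    return root


def prefix_matches(trie, query):
    """Return every stored name that starts with `query`, sorted alphabetically."""
    node = trie
    for ch in query:
        if ch not in node:
            return []  # the query leaves the tree: nothing can match
        node = node[ch]
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is None:
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


# Hypothetical entity names: both end in "graph database".
trie = build_trie(["graph database", "knowledge graph database"])
print(prefix_matches(trie, "graph"))
```

Only the name that begins with the query is returned; the name that merely shares its suffix is never reached, which is the gap the application sets out to close.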
Disclosure of Invention
The application provides a knowledge question-answering path searching method and a related device. When the starting point candidate set is determined, the method avoids the exclusion, under the SSI approach, of entities whose names have different prefixes but identical suffixes, so a more comprehensive set of answer paths can be obtained.
In a first aspect, the present application provides a knowledge question-answer path searching method, including:
the search device acquires a user's question and a knowledge graph; the search device acquires at least two entity names according to a target vocabulary, where the at least two entity names are names of entities in the knowledge graph, the target vocabulary is the first vocabulary in the question, and each of the at least two entity names contains the target vocabulary or is the target vocabulary. The search device calculates a score for each of the at least two entity names according to the knowledge graph and the at least two entity names, where the score of each entity name represents the degree to which that entity name matches the target vocabulary. The search device determines a starting point candidate set from the knowledge graph according to the scores of the at least two entity names, where the starting point corresponds to the target vocabulary, and determines at least one answer path corresponding to the question according to the starting point candidate set, the question, and the knowledge graph.
The answer path is a path in the knowledge graph whose starting point is an entity in the starting point candidate set and whose end point is an entity in the knowledge graph that contains the target attribute.
In other words, at least two entity names are obtained according to the target vocabulary, the score of each of the at least two entity names is calculated according to the at least two entity names and the knowledge graph, and the starting point candidate set is determined from the knowledge graph according to these scores.
In one possible implementation, the method of the present application further includes:
the search device obtains a vocabulary inverted list according to the knowledge graph; the vocabulary inverted list contains correspondences between a plurality of vocabularies and a plurality of entity names, where one vocabulary corresponds to one or more entity names, and each entity name either contains its corresponding vocabulary or is that vocabulary;
the search device acquiring at least two entity names according to the target vocabulary includes:
the search device traverses the vocabulary inverted list according to the target vocabulary to obtain the at least two entity names.
Entity names corresponding to the target vocabulary can be found quickly from the vocabulary inverted list, which improves path searching efficiency.
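A vocabulary inverted list of this kind can be sketched in a few lines of Python. The sketch assumes entity names are split into words on whitespace (a real system, especially for Chinese text, would need a word segmenter); `build_inverted_list`, `lookup`, and the sample names are hypothetical.

```python
from collections import defaultdict


def build_inverted_list(entity_names):
    """Map each word to the set of entity names that contain it (or equal it)."""
    inverted = defaultdict(set)
    for name in entity_names:
        for word in name.split():  # assumption: whitespace tokenization
            inverted[word].add(name)
    return inverted


def lookup(inverted, target_word):
    """Return the entity names associated with the target word, sorted for stable output."""
    return sorted(inverted.get(target_word, ()))


names = ["graph database", "knowledge graph database", "relational database", "graph"]
inverted = build_inverted_list(names)
print(lookup(inverted, "graph"))
```

Unlike a prefix tree, the lookup returns "knowledge graph database" for the word "graph" even though the name does not start with it.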
In one possible implementation, the search device calculating a score for each of the at least two entity names according to the knowledge graph and the at least two entity names includes:
the search device obtains the weight of the target vocabulary, and calculates the score of each entity name according to the weight of the target vocabulary, the number of words in that entity name, the average number of words across all entity names in the knowledge graph, and the number of times the target vocabulary occurs in that entity name.
Because the score of each entity name is calculated from the weight of the target vocabulary, the number of times the target vocabulary occurs in the entity name, the number of words in the entity name, and the average number of words across all entity names, and because the calculation is optimized by building an inverted-list index keyed by word, both the efficiency of determining the starting point candidate set and its accuracy are improved.
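This passage names the four inputs to the score but does not state the formula. As an illustration only, the sketch below plugs those inputs into the well-known BM25 term-scoring function, which depends on exactly these quantities; whether the application uses BM25 is an assumption. The function name `score_name` and the constants `k1` and `b` (common BM25 defaults) are not from the source.

```python
def score_name(name_words, target, weight, avg_len, k1=1.2, b=0.75):
    """BM25-style score of one entity name for one target word (illustrative stand-in).

    name_words: list of words in the entity name
    target:     the target vocabulary
    weight:     weight of the target vocabulary (for example an IDF value)
    avg_len:    average number of words over all entity names in the graph
    """
    freq = name_words.count(target)  # occurrences of the target in this name
    # Length normalization: names longer than average are penalized.
    norm = k1 * (1 - b + b * len(name_words) / avg_len)
    return weight * freq * (k1 + 1) / (freq + norm)


short = score_name(["graph", "database"], "graph", weight=1.0, avg_len=2.5)
longer = score_name(["knowledge", "graph", "database"], "graph", weight=1.0, avg_len=2.5)
print(short > longer)
```

With equal occurrence counts, the shorter name scores higher, which matches the intuition that the target vocabulary accounts for a larger share of that name.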
In one possible implementation, the search device determining the starting point candidate set from the knowledge graph according to the scores of the at least two entity names includes:
the search device sorts the scores of the at least two entity names in descending order, and matches each of the k top-ranked entity names against the target vocabulary according to a first preset matching rule to obtain the matching entity names. The search device determines that the starting point candidate set includes the entities in the knowledge graph whose names are the matching entity names.
The at least two entity names are sorted by score, the k top-ranked entity names are each matched against the target vocabulary according to the preset matching rule, and each entity whose name matches the target vocabulary is taken as a starting point of an answer path, which yields the starting point candidate set.
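The ranking and filtering step can be sketched as follows. The first preset matching rule is not specified in this passage, so the sketch passes it in as a function and uses a simple whole-word containment test as a placeholder; `select_start_names`, `contains_word`, and the scores are hypothetical.

```python
def contains_word(name, target):
    """Placeholder matching rule: the target appears as a whole word in the name."""
    return target in name.split()


def select_start_names(scored_names, target, k, matches=contains_word):
    """Keep the k highest-scoring names, then keep those that pass the matching rule."""
    ranked = sorted(scored_names.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k] if matches(name, target)]


scores = {
    "graphite": 0.95,
    "graph database": 0.90,
    "knowledge graph database": 0.80,
    "graph theory": 0.10,
}
print(select_start_names(scores, "graph", k=3))
```

Entities in the knowledge graph that carry one of the returned names would then form the starting point candidate set.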
In one possible implementation, the search device determining at least one answer path corresponding to the question according to the starting point candidate set, the question, and the knowledge graph includes:
the search device acquires an intermediate vocabulary set and a target attribute word from the question, where the intermediate vocabulary set contains the words in the question other than the target vocabulary and the target attribute word; the search device determines at least one first path from the knowledge graph according to the starting point candidate set and the intermediate vocabulary set, where the starting point of each first path is an entity in the starting point candidate set and the end point of each first path is an entity in the knowledge graph whose name matches the last vocabulary in the intermediate vocabulary set; the search device determines at least one answer path according to the at least one first path and the target attribute word, where the attributes of the end point of the answer path include the target attribute.
Each first path includes entities whose names match all of the words in the intermediate vocabulary set. For example, if the intermediate vocabulary set contains 3 words, the names of 3 entities in each first path match those 3 words respectively.
In one possible implementation, the search device determining at least one first path from the knowledge graph according to the starting point candidate set and the intermediate vocabulary set includes:
The search device takes an entity in the starting point candidate set as a starting point and acquires a plurality of sub-path sets based on the intermediate vocabulary set. For each sub-path in the ith sub-path set, the starting point is a first entity, the end point is a second entity, and the depth between the first entity and the second entity in the knowledge graph is not greater than the search depth. The second entity is an entity in the knowledge graph whose name matches the ith vocabulary in the intermediate vocabulary set. When i=1, the first entity is an entity in the starting point candidate set; when i is greater than 1, the first entity is the end point of a sub-path in the (i-1)th sub-path set. The jth vocabulary in the intermediate vocabulary set is adjacent to, and precedes, the (j+1)th vocabulary in the question, where j is not greater than N-1 and N is the number of words in the intermediate vocabulary set. The search device acquires the at least one first path from the plurality of sub-path sets.
The depth between the first entity and the second entity is the length of a path whose starting point is the first entity and whose end point is the second entity. For example, if that path is first entity - third entity - second entity, its length is 2.
In one example, the search depth for a sub-path in the ith sub-path set is a-b-c, where a is the preset maximum answer path length, b is the length of the path from the starting point of the answer path to the first entity, and c is N-i. When there are multiple paths from the starting point of the answer path to the first entity, b is the minimum of their lengths. Because the first entities of the sub-paths in different sub-path sets may differ, the search depths used may also differ. It should be understood that a search starting from a given first entity uses the search depth corresponding to that first entity.
In another example, the search depth is a preset threshold, such as 2.
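The first example above (search depth a-b-c) reduces to simple arithmetic. The sketch below restates it in Python; the function name `search_depth` and the sample numbers are hypothetical.

```python
def search_depth(max_answer_len, lengths_to_first_entity, n_words, i):
    """Search depth a - b - c for a sub-path in the ith sub-path set.

    max_answer_len:          a, the preset maximum answer path length
    lengths_to_first_entity: lengths of all known paths from the answer-path
                             starting point to the first entity; b is their minimum
    n_words:                 N, the number of words in the intermediate vocabulary set
    i:                       1-based index of the sub-path set
    """
    b = min(lengths_to_first_entity)
    c = n_words - i
    return max_answer_len - b - c


# i = 1: the first entity is the starting point itself, so b = 0.
print(search_depth(6, [0], n_words=3, i=1))
# i = 2: two paths of length 2 and 3 reach the first entity, so b = 2.
print(search_depth(6, [2, 3], n_words=3, i=2))
```

Subtracting c = N-i reserves at least one hop for each intermediate word that still has to be matched after the ith one.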
In a first path, there may be at least one entity between the starting point and the entity matching the 1st word in the intermediate vocabulary set, or between the two entities matching two adjacent words in the intermediate vocabulary set. The at least one first path may therefore include first paths of different lengths, so the answer paths determined from them may also differ in length; the length of an answer path need not equal the number of words in the question. In addition, the words in the intermediate vocabulary set are matched against the entity names of the first paths by fuzzy or approximate matching. Compared with the prior art, answer paths that existing knowledge engines cannot find can therefore be found, and all paths similar in meaning to the question can be output.
In one possible implementation, starting with an entity in a starting candidate set, obtaining a plurality of sub-path sets based on an intermediate vocabulary set includes:
in the knowledge graph, starting from the end point of any sub-path P in the ith sub-path set of the plurality of sub-path sets, a search is performed within the search depth, and the name of each entity O found is matched against the (i+1)th vocabulary in the intermediate vocabulary set according to a second preset matching rule; if the name of the entity O matches the (i+1)th vocabulary, the path between the end point of the sub-path P and the entity O is acquired, where the (i+1)th sub-path set includes the path between the end point of the sub-path P and the entity O.
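The extension step can be sketched as a depth-bounded search from the end point of sub-path P. The sketch uses a breadth-first expansion of paths, an adjacency-list graph, and a whole-word matching rule; all three are assumptions, as are the names `extend_sub_paths` and `word_in_name`. It also stops expanding a path once a matching entity is reached, which is a design choice of this sketch rather than something the text states.

```python
from collections import deque


def word_in_name(name, word):
    """Placeholder for the second preset matching rule: whole-word containment."""
    return word in name.split()


def extend_sub_paths(graph, names, start, word, depth, matches=word_in_name):
    """Return paths from `start` with at most `depth` edges that end at a matching entity.

    graph: dict mapping an entity number to a list of downstream entity numbers
    names: dict mapping an entity number to its name
    """
    results = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if len(path) > 1 and matches(names[node], word):
            results.append(path)  # entity O found: record the path, do not go further
            continue
        if len(path) - 1 < depth:  # path length is counted in edges
            for nxt in graph.get(node, ()):
                if nxt not in path:  # avoid cycles
                    queue.append(path + [nxt])
    return results


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
names = {0: "phone", 1: "vendor", 2: "store", 3: "battery supplier", 4: "battery"}
print(extend_sub_paths(graph, names, start=0, word="battery", depth=2))
```

Each returned path would be added to the (i+1)th sub-path set, and its last entity becomes the first entity for the next word.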
In one possible implementation, acquiring a path between an end point of the sub-path P and the entity O includes:
determining a downstream node of a first node according to the bit string of the first node, where the positions, within the bit string of the first node, of the bits whose value is a first value indicate the numbers of the downstream nodes of the first node; the path between the end point of the sub-path P and the entity O includes the first node and a downstream node of the first node; the first node is any entity other than the entity O on the path between the end point of the sub-path P and the entity O.
By introducing the bit string of the first node, the search device need not store all downstream nodes of the first node; it only needs to store the bit string of the first node, which reduces the memory footprint of the search device.
In one possible implementation, the search within the search depth is performed using depth-first search (DFS) combined with breadth-first search (BFS).
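The bit-string encoding of downstream nodes can be sketched as below. The sketch assumes that the first value is 1, that node numbers start at 0, and that position p in the string (counted from the left) corresponds to node number p; none of these conventions is fixed by this passage. `encode_downstream` and `downstream_nodes` are hypothetical names.

```python
def encode_downstream(node_numbers, num_nodes):
    """Build a bit string of length num_nodes with '1' at each downstream node number."""
    bits = ["0"] * num_nodes
    for number in node_numbers:
        bits[number] = "1"
    return "".join(bits)


def downstream_nodes(bit_string, first_value="1"):
    """Recover downstream node numbers from the positions of bits equal to first_value."""
    return [pos for pos, bit in enumerate(bit_string) if bit == first_value]


bits = encode_downstream([1, 4], num_nodes=5)
print(bits, downstream_nodes(bits))
```

A production system would more likely pack the bits into machine words or a compressed bitmap; the string form is used here only to keep the positions visible.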
In one possible implementation, determining at least one answer path according to at least one first path and the target attribute word includes:
acquiring the entities containing the target attribute according to the bit string of the target attribute word, where the positions, within the bit string of the target attribute word, of the bits whose value is a second value indicate the numbers of the entities containing the target attribute; performing BFS within the search depth from the end point of a second path in the knowledge graph, and determining the end point of the answer path according to the entities found and the entities containing the target attribute, where the second path is one of the at least one first path and the end point of the answer path belongs to the intersection of the entities found and the entities containing the target attribute; and acquiring the path from the end point of the second path to the end point of the answer path, and determining the at least one answer path according to the second path and the path from the end point of the second path to the end point of the answer path.
By introducing the bit string of the attribute word, the search device need not store the attributes of all nodes; it only needs to store the bit string of the attribute word, which reduces the memory footprint of the search device and makes finding the end point of the answer path more efficient.
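The end-point selection can be sketched as a depth-bounded BFS whose result is intersected with the entities flagged in the attribute word's bit string. The sketch assumes the second value is 1 and that position p in the string corresponds to entity number p. It also counts the BFS start entity as "found", which the text does not settle. `reachable_within` and `answer_end_points` are hypothetical names.

```python
from collections import deque


def reachable_within(graph, start, depth):
    """Entities reachable from `start` in at most `depth` edges (BFS), including `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == depth:
            continue  # do not expand beyond the search depth
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def answer_end_points(graph, second_path_end, depth, attribute_bits, second_value="1"):
    """Intersect BFS-reached entities with the entities flagged in the attribute bit string."""
    holders = {pos for pos, bit in enumerate(attribute_bits) if bit == second_value}
    return sorted(reachable_within(graph, second_path_end, depth) & holders)


graph = {3: [4, 5], 4: [6], 5: [], 6: [7]}
# Entities 4, 6 and 7 carry the target attribute.
print(answer_end_points(graph, second_path_end=3, depth=2, attribute_bits="00001011"))
```

Entity 7 carries the attribute but lies three hops away, so it is excluded at a search depth of 2.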
In a second aspect, the present application provides a search apparatus, including:
the acquisition unit is used for acquiring a user's question and a knowledge graph, and for acquiring at least two entity names according to a target vocabulary, where the at least two entity names are names of entities in the knowledge graph, the target vocabulary is the first vocabulary in the question, and each of the at least two entity names contains the target vocabulary or is the target vocabulary;
the computing unit is used for computing and obtaining the score of each entity name in the at least two entity names according to the knowledge graph and the at least two entity names, wherein the score of each entity name is used for representing the matching degree of the entity name and the target vocabulary;
the determining unit is used for determining a starting point candidate set from the knowledge graph according to the scores of at least two entity names, and the starting point corresponds to the target vocabulary; and determining at least one answer path corresponding to the question according to the starting point candidate set, the question and the knowledge graph.
In one possible implementation manner, the determining unit is further configured to obtain a vocabulary inverted list according to the knowledge graph; the vocabulary inverted list comprises corresponding relations between a plurality of vocabularies and a plurality of entity names, one vocabulary corresponds to one or more entity names, each entity name in the plurality of entity names comprises a vocabulary corresponding to the entity name, or each entity name is a vocabulary corresponding to the entity name;
in the aspect of acquiring at least two entity names according to the target vocabulary, the acquiring unit is specifically configured to:
traversing the vocabulary inverted list according to the target vocabulary to obtain the at least two entity names.
In one possible implementation, the computing unit is specifically configured to:
acquiring the weight of the target vocabulary; and calculating the score of each entity name according to the weight of the target vocabulary, the number of words in that entity name, the average number of words across all entity names in the knowledge graph, and the number of times the target vocabulary occurs in that entity name.
In one possible implementation manner, the determining unit is specifically configured to, in determining the starting candidate set from the knowledge-graph according to the scores of at least two entity names:
sorting the scores of the at least two entity names in descending order; matching each of the k top-ranked entity names against the target vocabulary according to a first preset matching rule to obtain the matching entity names; and determining that the starting point candidate set includes the entities in the knowledge graph whose names are the matching entity names.
In one possible implementation manner, in determining at least one answer path corresponding to the question according to the starting candidate set, the question and the knowledge graph, the determining unit is specifically configured to:
acquiring an intermediate vocabulary set and a target attribute word from the question, where the intermediate vocabulary set contains the words in the question other than the target vocabulary and the target attribute word; determining at least one first path from the knowledge graph according to the starting point candidate set and the intermediate vocabulary set, where the starting point of each first path is an entity in the starting point candidate set and the end point of each first path is an entity in the knowledge graph whose name matches the last vocabulary in the intermediate vocabulary set; and determining at least one answer path according to the at least one first path and the target attribute word, where the attributes of the end point of the answer path include the target attribute.
In one possible implementation, in terms of determining at least one first path from the knowledge graph according to the starting point candidate set and the intermediate vocabulary set, the determining unit is specifically configured to:
take an entity in the starting point candidate set as a starting point and acquire a plurality of sub-path sets based on the intermediate vocabulary set. For each sub-path in the ith sub-path set, the starting point is a first entity, the end point is a second entity, and the depth between the first entity and the second entity in the knowledge graph is not greater than the search depth. The second entity is an entity in the knowledge graph whose name matches the ith vocabulary in the intermediate vocabulary set. When i=1, the first entity is an entity in the starting point candidate set; when i is greater than 1, the first entity is the end point of a sub-path in the (i-1)th sub-path set. The jth vocabulary in the intermediate vocabulary set is adjacent to, and precedes, the (j+1)th vocabulary in the question, where j is not greater than N-1 and N is the number of words in the intermediate vocabulary set. The at least one first path is acquired from the plurality of sub-path sets.
In one possible implementation, in terms of taking an entity in the starting point candidate set as a starting point and acquiring a plurality of sub-path sets based on the intermediate vocabulary set, the determining unit is specifically configured to:
in the knowledge graph, starting from the end point of any sub-path P in the ith sub-path set of the plurality of sub-path sets, search within the search depth, and match the name of each entity O found against the (i+1)th vocabulary in the intermediate vocabulary set according to a second preset matching rule; if the name of the entity O matches the (i+1)th vocabulary, acquire the path between the end point of the sub-path P and the entity O, where the (i+1)th sub-path set includes the path between the end point of the sub-path P and the entity O.
In one possible implementation, in terms of acquiring a path between the end point of the sub-path P and the entity O, the determining unit is specifically configured to:
determining a downstream node of the first node according to a bit string of the first node, wherein in the bit string of the first node, the position of a bit with a value of a first value in the bit string of the first node is used for indicating the number of the downstream node of the first node, and a path between the end point of the sub path P and the entity O comprises the first node and the downstream node of the first node; the first node is any entity except the entity O in the path between the end point of the sub-path P and the entity O.
In one possible implementation, the search within the search depth is performed using DFS combined with BFS.
In one possible implementation manner, the determining unit is specifically configured to, in determining at least one answer path according to at least one first path and the target attribute word:
acquiring the entities containing the target attribute according to the bit string of the target attribute word, where the positions, within the bit string of the target attribute word, of the bits whose value is a second value indicate the numbers of the entities containing the target attribute; performing BFS within the search depth from the end point of a second path in the knowledge graph, and determining the end point of the answer path according to the entities found and the entities containing the target attribute, where the second path is one of the at least one first path and the end point of the answer path belongs to the intersection of the entities found and the entities containing the target attribute; and acquiring the path from the end point of the second path to the end point of the answer path, and determining the at least one answer path according to the second path and the path from the end point of the second path to the end point of the answer path.
In a third aspect, the present application provides a search apparatus comprising a processor and a memory. The memory is used for storing program codes. The processor is configured to invoke program code stored in the memory to perform the method of the first aspect or any of the possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform a method as provided by any one of the possible implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product for, when run on a computer, causing the computer to perform the method as provided by any one of the possible implementations of the first aspect.
It will be appreciated that the apparatus of the second aspect, the apparatus of the third aspect, the computer storage medium of the fourth aspect, and the computer program product of the fifth aspect provided above are each used to perform the method provided in any implementation of the first aspect. For their advantages, refer to the advantages of the corresponding method; they are not repeated here.
Drawings
FIG. 1a is a schematic diagram of a knowledge question-answering process;
FIG. 1b is a schematic diagram of a system architecture according to an embodiment of the present disclosure;
fig. 2 is a flow chart of a knowledge question-answer path searching method according to an embodiment of the present application;
fig. 3a is a schematic diagram of a path searching process according to an embodiment of the present application;
FIG. 3b is a schematic diagram of another path searching process according to an embodiment of the present disclosure;
FIG. 3c is a schematic diagram of a DFS search process;
FIG. 3d is a schematic diagram of a DFS+BFS search process;
FIG. 3e is a schematic diagram illustrating a bit string process of a node according to an embodiment of the present disclosure;
FIG. 3f is a schematic diagram of an answer path searching process according to an embodiment of the present disclosure;
FIG. 3g is a schematic diagram of an answer path according to an embodiment of the disclosure;
FIG. 4 is a schematic structural diagram of a search device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of another search device according to an embodiment of the present application.
Detailed Description
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the drawings, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
"plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
Embodiments of the present application are described below with reference to the accompanying drawings.
Referring to fig. 1b, fig. 1b is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in fig. 1b, the system comprises at least one terminal device 101 and a computing device 102.
The terminal device 101 is a portable device used by a user, such as a notebook computer, a desktop computer, a tablet computer, a smart phone, a smart watch, a smart bracelet, and the like.
Computing device 102 is a server, a server cluster, a cloud server, a cloud computing service center, or other form of device with computing capabilities.
In one example, the terminal device 101 sends a question request to the computing device 102, the question request including a question sentence. The computing device 102 performs word segmentation on the question sentence and obtains at least two entity names from the knowledge graph according to a target vocabulary in the question sentence, wherein the target vocabulary is the first vocabulary in the question sentence, and each of the at least two entity names contains the target vocabulary or is the target vocabulary. The computing device 102 calculates a score for each of the at least two entity names according to the knowledge graph and the at least two entity names, where the score of an entity name characterizes the degree of matching between the entity name and the target vocabulary. The computing device 102 determines a starting point candidate set from the knowledge graph based on the score of each entity name, and determines at least one answer path corresponding to the question sentence according to the starting point candidate set, the question sentence and the knowledge graph. In response to the question request, the computing device 102 sends the at least one answer path corresponding to the question sentence to the terminal device 101.
In another example, the terminal device 101 itself performs the operations described above for the computing device 102 to obtain at least one answer path corresponding to the question sentence.
It can be seen that, in the scheme of the present application, after obtaining at least two entity names including or being the target vocabulary, the scores of the at least two entity names are calculated according to the knowledge graph and the at least two entity names, and the starting point candidate set is determined from the knowledge graph according to the scores of the at least two entity names. Compared with the SSI mode, the method can avoid the situation that entities with different prefixes and identical suffixes are excluded, and therefore a more comprehensive answer path can be obtained.
The implementation flow of the present application is specifically described below.
Referring to fig. 2, fig. 2 is a flow chart of a knowledge question-answer path searching method according to an embodiment of the present application. As shown in fig. 2, the method includes:
S201, the search device acquires a question sentence and a knowledge graph, and acquires at least two entity names according to the target vocabulary.
The searching means may be the computing device 102, or may be the terminal device 101.
The search device segments the question sentence into at least two vocabularies. The word segmentation method may be a dictionary-based method, an understanding-based method or a statistics-based method; other word segmentation methods are also possible and are not limited herein. The search device obtains a target vocabulary from the at least two vocabularies, the target vocabulary being the one that appears first in the question sentence. The search device then obtains a plurality of entity names from the knowledge graph according to the target vocabulary, each of which contains the target vocabulary or is the target vocabulary.
In one example, the searching device obtains the plurality of entity names from the knowledge graph by comparing the target vocabulary with the name of each entity in the knowledge graph to determine whether the name contains the target vocabulary or is the target vocabulary.
In another example, the search apparatus stores in advance an inverted index table that records the correspondence between a plurality of words and a plurality of entity names, where each entity name contains its corresponding word or is that word. After determining the target vocabulary, the searching device queries the inverted index table according to the target vocabulary to obtain the entities whose names contain the target vocabulary or are the target vocabulary. Because the inverted index table is queried by the target vocabulary, the target vocabulary does not need to be compared one by one with the name of each entity in the knowledge graph, which reduces useless computation and improves efficiency.
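The inverted index lookup described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the helper names `tokenize`, `build_inverted_index` and `lookup` are hypothetical, the letter/digit segmentation stands in for whichever word segmentation method is actually used, and the entity name "RRU5901" is made-up sample data.

```python
import re
from collections import defaultdict


def tokenize(name):
    # Hypothetical segmentation: split runs of letters from runs of digits,
    # so "AAU3910" yields ["AAU", "3910"].
    return re.findall(r"[A-Za-z]+|\d+", name)


def build_inverted_index(entity_names):
    # Map each word to the set of entity names that contain it.
    index = defaultdict(set)
    for name in entity_names:
        for word in tokenize(name):
            index[word].add(name)
    return index


def lookup(index, target_word):
    # A single dictionary access replaces a scan over every entity name.
    return sorted(index.get(target_word, set()))


index = build_inverted_index(["AAU", "AAU3910", "AAU3911", "RRU5901"])
candidates = lookup(index, "AAU")
```

Here `candidates` holds the names that contain or equal the target vocabulary, without comparing the target vocabulary against "RRU5901" at query time.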
S202, the searching device calculates the score of each entity name in the at least two entity names according to the knowledge graph and the at least two entity names.
The searching device obtains the weight of the target vocabulary and calculates the score of each of the plurality of entity names according to the weight of the target vocabulary, the number of vocabularies in the entity name, the average number of vocabularies over all entity names in the knowledge graph, and the number of times the target vocabulary occurs in the entity name.
In one example, the searching apparatus may obtain the weights of the words in all entity names in the knowledge graph from another device; alternatively, after acquiring the knowledge graph, the searching device performs word segmentation on the name of each entity in the knowledge graph and calculates the weight of each vocabulary in the entity names. Taking the target vocabulary as an example, the weight of the target vocabulary may be expressed as:
idf = ln(D - v + 0.5) - ln(v + 0.5)    formula (1)
Wherein D is the number of entities in the knowledge graph, and v is the number of entities in the knowledge graph whose names contain the target vocabulary. According to formula (1), the weights of the words in all entity names in the knowledge graph can be calculated.
In one example, the score for any one of the plurality of entity names may be expressed as:
score = idf * value * (1 + a) / (value + a * (1 - b * (1 - d / avg_d)))    formula (2)
Wherein a and b are hyperparameters, a is greater than or equal to 0, and b is greater than or equal to 0 and less than or equal to 1. The larger b is, the greater the impact of the number of words in the entity name on the score. The larger a is, the greater the impact of the number of occurrences of the target vocabulary in the entity name on the score. value is the number of times the target vocabulary appears in the entity name, d is the number of vocabularies in the entity name, and avg_d is the average number of vocabularies over all entity names in the knowledge graph.
It should be noted that the number of words in an entity name refers to the number of words the entity name contains; for example, the entity name "AAU3910" contains two words, "AAU" and "3910".
S203, the searching device determines a starting point candidate set from the knowledge graph according to the scores of the at least two entity names.
It should be understood that the starting points in the starting point candidate set are starting points of answer paths corresponding to question sentences, and these starting points correspond to target words.
In the above manner, the search means obtains a score for each of the plurality of entity names. The searching device sorts the entity names by score in descending order and selects the k highest-scoring entity names from the sorted result, where k is an integer greater than 0. The value of k may be set manually, may be selected by the searching device within a preset range, or may be determined in another manner, which is not limited herein. The searching device then obtains, from the knowledge graph, the entities corresponding to the k entity names. Since entities with the same name may exist in the knowledge graph, the number of entities corresponding to the k entity names is greater than or equal to k.
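The top-k selection can be illustrated with a short sketch. The function names and the entity identifiers "e1" to "e3" are hypothetical, and the dictionary-based representation of the knowledge graph is an assumption made only for this illustration.

```python
def top_k_names(scores, k):
    # scores: mapping from entity name to score; keep the k best names.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def entities_for_names(name_to_entities, names):
    # Several entities may share one name, so the result can exceed k.
    return [entity
            for name in names
            for entity in name_to_entities.get(name, [])]


scores = {"AAU3910": 7.962, "AAU": 9.313, "AAU antenna unit": 5.0}
best = top_k_names(scores, 2)
entities = entities_for_names({"AAU": ["e1", "e2"], "AAU3910": ["e3"]}, best)
```

In this sample, two selected names map to three entities, which shows why the number of entities can be greater than k.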
The searching device matches the entities corresponding to the k entity names with the target vocabulary according to a first preset matching rule to obtain m entities. The starting point candidate set includes the m entities. The first preset matching rule comprises at least one fuzzy matching rule.
In one example, the fuzzy matching rules may include, but are not limited to, at least one of:
the edit distance between the entity name and the target vocabulary does not exceed a threshold x; or,
the edit distance between a preset word in the target vocabulary and the entity name does not exceed a threshold y; or,
the entity name consists of the original vocabulary and digits.
It should be understood that the threshold x and the threshold y may be set manually or may be set by the search device itself within a preset range.
It should be noted that the edit distance between two words is the minimum number of editing operations required to change one word into the other. The edit distance may also be referred to as the Levenshtein distance. For example, the edit distance between the word "mustrip" and the word "Muster" is 2, and the edit distance between the word "mustrip" and the word "Musler" is 3.
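For reference, the edit distance can be computed with the standard dynamic-programming recurrence. The sketch below assumes that insertion, deletion and substitution each cost 1 and that comparison is case-sensitive; the application does not state which operations or costs it uses, so these are assumptions.

```python
def edit_distance(s, t):
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def within_threshold(entity_name, target_word, x):
    # Fuzzy rule of the first kind: distance must not exceed threshold x.
    return edit_distance(entity_name, target_word) <= x
```

For instance, "AAU3910" and "AAU3911" differ by one substitution, so they would match each other under a threshold of 1.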
In one example, if there are a plurality of fuzzy matching rules, the searching device sets different priorities for them; the higher the priority of a fuzzy matching rule, the stricter the rule, that is, the harder it is for an entity name to match the target vocabulary.
For example, the plurality of fuzzy matching rules includes a fuzzy matching rule A, a fuzzy matching rule B and a fuzzy matching rule C, where the priority of rule A is higher than that of rule B, and the priority of rule B is higher than that of rule C. When matching, the names of all the searched entities are first matched with the target vocabulary according to rule A. If any entity name matches the target vocabulary under rule A, matching under rule B and rule C is not needed. If no entity name matches the target vocabulary under rule A, the names of all the searched entities are matched with the target vocabulary according to rule B. If any entity name matches under rule B, matching under rule C is not needed. If no entity name matches under rule B, the names of all the searched entities are matched with the target vocabulary according to rule C.
The priorities of the fuzzy matching rules are set manually.
By adopting the matching mode, whether the target vocabulary is matched with the entity name or not can be judged quickly.
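The priority cascade can be sketched as a loop over rules ordered from highest to lowest priority that stops at the first rule producing any match. The two rule functions below are hypothetical stand-ins chosen for illustration; they are not the rules of the application.

```python
def match_with_priority(entity_names, target, rules):
    # rules: predicates (name, target) -> bool, highest priority first.
    # Lower-priority rules are tried only if no name matched so far.
    for rule in rules:
        matched = [name for name in entity_names if rule(name, target)]
        if matched:
            return matched
    return []


def exact(name, target):
    # Hypothetical strictest rule: the name equals the target vocabulary.
    return name == target


def target_plus_digits(name, target):
    # Hypothetical looser rule: the target vocabulary followed by digits.
    return name.startswith(target) and name[len(target):].isdigit()


rules = [exact, target_plus_digits]
only_loose = match_with_priority(["AAU3910", "AAU3911", "AAUx"], "AAU", rules)
strict_wins = match_with_priority(["AAU", "AAU3910"], "AAU", rules)
```

When the strict rule matches "AAU", the looser rule is never evaluated, which is the source of the speed-up mentioned above.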
For example, assume a total of 15555 entities in the knowledge graph, where 32 entity names contain the word "AAU". Among the 32 entity names, in addition to the entity name "AAU", entity names such as "AAU3910" and "AAU3911" are included. The entity name "AAU" includes one word "AAU", the number of words of the entity name "AAU" is 1, and the entity names "AAU3910" and "AAU3911" each include two words, namely "AAU", "3910" and "AAU", "3911", respectively. The vocabulary numbers of entity names "AAU3910" and "AAU3911" are both 2.
The searching device calculates the weight of the vocabulary AAU, and the obtained weight is as follows:
idf=ln(15555-32+0.5)-ln(32+0.5)=6.169
The entity name "AAU" comprises only one word, "AAU", so the value corresponding to the entity name "AAU" is 1 and d is 1; the entity names "AAU3910" and "AAU3911" both contain two words, so the value corresponding to each is 1 and d is 2. The search means sets the parameters a and b in formula (2) to 1.5 and 0.75, respectively. avg_d is obtained by averaging over all entities of the knowledge graph and is 4.003.
After determining the specific values of the parameters, the searching means calculates the scores of the 32 entity names. The score of the entity name "AAU" is:
6.169*1*(1+1.5)/(1+1.5*(1-0.75*(1-1/4.003)))=9.313
the scores of the entities named "AAU3910" and "AAU3911" are both:
6.169*1*(1+1.5)/(1+1.5*(1-0.75*(1-2/4.003)))=7.962
As can be seen from the above calculation, the more words an entity name contains, the lower its score. The score calculations for the other entity names among the 32 entity names are not elaborated here.
After obtaining the scores of the 32 entity names, the searching device sorts the 32 entity names by score in descending order and selects the top 3 entity names, namely "AAU", "AAU3910" and "AAU3911". The searching device matches the 3 entity names with the vocabulary "AAU" according to a preset matching rule to obtain the matched entity names, and obtains the entities corresponding to the matched entity names from the knowledge graph; the starting point candidate set comprises these entities.
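The worked example can be reproduced with a few lines of code. The function names `idf_weight` and `name_score` are hypothetical; the formulas are formula (1) and formula (2), and the numeric inputs are those of the example.

```python
import math


def idf_weight(num_entities, num_containing):
    # Formula (1): D is the entity count, v the number of entities whose
    # name contains the target vocabulary.
    return (math.log(num_entities - num_containing + 0.5)
            - math.log(num_containing + 0.5))


def name_score(idf, value, d, avg_d, a=1.5, b=0.75):
    # Formula (2): value is the number of occurrences of the target
    # vocabulary in the name, d the number of words in the name.
    return idf * value * (1 + a) / (value + a * (1 - b * (1 - d / avg_d)))


idf = idf_weight(15555, 32)
score_one_word = name_score(idf, value=1, d=1, avg_d=4.003)   # "AAU"
score_two_words = name_score(idf, value=1, d=2, avg_d=4.003)  # "AAU3910"
```

Rounded to three decimals, the three values agree with the 6.169, 9.313 and 7.962 given above, and the one-word name scores higher than the two-word names.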
S204, the searching device determines at least one answer path corresponding to the question sentence according to the starting point candidate set, the question sentence and the knowledge graph.
The intermediate vocabulary set used below includes at least one vocabulary, namely the vocabularies in the question sentence other than the target vocabulary and the target attribute words.
The answer path of the question sentence includes two parts: a first path, and a path from the end of the first path to the end of the answer path. The first path is a path from an entity in the starting point candidate set to an entity whose name matches the last word in the intermediate vocabulary set. The determination of the at least one answer path corresponding to the question sentence according to the starting point candidate set can be divided into two steps.
The first step, the searching device determines at least one first path in the knowledge graph.
The searching device takes the entities in the starting point candidate set as starting points and acquires a plurality of sub-path sets based on the intermediate vocabulary set, wherein the number of sub-path sets is smaller than or equal to the number of entities in the starting point candidate set. For each sub-path in the ith sub-path set of the plurality of sub-path sets, the starting point is a first entity and the end point is a second entity; the depth between the first entity and the second entity in the knowledge graph is not larger than the search depth, and the second entity is an entity in the knowledge graph whose name matches the ith vocabulary in the intermediate vocabulary set. When i=1, the first entity is an entity in the starting point candidate set; when i is greater than 1, the first entity is the end point of a sub-path in the (i-1)th sub-path set. The jth vocabulary in the intermediate vocabulary set is adjacent to the (j+1)th vocabulary in the question sentence and precedes it; j is not more than N-1, where N is the number of vocabularies in the intermediate vocabulary set. The searching device then acquires at least one first path from the plurality of sub-path sets.
The depth between the first entity and the second entity refers to the length of a path whose starting point is the first entity and whose end point is the second entity. For example, if the path between them is first entity-third entity-second entity, the length of this path is 2.
In one example, for a sub-path in the ith sub-path set, the corresponding search depth is a-b-c, where a is the preset maximum length of the answer path, b is the length of the path from the starting point of the answer path to the first entity, and c is N-i. When there are multiple paths from the starting point of the answer path to the first entity, b is the minimum of the lengths of these paths. Since the first entities of the sub-paths in different sub-path sets may differ, the search depths used in searching may also differ. It should be understood that the search depth adopted when searching from a first entity is the search depth corresponding to that first entity.
In another example, the search depth is a preset threshold, such as 2.
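The per-entity search depth a-b-c of the first example can be written as a one-line helper. The function and parameter names are hypothetical; the sample values are those that appear in the FIG. 3b walkthrough of this description (maximum answer path length 6 and three intermediate vocabularies).

```python
def search_depth(max_answer_len, dist_from_start, num_intermediate, i):
    # a - b - c, with a = max_answer_len, b = dist_from_start (shortest
    # path length from the answer-path start to the first entity) and
    # c = N - i, the number of intermediate words after the ith one.
    return max_answer_len - dist_from_start - (num_intermediate - i)


depth_from_start = search_depth(6, 0, 3, 1)  # searching for the 1st word
depth_from_b1 = search_depth(6, 2, 3, 2)     # first entity 2 hops from start
depth_from_c4 = search_depth(6, 5, 3, 3)     # first entity 5 hops from start
```

The depth shrinks as the first entity lies further from the starting point, which keeps the total answer path within the preset maximum length while reserving at least one hop for each remaining intermediate vocabulary.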
In the knowledge graph, the searching device searches within the search depth, taking the end point of any sub-path P in the ith sub-path set as the starting point, and matches the name of each searched entity O with the (i+1)th vocabulary in the intermediate vocabulary set according to a second preset matching rule. If the name of the entity O matches the (i+1)th vocabulary, the searching device acquires the path from the end point of the sub-path P to the entity O, and the (i+1)th sub-path set includes this path.
Specifically, in the knowledge graph, the searching device searches for entities within the search depth starting from an entity Q in the starting point candidate set, and matches the name of a searched entity E1 with the 1st vocabulary in the intermediate vocabulary set. If they match, the searching device obtains the path between the entity Q and the entity E1 and determines that the first sub-path set includes this path, wherein the number of nodes on the path between the entity Q and the entity E1 is greater than or equal to 2. If the searching device finds no entity, or no searched entity name matches the 1st vocabulary in the intermediate vocabulary set, the searching device takes an entity different from the entity Q from the starting point candidate set and repeats the above operation until all the entities in the starting point candidate set have been processed, so that the first sub-path set is obtained.
After the first sub-path set is obtained, the searching device searches for entities within the search depth starting from the end point of a sub-path P1 in the first sub-path set, and matches the name of a searched entity E2 with the 2nd vocabulary in the intermediate vocabulary set. If they match, the searching device obtains the path between the end point of the sub-path P1 and the entity E2 and determines that the second sub-path set includes this path, wherein the number of entities on this path is greater than or equal to 2. If the searching device finds no entity, or no searched entity name matches the 2nd vocabulary in the intermediate vocabulary set, the searching device takes a sub-path different from the sub-path P1 from the first sub-path set and repeats the above operation until all the sub-paths in the first sub-path set have been processed, so that the second sub-path set is obtained.
The above steps are repeated until all the vocabularies in the intermediate vocabulary set have been matched, so that the plurality of sub-path sets are obtained.
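The layer-by-layer construction of the sub-path sets can be sketched as follows. This is a simplified illustration under several assumptions: the knowledge graph is given as a hypothetical adjacency dictionary of downstream entities, the search depth is a fixed preset threshold, exact name equality stands in for the second preset matching rule, and the toy graph is made-up data rather than one of the figures.

```python
def paths_within_depth(graph, start, depth):
    """Yield every simple path of length 1..depth that starts at `start`."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            yield path
        if len(path) - 1 < depth:
            for nxt in graph.get(path[-1], []):
                if nxt not in path:  # do not revisit an entity on this path
                    stack.append(path + [nxt])


def build_sub_path_sets(graph, names, start_entities, words, depth):
    """Return one sorted list of sub-paths per intermediate word."""
    sub_path_sets = []
    first_entities = list(start_entities)
    for word in words:
        layer = [path
                 for first in first_entities
                 for path in paths_within_depth(graph, first, depth)
                 if names[path[-1]] == word]  # stand-in for the matching rule
        sub_path_sets.append(sorted(layer))
        # End points of this layer are the first entities of the next one.
        first_entities = sorted({path[-1] for path in layer})
    return sub_path_sets


# Hypothetical toy graph: entity id -> downstream entity ids.
graph = {"A": ["n1", "n3"], "n1": ["B1"], "n3": ["B2"],
         "B1": ["C1"], "B2": ["n5"], "n5": ["C2"]}
names = {"A": "A", "n1": "x", "n3": "y", "n5": "z",
         "B1": "B", "B2": "B", "C1": "C", "C2": "C"}
sub_path_sets = build_sub_path_sets(graph, names, ["A"], ["B", "C"], depth=2)
```

Each element of `sub_path_sets` corresponds to one vocabulary of the intermediate vocabulary set, in the order in which the vocabularies appear in the question sentence.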
For example, FIG. 3a shows a part of a knowledge graph. Assume that the starting point candidate set includes entity A and that the intermediate vocabulary set includes vocabulary B, vocabulary C and vocabulary D. Starting from entity A, a search is performed within the search depth, here 2, and the searching means may find entity 1, entity 2, entity 3, entity 4, entity B1 and entity B2. The searching device matches the names of these entities with vocabulary B (i.e. the first vocabulary in the intermediate vocabulary set), and the matching result is: the names of entity B1 and entity B2 match vocabulary B, and the names of the other entities do not. The search means obtains the paths between entity A and entities B1 and B2, which comprise entity A-entity 1-entity B1, entity A-entity 3-entity B2, and entity A-entity 2-entity 4-entity B2. The other entities in the starting point candidate set may be processed in the same manner. After all the entities in the starting point candidate set have been processed, the first sub-path set is obtained; it comprises the paths between entity A and entities B1 and B2.
The search means starts from entity B1 and entity B2, respectively, and searches within the search depth. The searching means may find entity 4, entity 5, entity 6, entity C1, entity C2, entity C3 and entity 11. The search device matches the names of these entities with vocabulary C (i.e. the second vocabulary in the intermediate vocabulary set), and the matching result is: the names of entity C1, entity C2 and entity C3 match vocabulary C, and the other entity names do not. The search means obtains the path between entity B1 and entity C1 (i.e. entity B1-entity C1), the path between entity B2 and entity C2 (i.e. entity B2-entity 5-entity C2) and the path between entity B2 and entity C3 (i.e. entity B2-entity 5-entity C3). The end points of the other sub-paths in the first sub-path set are processed in the same way as entity B1 and entity B2. Once the end points of all the sub-paths in the first sub-path set have been processed in this manner, the searching means obtains the second sub-path set, which comprises the path from entity B1 to entity C1, the path from entity B2 to entity C2 and the path from entity B2 to entity C3.
The search means starts from entity C1, entity C2 and entity C3, respectively, and searches within the search depth. The searching means may find entity 11, entity 7, entity 8, entity 9, entity 10, entity D1 and entity D2. The search device matches the names of these entities with vocabulary D (i.e. the third vocabulary in the intermediate vocabulary set), and the matching result is: the names of entity D1 and entity D2 match vocabulary D, and the other entity names do not. The search means obtains the path between entity C2 and entity D1 (i.e. entity C2-entity 7-entity D1) and the path between entity C3 and entity D2 (i.e. entity C3-entity 8-entity D2). The end points of the other sub-paths in the second sub-path set are processed in the same way as entity C2 and entity C3. Once the end points of all the sub-paths in the second sub-path set have been processed in this manner, the searching means obtains the third sub-path set, which comprises the path from entity C2 to entity D1 and the path from entity C3 to entity D2.
The searching device determines two first paths according to the first sub-path set, the second sub-path set and the third sub-path set, wherein the two first paths are respectively: entity A-entity 3-entity B2-entity 5-entity C2-entity 7-entity D1, and entity A-entity 3-entity B2-entity 5-entity C3-entity 8-entity D2.
By way of further example, FIG. 3b shows a part of a knowledge graph. Assume that the starting point candidate set includes entity A, that the intermediate vocabulary set includes vocabulary B, vocabulary C and vocabulary D, and that the preset maximum length of the answer path is 6. Starting from entity A, a search is performed within the corresponding search depth, here 6-0-2=4, where 0 is the length of the path from entity A to entity A and 2 indicates that 2 words follow vocabulary B in the intermediate vocabulary set. The searching means may find entity 1, entity 2, entity 3, entity 4, entity 5, entity 6, entity 11, entity B1, entity B2, entity B3, entity C1, entity C2 and entity C3. The searching device matches the names of these entities with vocabulary B (i.e. the first vocabulary in the intermediate vocabulary set), and the matching result is: the names of entity B1, entity B2 and entity B3 match vocabulary B, and the names of the other entities do not. The search means obtains the paths between entity A and entity B1, entity B2 and entity B3, which comprise entity A-entity 1-entity B1, entity A-entity B2, and entity A-entity 2-entity 4-entity B3. The other entities in the starting point candidate set may be processed in the same manner. After all the entities in the starting point candidate set have been processed, the first sub-path set is obtained; it comprises the paths from entity A to entity B1, entity B2 and entity B3.
The search means starts from the entity B1 and searches within a corresponding search depth range, here a search depth of 6-2-1=3. Where 2 denotes the path length from entity B1 to entity a, and 1 denotes that there is 1 word behind the word C in the intermediate word set. The searching means may search for the entity C1 and the entity 11. The search device matches these entity names with the vocabulary C (i.e., the second vocabulary in the intermediate vocabulary set), respectively, and the obtained matching result is: the name of entity C1 matches vocabulary C and the name of entity 11 does not match vocabulary C. The search means acquires a path between the entity B1 to the entity C1: entity B1-entity C1. The search means starts from the entity B2 and searches within a corresponding search depth range, here a search depth of 6-1-1=4. Where the first "1" indicates the length of the path of entity B2 to entity a and the second "1" indicates that there is 1 word behind the vocabulary C in the intermediate vocabulary set. The searching means may search for entity 3, entity 5, entity C2, entity C3, entity 7, entity 8 and entity 9. The search device respectively matches the names of the entities with the vocabulary C, and the obtained matching result is: the name of entity C2 and the name of entity C3 are matched with vocabulary C, and the other entity names are not matched with vocabulary C. The search means obtain a path between entity B2 and entity C2 (i.e. entity B2-entity 3-entity 5-entity C2) and a path between entity C3 and entity B2 (i.e. entity B2-entity 3-entity 5-entity C3). The search means starts from entity B3 and searches within a corresponding search depth range, here a search depth of 6-3-1=2. Where 3 denotes the path length from entity B3 to entity a and 1 denotes that there is 1 word behind the word C in the intermediate vocabulary set. The search means may search for entity 6 and entity C4. 
The search device respectively matches the names of the entities with the vocabulary C, and the obtained matching result is: the name of entity C4 matches vocabulary C, and the other entity names do not match vocabulary C. The search means obtain a path between entity B3 to entity C4 (i.e. entity B3-entity 6-entity C4). And processing the end points of other sub-paths in the first sub-path set according to the processing mode of the entity B2 and the entity B2. The searching means may obtain the second set of sub-paths when the end points of all sub-paths in the first set of sub-paths are processed in the above-described manner. Wherein the second set of sub-paths comprises a path between entity C1 and entity B1, a path between entity C2 and entity B2, a path between entity C3 and entity B2, and a path between entity C4 and entity B3.
The search device starts from entity C1 and searches within the corresponding search depth, which here is 6-3-0=3, where 3 denotes the path length from entity C1 to entity A, and 0 indicates that vocabulary D is the last word in the intermediate vocabulary set, so no word follows it. The search device finds entity 11 and matches its name with vocabulary D; the matching result is that the name of entity 11 does not match vocabulary D.
The search device then starts from entity C2 and entity C3, respectively, and searches within the corresponding search depth, which here is 6-4-0=2, where 4 denotes the path length from entity C2 to entity A and from entity C3 to entity A, and 0 again indicates that vocabulary D is the last word in the intermediate vocabulary set. The search device finds entity 7, entity 8, entity 9, entity 10, entity D1, and entity D2, and matches the names of these entities with vocabulary D; the matching result is that the names of entity D1 and entity D2 match vocabulary D, and the names of the other entities do not. The search device obtains the path from entity C2 to entity D1 (i.e., entity C2-entity 7-entity D1) and the path from entity C3 to entity D2 (i.e., entity C3-entity 8-entity D2).
The search device then starts from entity C4 and searches within the corresponding search depth, which here is 6-5-0=1, where 5 denotes the path length from entity C4 to entity A, and 0 again indicates that no word follows vocabulary D. The search device finds entity D3 and matches its name with vocabulary D; the matching result is that the name of entity D3 matches vocabulary D. The search device obtains the path from entity C4 to entity D3 (i.e., entity C4-entity D3).
When the end points of all sub-paths in the second sub-path set have been processed in this manner, the search device obtains the third sub-path set. The third sub-path set comprises a path from entity C2 to entity D1, a path from entity C3 to entity D2, and a path from entity C4 to entity D3.
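The search depth arithmetic in this example can be sketched as follows. This is a minimal illustration, assuming that the constant 6 is the preset maximum path length and that the last term counts the words still following the current word in the intermediate vocabulary set; the function and parameter names are hypothetical.

```python
def search_depth(max_path_len: int, hops_from_start: int, words_after: int) -> int:
    """Remaining hop budget when searching onward from the end point of a sub-path.

    max_path_len    -- assumed preset maximum path length (6 in the example)
    hops_from_start -- path length from the current end point back to entity A
    words_after     -- number of intermediate words that follow the current word
    """
    return max_path_len - hops_from_start - words_after


# Values taken from the worked example; vocabulary D is the last word, so words_after is 0.
for entity, hops in [("C1", 3), ("C2", 4), ("C3", 4), ("C4", 5)]:
    print(entity, search_depth(6, hops, 0))
```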
The search device determines three first paths according to the first sub-path set, the second sub-path set, and the third sub-path set. The three first paths are: entity A-entity B2-entity 3-entity 5-entity C2-entity 7-entity D1; entity A-entity B2-entity 3-entity 5-entity C3-entity 8-entity D2; and entity A-entity 2-entity 4-entity B3-entity 6-entity C4-entity D3.
In one possible example, the search device may use a DFS manner or a DFS+BFS manner when searching from an entity.
The two manners are illustrated below.
The DFS manner is described first.
As shown in fig. 3C, one entity has four downstream entities: entity A, entity B, entity C, and entity D. The downstream entities of entity A include entity E, and the downstream entities of entity E include entity X. The downstream entities of entity B include entity F, entity G, and entity H, and the downstream entities of entity F include entity Y. The downstream entities of entity C, entity D, entity G, and entity H are not illustrated in fig. 3C and are replaced with ellipses. When searching in the DFS manner, the search order is: entity A-entity E-entity X-entity B-entity F-entity Y-entity G-downstream entities of entity G-entity H-downstream entities of entity H-entity C-downstream entities of entity C-entity D-downstream entities of entity D. After searching an entity, the search device matches the name of the entity with the corresponding vocabulary in the intermediate vocabulary set.
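The DFS visiting order can be reproduced with a short sketch. The graph below encodes only the edges stated for fig. 3C; the root name "R" is hypothetical, and the downstream entities of C, D, G, and H, which the figure does not illustrate, are left out.

```python
from typing import Dict, List

# Downstream edges described for fig. 3C ("R" is a hypothetical name for the root entity).
GRAPH: Dict[str, List[str]] = {
    "R": ["A", "B", "C", "D"],
    "A": ["E"],
    "E": ["X"],
    "B": ["F", "G", "H"],
    "F": ["Y"],
}


def dfs_order(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Return downstream entities in the order an iterative pre-order DFS visits them."""
    order: List[str] = []
    stack = list(reversed(graph.get(start, [])))
    while stack:
        node = stack.pop()
        order.append(node)  # the name would be matched against the vocabulary here
        stack.extend(reversed(graph.get(node, [])))
    return order


print(dfs_order(GRAPH, "R"))
# ['A', 'E', 'X', 'B', 'F', 'Y', 'G', 'H', 'C', 'D']
```

Only one branch is held on the stack at a time, which is why this manner uses little memory.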
The DFS+BFS manner is described below.
As shown in fig. 3D, one entity has four downstream entities: entity A, entity B, entity C, and entity D. The search device groups the downstream nodes of this entity, e.g., entity A and entity B form one group, and entity C and entity D form another group. The search device traverses the group containing entity A and entity B, and matches the names of all entities in the group with the corresponding vocabulary in the intermediate vocabulary set. After the traversal, the search device obtains the downstream entities of entity A and the downstream entities of entity B in the BFS manner and stores them, where the downstream entities of entity A include entity E, and the downstream entities of entity B include entity F, entity G, and entity H. The search device groups these downstream entities: entity E and entity F form one group, and entity G and entity H form another group. The search device traverses the group containing entity E and entity F, and matches the names of all entities in the group with the corresponding vocabulary in the intermediate vocabulary set. After the traversal, the search device obtains the downstream entities of entity E and the downstream entities of entity F in the BFS manner and stores them, where the downstream entities of entity E include entity X, and the downstream entities of entity F include entity Y. The search device matches the names of entity X and entity Y with the corresponding vocabulary in the intermediate vocabulary set.
Since neither entity X nor entity Y has a downstream entity, after the matching is completed the search device traverses the group containing entity G and entity H, and matches the names of all entities in the group with the corresponding vocabulary in the intermediate vocabulary set. After the traversal, the search device obtains the downstream entities of entity G and the downstream entities of entity H in the BFS manner and stores them. The search device processes the downstream entities of entity G and entity H in the manner described above. After this processing is finished, the search device traverses the group containing entity C and entity D, and matches the names of all entities in the group with the corresponding vocabulary in the intermediate vocabulary set. After the traversal, the search device obtains the downstream entities of entity C and the downstream entities of entity D in the BFS manner and stores them. The search device processes the downstream entities of entity C and entity D in the manner described above.
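The grouped traversal can be sketched as a recursion that handles one group at a time: every entity in a group is visited (BFS within the group), the group's downstream entities are collected, and those are searched group by group before the next sibling group is touched (DFS between groups). This is a sketch under assumptions: the group size of 2, the root name "R", and the function names are illustrative, the graph is assumed to be acyclic, and the unillustrated downstream entities of C, D, G, and H are omitted.

```python
from typing import Dict, List

# Downstream edges described for fig. 3D ("R" is a hypothetical name for the root entity).
GRAPH: Dict[str, List[str]] = {
    "R": ["A", "B", "C", "D"],
    "A": ["E"],
    "E": ["X"],
    "B": ["F", "G", "H"],
    "F": ["Y"],
}


def grouped_search_order(graph: Dict[str, List[str]], start: str, group_size: int = 2) -> List[str]:
    """Visit order for a search that is DFS between groups and BFS within a group."""
    order: List[str] = []

    def visit(entities: List[str]) -> None:
        for i in range(0, len(entities), group_size):
            group = entities[i:i + group_size]
            order.extend(group)  # match the names of all entities in the group
            # Collect and store the downstream entities of the whole group at once.
            downstream = [child for entity in group for child in graph.get(entity, [])]
            visit(downstream)    # finish this group's descendants before the next group

    visit(graph.get(start, []))
    return order


print(grouped_search_order(GRAPH, "R"))
# ['A', 'B', 'E', 'F', 'X', 'Y', 'G', 'H', 'C', 'D']
```

At any moment only the downstream entities of the groups on the current branch are stored, and the entities inside one group can be matched in parallel.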
As can be seen from the above description, in the BFS manner, when searching an entity the search device stores all downstream entities of that entity and the downstream entities of those downstream entities, even if their names do not match. This consumes the memory resources of the search device. In contrast, the DFS manner searches only one path at a time and occupies little memory, but it cannot search in parallel and its search efficiency is low. The DFS+BFS manner is therefore provided: the search can be regarded as using DFS between groups and BFS within a group. The DFS+BFS manner thus combines the advantages of the DFS manner and the BFS manner, reducing memory occupation while allowing efficient parallel search.
In one possible example, after searching an entity Z, the search device determines that the sum of the number of entities on the path between entity Z and the starting point candidate set and the number of unmatched words in the intermediate vocabulary set is greater than the width of the knowledge graph. This means that entities matching the unmatched words in the intermediate vocabulary set cannot all be found among the downstream entities of entity Z. The search device therefore does not match the name of entity Z with the corresponding vocabulary in the intermediate vocabulary set, proceeds directly to the next search, and does not consider the downstream entities of entity Z or their subsequent entities in that search. As shown in fig. 3A, assume that the width of the knowledge graph is 7. After searching entity 5, the search device determines that the number of entities on the path between entity 5 and entity A is 4. If the intermediate vocabulary set includes vocabulary A, vocabulary B, vocabulary C, vocabulary D, vocabulary E, and vocabulary F, there are 4 unmatched words, namely vocabulary C, vocabulary D, vocabulary E, and vocabulary F. Since 4+4=8 is greater than 7, the search device determines that entities whose names match vocabulary C, vocabulary D, vocabulary E, and vocabulary F, respectively, cannot be found among the subsequent entities of entity 5. The search device therefore deletes entity 5 and its subsequent entities (including entity C2, entity C3, entity 7, entity 8, entity 9, entity 10, entity D1, and entity D2) from the search space and does not search these entities. This approach improves search efficiency and reduces memory usage.
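The pruning test can be written as a one-line check. This sketch reads the condition as a sum, which is consistent with the numbers in the example (4 entities on the path plus 4 unmatched words exceeds a width of 7); the function and parameter names are hypothetical.

```python
def should_prune(entities_on_path: int, unmatched_words: int, graph_width: int) -> bool:
    """True when the remaining words cannot all be matched below the current entity."""
    return entities_on_path + unmatched_words > graph_width


# Entity 5 in the example: 4 entities on the path from entity A, 4 unmatched words, width 7.
print(should_prune(4, 4, 7))  # True, so entity 5 and its subsequent entities are skipped
```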
In one possible example, the search device obtaining a path between the end point of the sub-path P and the entity O includes:
determining a downstream node of a first node according to a bit string of the first node, wherein the positions, in the bit string of the first node, of the bits whose value is a first value indicate the numbers of the downstream nodes of the first node, and the path between the end point of the sub-path P and the entity O comprises the first node and a downstream node of the first node; the first node is any entity other than the entity O on the path between the end point of the sub-path P and the entity O.
Wherein the first value may be 1 or 0.
Specifically, the search device may start at the end point of the sub-path P, determine the downstream nodes of the end point of the sub-path P according to the bit string of that end point, then determine the downstream nodes of each of those nodes according to its own bit string, and so on until the entity O is found.
As shown in fig. 3E, assume that the node "Hebei province" has the downstream nodes "Shijiazhuang", "Tangshan", and "Handan", numbered 3, 4, and 5, respectively. In the prior art, the 3 numbers are stored in a variable-length array; if one element in the array occupies 4 bytes and the header of the array also occupies 4 bytes, storing the array occupies 16 bytes. In the present application, the numbers of the downstream nodes are stored as a bit string. For example, the bit string of the node "Hebei province" may be "00111000", where the positions of the bits with a value of 1 in the bit string represent the numbers of the downstream nodes of the node "Hebei province". In the bit string "00111000", the positions of the bits with a value of 1 are 3, 4, and 5. In this way, the memory occupied by storing the numbers of the downstream nodes can be reduced.
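The bit string encoding can be sketched as follows, assuming that positions are counted from 1 starting at the left end of the string (the reading under which "00111000" yields 3, 4, and 5) and that the first value is 1. The function names are hypothetical, and a real implementation would pack the bits into bytes rather than keep a character string.

```python
from typing import Iterable, List


def encode_downstream(numbers: Iterable[int], length: int = 8) -> str:
    """Build a bit string whose 1 bits sit at the positions given by the node numbers."""
    bits = ["0"] * length
    for number in numbers:
        bits[number - 1] = "1"  # positions are 1-based, counted from the left
    return "".join(bits)


def decode_downstream(bit_string: str, first_value: str = "1") -> List[int]:
    """Recover the downstream node numbers from the positions of the marked bits."""
    return [pos for pos, bit in enumerate(bit_string, start=1) if bit == first_value]


print(encode_downstream([3, 4, 5]))   # 00111000
print(decode_downstream("00111000"))  # [3, 4, 5]
```

Packed, these 8 bits fit in a single byte, compared with the 16 bytes of the array in the example.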
It should be noted that, for the manner in which the search device matches an entity name with the vocabulary in the intermediate vocabulary set, reference may be made to the description of the fuzzy matching rule in S203, which is not repeated here.
In a second step, the search device determines a path from the end point of the at least one first path to the end point of the answer path in the knowledge graph.
In a possible embodiment, the search device searches within a search depth range starting from the end point of a second path. The search depth is a-d, where a is the preset maximum answer path length and d is the length of the second path, and the second path is one of the at least one first path. The search device matches the attribute vocabulary of each searched entity with the target attribute word. For the matching rule, reference may be made to the description of the fuzzy matching rule in S203. In one example, the fuzzy matching rule is whether the attribute vocabulary of the entity and the target attribute word are synonyms: if they are synonyms, the attribute vocabulary of the entity matches the target attribute word; if they are not synonyms, the attribute vocabulary of the entity does not match the target attribute word. The search device obtains the entities whose attribute vocabulary matches the target attribute word and determines these entities as end points of the answer path.
In another possible implementation, the search device obtains the entities containing the target attribute according to the bit string of the target attribute word, wherein the positions, in the bit string of the target attribute word, of the bits whose value is a second value indicate the numbers of the entities containing the target attribute. Starting from the end point of the second path in the knowledge graph, the search device performs BFS within the search depth range and determines the end point of the answer path according to the searched entities and the entities containing the target attribute. The search depth is a-d, where a is the preset maximum answer path length and d is the length of the second path. The second path is one of the at least one first path, and the end point of the answer path comprises the intersection of the searched entities and the entities containing the target attribute. The search device then obtains the path from the end point of the second path to the end point of the answer path, and determines at least one answer path according to the second path and the path from the end point of the second path to the end point of the answer path.
Wherein the second value may be 1 or 0. The second value may be the same as the first value or different from the first value.
Specifically, the search device determines the numbers of the entities containing the target attribute according to the bit string of the target attribute, where the positions of the bits with the second value in the bit string of the target attribute represent the numbers of the entities containing the target attribute. For example, the bit string of the target attribute is "10000010", which indicates that the entity numbered 1 and the entity numbered 7 each contain the target attribute. In the knowledge graph, the search device performs BFS within the search depth range starting from the end point of the second path, and determines whether the number of each searched entity is a number represented by the position of a bit with the second value in the bit string of the target attribute. The search depth is a-d, where a is the preset maximum answer path length and d is the length of the second path. If so, the entity is determined as an end point of the answer path. In another example, the search device obtains the entities containing the target attribute from the bit string of the target attribute, then obtains the intersection of the entities containing the target attribute and the searched entities, and determines the entities in the intersection as end points of the answer path.
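The intersection variant can be sketched as a depth-bounded BFS followed by a set intersection. The sketch assumes 1-based bit positions counted from the left and a second value of 1; the small graph, the entity numbers other than 1 and 7, and all function names are hypothetical, and excluding the start entity from the result is a choice made for this example only.

```python
from collections import deque
from typing import Dict, List, Set


def entities_with_attribute(bit_string: str, second_value: str = "1") -> Set[int]:
    """Numbers of the entities whose bit is set in the attribute's bit string."""
    return {pos for pos, bit in enumerate(bit_string, start=1) if bit == second_value}


def bfs_within_depth(graph: Dict[int, List[int]], start: int, depth: int) -> Set[int]:
    """Entities reachable from start in at most `depth` hops (start itself excluded)."""
    seen, found = {start}, set()
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            continue  # hop budget used up on this branch
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                frontier.append((nxt, hops + 1))
    return found


def answer_end_points(graph: Dict[int, List[int]], second_path_end: int,
                      a: int, d: int, bit_string: str) -> Set[int]:
    """Intersect the entities found within depth a-d with those holding the attribute."""
    return bfs_within_depth(graph, second_path_end, a - d) & entities_with_attribute(bit_string)


# Hypothetical graph keyed by entity number; "10000010" marks entities 1 and 7.
graph = {2: [1, 3], 3: [7], 7: [8]}
print(sorted(answer_end_points(graph, 2, a=6, d=4, bit_string="10000010")))  # [1, 7]
print(sorted(answer_end_points(graph, 2, a=6, d=5, bit_string="10000010")))  # [1]
```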
It will be appreciated that there may be a plurality of second paths, and different second paths may have different lengths. Therefore, when BFS is performed from the end point of a second path, the search depth used is determined from the length of that second path and the preset maximum answer path length; for the specific calculation, see the related description above, which is not repeated here.
Because the number of entities in the knowledge graph is relatively large, determining whether each entity contains the target attribute by traversing the entities in the knowledge graph is inefficient. By introducing the bit string of an attribute, when the end point of the answer path is determined, the entities containing the target attribute are determined solely from the bit string of the target attribute, and the end point of the answer path is then determined from among these entities, which improves the search efficiency of the answer path.
In one possible example, the search device may acquire the bit string of the target attribute from another device, or may derive it from the knowledge graph.
To derive it, the search device traverses the attributes of the entities in the knowledge graph, obtains for each attribute the entities containing that attribute and the numbers of those entities, and determines the bit string of each attribute according to those numbers.
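Deriving the attribute bit strings from the knowledge graph can be sketched as a single pass over the entities. The sketch assumes entities are numbered from 1 and that the bit for entity n sits at position n counted from the left; the attribute "height", the entity numbers, and the function name are illustrative only.

```python
from typing import Dict, List


def build_attribute_bit_strings(entity_attributes: Dict[int, List[str]],
                                num_entities: int) -> Dict[str, str]:
    """Map each attribute to a bit string marking the numbers of the entities that have it."""
    bits_by_attribute: Dict[str, List[str]] = {}
    for number, attributes in entity_attributes.items():
        for attribute in attributes:
            bits = bits_by_attribute.setdefault(attribute, ["0"] * num_entities)
            bits[number - 1] = "1"  # entity numbers are 1-based
    return {attribute: "".join(bits) for attribute, bits in bits_by_attribute.items()}


# Hypothetical entities 1, 3 and 7 in a graph of 8 numbered entities.
table = build_attribute_bit_strings({1: ["weight"], 3: ["height"], 7: ["weight", "height"]}, 8)
print(table["weight"])  # 10000010
print(table["height"])  # 00100010
```

The pass over the graph is done once; afterwards, finding the entities that contain an attribute only requires reading that attribute's bit string.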
It should be understood that synonymous attributes represent the same attribute. The synonymous attributes of the end point attribute are shown in Table 1 below.
TABLE 1
When the end point of the answer path is searched for based on the target attribute, the synonyms of the target attribute are also considered, so that the answer path search is more comprehensive.
In a possible embodiment, the search apparatus further comprises a display interface for displaying all answer paths of the question sentence.
In a specific example, assume that the question sentence is "AAU 2310QUM weight value". The order in which the search device searches entities in the knowledge graph is shown in fig. 3F. The downstream entities of the entity "AAU" are grouped in pairs, and each group is treated as a whole for DFS traversal; traversing a group is equivalent to traversing all entities in the group in sequence, that is, traversing the entities in the group by BFS. After the intermediate vocabulary is matched successfully, the following path is obtained: AAU-AAU3910-2310 QUM. Starting from 2310QUM, BFS is used to find entities containing the attribute "weight". In fig. 3F, the thick solid boxes are entities having the attribute "weight", and the thin solid boxes are entities not having the attribute "weight". Whether an entity has the attribute "weight" can be obtained from the bit string of the attribute "weight"; the specific process is not repeated here. In this way, the end point "engineering index" of the answer path can be obtained. The search device obtains the answer path of the question sentence according to the end point "engineering index" and the path AAU-AAU3910-2310 QUM: AAU-AAU3910-2310 QUM-engineering index, weight=39.5 kg. The search device displays the result as shown in fig. 3G.
It can be seen that, when the starting point candidate set is determined, introducing the scores of entity names yields a more comprehensive set of starting points for the answer path, and therefore more comprehensive answer paths. When a path is searched, introducing the bit string that represents the numbers of the downstream entities of an entity removes the need to store the downstream nodes of the entity, which reduces memory occupation. When the end point of the answer path is searched for, introducing the bit string of an attribute improves search efficiency. Using the DFS+BFS manner in the path search also improves search efficiency. With the scheme of the present application, paths of variable length can be obtained, the words in the answer path are not limited to the words in the question sentence, and the search is more flexible.
It should be understood that nodes and entities in this application are two different designations of the same object, and the meaning of the expressions is the same.
Referring to fig. 4, a schematic structural diagram of a search device according to an embodiment of the present application is shown. As shown in fig. 4, the search apparatus 400 includes:
an acquiring unit 401, configured to acquire a question sentence of a user and a knowledge graph; and acquire at least two entity names according to a target vocabulary, wherein the at least two entity names are names of entities in the knowledge graph, and the target vocabulary is the first vocabulary in the question sentence; any entity name in the at least two entity names comprises the target vocabulary or is the target vocabulary;
a calculating unit 402, configured to calculate, according to the knowledge graph and the at least two entity names, a score of each entity name in the at least two entity names, where the score of each entity name is used to represent a matching degree between the entity name and the target vocabulary;
a determining unit 403, configured to determine a starting point candidate set from the knowledge graph according to the scores of at least two entity names, where the starting point corresponds to the target vocabulary; and determining at least one answer path corresponding to the question according to the starting point candidate set, the question and the knowledge graph.
In a possible implementation manner, the determining unit 403 is further configured to obtain a vocabulary inverted list according to the knowledge graph; the vocabulary inverted list comprises corresponding relations between a plurality of vocabularies and a plurality of entity names, one vocabulary corresponds to one or more entity names, each entity name in the plurality of entity names comprises a vocabulary corresponding to the entity name, or each entity name is a vocabulary corresponding to the entity name;
In terms of acquiring at least two entity names according to the target vocabulary, the acquiring unit 401 is specifically configured to:
traverse the vocabulary inverted list according to the target vocabulary to obtain the at least two entity names.
In one possible implementation, the calculating unit 402 is specifically configured to:
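A vocabulary inverted list of this kind can be sketched as a dictionary from each word to the entity names that contain it. This is a minimal sketch assuming that names are split into words on whitespace; the entity names and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_inverted_list(entity_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each word to the entity names that contain it (or are equal to it)."""
    inverted: Dict[str, List[str]] = defaultdict(list)
    for name in entity_names:
        # dict.fromkeys keeps each word once, in order, so a name is listed only once per word.
        for word in dict.fromkeys(name.split()):
            inverted[word].append(name)
    return dict(inverted)


inverted = build_inverted_list(["AAU", "AAU 3910", "RRU 3910"])
print(inverted["AAU"])   # ['AAU', 'AAU 3910']
print(inverted["3910"])  # ['AAU 3910', 'RRU 3910']
```

Looking up the target vocabulary in this dictionary returns the candidate entity names without scanning every entity in the knowledge graph.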
acquiring the weight of a target vocabulary; and calculating the score of each entity name according to the weight of the target vocabulary, the number of the vocabulary in each entity name, the average value of the number of the vocabulary of all entity names in the knowledge graph and the occurrence frequency of the target vocabulary in each entity name.
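The text above lists the inputs of the score but not the formula. One well-known formula that uses exactly these inputs is the BM25 term score, shown below purely as an assumed stand-in and not as the formula of this application. The parameters k1 and b, their default values, the whitespace tokenization, and the function names are assumptions, and the average length is taken over the names passed in rather than over the whole knowledge graph.

```python
from typing import Dict, List


def name_score(word_weight: float, term_freq: int, name_len: int,
               avg_name_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """BM25-style score of one entity name for the target word (assumed formula)."""
    length_norm = 1.0 - b + b * name_len / avg_name_len
    return word_weight * term_freq * (k1 + 1.0) / (term_freq + k1 * length_norm)


def score_names(target: str, names: List[str], word_weight: float = 1.0) -> Dict[str, float]:
    """Score every name; the average word count here stands in for the graph-wide average."""
    word_lists = [name.split() for name in names]
    avg_len = sum(len(words) for words in word_lists) / len(word_lists)
    return {
        name: name_score(word_weight, words.count(target), len(words), avg_len)
        for name, words in zip(names, word_lists)
    }


scores = score_names("AAU", ["AAU", "AAU 3910 module"])
print(sorted(scores, key=scores.get, reverse=True))
```

Under this assumed formula, a name in which the target word makes up a larger share of the words scores higher, so the exact name "AAU" ranks ahead of the longer hypothetical name.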
In one possible implementation, in determining the starting point candidate set from the knowledge graph according to the scores of the at least two entity names, the determining unit 403 is specifically configured to:
sort the scores of the at least two entity names in descending order; match the k entity names with the highest scores with the target vocabulary according to a first preset matching rule to obtain matching entity names; and determine that the starting point candidate set includes the entities in the knowledge graph whose names are the matching entity names.
In one possible implementation manner, in determining at least one answer path corresponding to the question according to the starting candidate set, the question and the knowledge graph, the determining unit 403 is specifically configured to:
acquire an intermediate vocabulary set and a target attribute word from the question sentence, where the intermediate vocabulary set comprises the words in the question sentence other than the target vocabulary and the target attribute word; determine at least one first path from the knowledge graph according to the starting point candidate set and the intermediate vocabulary set, wherein the starting point of each first path is an entity in the starting point candidate set, and the end point of each first path is an entity in the knowledge graph whose name matches the last vocabulary in the intermediate vocabulary set; and determine at least one answer path according to the at least one first path and the target attribute word, wherein the attributes of the end point of the answer path include the target attribute.
In one possible implementation, in determining at least one first path from the knowledge graph according to the starting point candidate set and the intermediate vocabulary set, the determining unit 403 is specifically configured to:
take an entity in the starting point candidate set as a starting point, and acquire a plurality of sub-path sets based on the intermediate vocabulary set, wherein the starting point of each sub-path in the ith sub-path set of the plurality of sub-path sets is a first entity, the end point of each sub-path is a second entity, and the depth between the first entity and the second entity in the knowledge graph is not greater than the search depth; the second entity is an entity in the knowledge graph whose name matches the ith vocabulary in the intermediate vocabulary set; when i=1, the first entity is an entity in the starting point candidate set; when i is greater than 1, the first entity is the end point of a sub-path in the (i-1)th sub-path set; the jth vocabulary in the intermediate vocabulary set is adjacent to the (j+1)th vocabulary in the question sentence, and the jth vocabulary precedes the (j+1)th vocabulary in the question sentence; j is not greater than N-1, where N is the number of words in the intermediate vocabulary set; and acquire the at least one first path from the plurality of sub-path sets.
In one possible implementation, in taking an entity in the starting point candidate set as a starting point, the determining unit 403 is specifically configured to:
in the knowledge graph, starting from the end point of any sub-path P in the ith sub-path set of the plurality of sub-path sets, search within the range of the search depth, and match the name of a searched entity O with the (i+1)th vocabulary in the intermediate vocabulary set according to a second preset matching rule; if the name of the entity O matches the (i+1)th vocabulary, acquire the path between the end point of the sub-path P and the entity O; wherein the (i+1)th sub-path set includes the path between the end point of the sub-path P and the entity O.
In one possible implementation, in terms of acquiring a path between the end point of the sub-path P and the entity O, the determining unit 403 is specifically configured to:
determine a downstream node of a first node according to a bit string of the first node, wherein the positions, in the bit string of the first node, of the bits whose value is a first value indicate the numbers of the downstream nodes of the first node, and the path between the end point of the sub-path P and the entity O comprises the first node and a downstream node of the first node; the first node is any entity other than the entity O on the path between the end point of the sub-path P and the entity O.
In one possible implementation, the search within the range of the search depth is performed in the DFS+BFS manner.
In one possible implementation manner, in determining at least one answer path according to at least one first path and the target attribute word, the determining unit 403 is specifically configured to:
acquire the entities containing the target attribute according to the bit string of the target attribute word, wherein the positions, in the bit string of the target attribute word, of the bits whose value is a second value indicate the numbers of the entities containing the target attribute; perform BFS within the search depth range starting from the end point of a second path in the knowledge graph, and determine the end point of the answer path according to the searched entities and the entities containing the target attribute, wherein the second path is one of the at least one first path, and the end point of the answer path comprises the intersection of the searched entities and the entities containing the target attribute; and acquire the path from the end point of the second path to the end point of the answer path, and determine at least one answer path according to the second path and the path from the end point of the second path to the end point of the answer path.
It should be noted that, for the specific functional implementation of the search device, reference may be made to the description of the knowledge question-answer path searching method above. For example, the acquiring unit 401 is configured to execute the relevant content of S201, the calculating unit 402 is configured to execute the relevant content of S202, and the determining unit 403 is configured to execute the relevant content of S203-S204, which is not repeated here. The units or modules in the device may be separately or wholly combined into one or several other units or modules, or one or more of them may be further split into a plurality of functionally smaller units or modules, which can achieve the same operations without affecting the technical effects of the embodiments of the present invention. The above units or modules are divided based on logical functions; in practical applications, the function of one unit (or module) may be implemented by a plurality of units (or modules), or the functions of a plurality of units (or modules) may be implemented by one unit (or module).
Based on the description of the method embodiment and the device embodiment, please refer to fig. 5, a schematic structural diagram of a search device 500 is further provided in the embodiment of the present invention. The search apparatus 500 shown in fig. 5 (the apparatus 500 may be a computer device in particular) comprises a memory 501, a processor 502, a communication interface 503 and a bus 504. The memory 501, the processor 502, and the communication interface 503 are communicatively connected to each other via a bus 504.
The memory 501 may be a read-only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
The memory 501 may store a program, and when the program stored in the memory 501 is executed by the processor 502, the processor 502 and the communication interface 503 are used to perform the respective steps of the knowledge question-and-answer path search method of the embodiment of the present application.
The processor 502 may employ a general-purpose central processing unit (Central Processing Unit, CPU), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), graphics processor (graphics processing unit, GPU) or one or more integrated circuits for executing associated programs to perform the functions required to be performed by the elements in the search apparatus 400 of the present embodiment or to perform the knowledge question-answer path search method of the present method embodiment.
The processor 502 may also be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the knowledge question-answer path searching method of the present application may be completed by an integrated logic circuit of hardware in the processor 502 or by instructions in the form of software. The processor 502 may also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 501; the processor 502 reads the information in the memory 501 and, in combination with its hardware, performs the functions required to be executed by the units included in the search device of the embodiments of the present application, or executes the knowledge question-answer path searching method of the method embodiments of the present application.
The communication interface 503 enables communication between the apparatus 500 and other devices or communication networks using a transceiving apparatus such as, but not limited to, a transceiver. For example, data may be acquired through the communication interface 503.
Bus 504 may include a path to transfer information between various components of device 500 (e.g., memory 501, processor 502, communication interface 503).
It should be noted that although the apparatus 500 shown in fig. 5 only shows a memory, a processor, a communication interface, those skilled in the art will appreciate that in a particular implementation, the apparatus 500 also includes other devices necessary to achieve proper operation. Also, as will be appreciated by those skilled in the art, the apparatus 500 may also include hardware devices that implement other additional functions, as desired. Furthermore, it will be appreciated by those skilled in the art that the apparatus 500 may also include only the devices necessary to implement the embodiments of the present application, and not necessarily all of the devices shown in fig. 5.
The embodiment of the application also provides a chip, which comprises a processor and a data interface, wherein the processor reads instructions stored in a memory through the data interface so as to realize the knowledge question-answer path searching method.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, and when the instructions are executed, the processor is configured to execute the knowledge question-answer path searching method.
Embodiments also provide a computer readable storage medium having instructions stored therein, which when run on a computer or processor, cause the computer or processor to perform one or more steps of any of the methods described above.
Embodiments of the present application also provide a computer program product comprising instructions. The computer program product, when run on a computer or processor, causes the computer or processor to perform one or more steps of any of the methods described above.
Those of skill in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described in connection with the disclosure herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the various illustrative logical blocks, modules, and steps may be stored on a computer readable medium or transmitted as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media corresponding to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., based on a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). The various components, modules, or units are described in this application to emphasize functional aspects of the devices for performing the disclosed techniques but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in an encoded hardware unit in combination with suitable software and/or firmware, or provided by interoperating hardware units, including one or more processors as described above.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to specific descriptions of corresponding step procedures in the foregoing method embodiments, and are not repeated herein.
It should be understood that in the description of the present application, unless otherwise indicated, "/" means that the associated objects are in an "or" relationship; for example, A/B may represent A or B, where A and B may be singular or plural. Also, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "At least one of" or a similar expression means any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural. In addition, in order to clearly describe the technical solutions of the embodiments of the present application, the words "first", "second", and the like are used to distinguish identical or similar items having substantially the same function and effect. It will be appreciated by those of skill in the art that the words "first", "second", and the like do not limit the quantity or the order of execution, and that items labelled "first" and "second" are not necessarily different. Meanwhile, in the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, illustration, or description. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete fashion that is easy to understand.
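The enumeration given for "at least one of a, b, or c" can be checked mechanically. The following Python sketch is only an illustration and is not part of the application; the function name `at_least_one_of` is hypothetical. It lists every non-empty combination of the items.

```python
from itertools import combinations


def at_least_one_of(items):
    """Return every non-empty combination of items, joined with '-'."""
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            result.append("-".join(combo))
    return result


combos = at_least_one_of(["a", "b", "c"])
print(combos)  # ['a', 'b', 'c', 'a-b', 'a-c', 'b-c', 'a-b-c']
```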
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a read-only memory (ROM), a random-access memory (RAM), a magnetic medium such as a floppy disk, a hard disk, a magnetic tape, or a magnetic disk, an optical medium such as a digital versatile disc (DVD), or a semiconductor medium such as a solid state disk (SSD).
The foregoing is merely a specific implementation of the embodiments of the present application, but the protection scope of the embodiments of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the embodiments of the present application should be covered by the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (22)

1. A knowledge question-answering path searching method, comprising:
acquiring a question sentence and a knowledge graph of a user;
acquiring at least two entity names according to a target vocabulary, wherein the at least two entity names are names of entities in the knowledge graph, and the target vocabulary is the first vocabulary in the question sentence; any entity name in the at least two entity names comprises the target vocabulary or is the target vocabulary;
calculating the score of each entity name in the at least two entity names according to the knowledge graph and the at least two entity names, wherein the score of each entity name is used for representing the matching degree of the entity name and the target vocabulary;
determining a starting point candidate set from the knowledge graph according to the scores of the at least two entity names, wherein the starting point corresponds to the target vocabulary;
and determining at least one answer path corresponding to the question sentence according to the starting point candidate set, the question sentence and the knowledge graph.
2. The method according to claim 1, wherein the method further comprises:
obtaining a vocabulary inverted list according to the knowledge graph; the vocabulary inverted list comprises a plurality of vocabularies and a plurality of entity names, one vocabulary corresponds to one or more entity names, each entity name in the plurality of entity names comprises a vocabulary corresponding to the entity name, or each entity name is a vocabulary corresponding to the entity name;
the obtaining at least two entity names according to the target vocabulary includes:
traversing the vocabulary inverted list according to the target vocabulary to obtain the at least two entity names.
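As an illustration of the vocabulary inverted list in claim 2, the following Python sketch maps each vocabulary to the entity names that contain it or equal it. It is a minimal sketch under stated assumptions: whitespace tokenisation, the sample entity names, and the function names `build_inverted_list` and `lookup` are all hypothetical and do not come from the application, and the traversal is realised here as a dictionary lookup.

```python
from collections import defaultdict


def build_inverted_list(entity_names):
    """Map each vocabulary to the entity names that contain it or equal it."""
    inverted = defaultdict(list)
    for name in entity_names:
        # Assumption: names are split on whitespace. A real system would use
        # a word segmenter suited to the language of the knowledge graph.
        for word in dict.fromkeys(name.split()):  # drop repeats, keep order
            inverted[word].append(name)
    return dict(inverted)


def lookup(inverted, target_word):
    """Return the entity names recorded for the target vocabulary."""
    return inverted.get(target_word, [])


names = ["router", "router port", "core router", "switch"]
inverted = build_inverted_list(names)
candidates = lookup(inverted, "router")
print(candidates)
```

Building the list once per knowledge graph keeps the per-question cost to a single lookup for the target vocabulary.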
3. The method according to claim 1 or 2, wherein the calculating the score of each of the at least two entity names according to the knowledge-graph and the at least two entity names comprises:
acquiring the weight of a target vocabulary;
and calculating the score of each entity name according to the weight of the target vocabulary, the number of the vocabulary in each entity name, the average value of the number of the vocabulary of all entity names in the knowledge graph and the occurrence frequency of the target vocabulary in each entity name.
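Claim 3 names the inputs of the score but not the formula. One well-known function of exactly these quantities is the BM25 term score, used below purely as an assumed stand-in; whether the application uses BM25 is not stated in the claim. The parameters `k1` and `b`, the function name `score_entity_name`, and the sample data are hypothetical.

```python
def score_entity_name(name_words, target_word, weight, avg_len, k1=1.2, b=0.75):
    """BM25-style score of one entity name for the target vocabulary.

    name_words: vocabularies of the entity name
    weight:     weight of the target vocabulary (for example an IDF value)
    avg_len:    average number of vocabularies over all entity names
    k1, b:      assumed tuning parameters, not taken from the application
    """
    freq = name_words.count(target_word)  # occurrences in this name
    length = len(name_words)              # number of vocabularies in this name
    norm = k1 * (1.0 - b + b * length / avg_len)
    return weight * freq * (k1 + 1.0) / (freq + norm)


all_names = ["router", "core router", "router port", "switch"]
tokenised = {name: name.split() for name in all_names}
avg_len = sum(len(words) for words in tokenised.values()) / len(tokenised)

# Score only the names that contain the target vocabulary.
scores = {
    name: score_entity_name(words, "router", 1.0, avg_len)
    for name, words in tokenised.items()
    if "router" in words
}
print(scores)
```

With this choice, a short name that consists only of the target vocabulary scores higher than a longer name that merely contains it, which matches the stated purpose of the score as a matching degree.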
4. The method according to any one of claims 1-3, wherein the determining a starting point candidate set from the knowledge graph according to the scores of the at least two entity names comprises:
sorting the scores of the at least two entity names in descending order;
matching each of the k highest-scoring entity names against the target vocabulary according to a first preset matching rule to obtain matched entity names;
determining that the starting point candidate set includes each entity in the knowledge graph whose name is one of the matched entity names.
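The selection in claim 4 can be sketched as a sort, a top-k cut, and a filter. The claim does not define the first preset matching rule, so the rule is passed in as a function; the prefix rule used in the example, the entity numbers, and the name `starting_candidates` are assumptions made only for illustration.

```python
def starting_candidates(scores, target_word, name_to_entities, k, matches):
    """Pick starting entities from the k highest-scoring entity names.

    scores:           entity name -> score
    name_to_entities: entity name -> numbers of entities with that name
    matches:          stand-in for the first preset matching rule
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # descending scores
    top_k = ranked[:k]
    matched = [name for name in top_k if matches(name, target_word)]
    candidates = []
    for name in matched:
        candidates.extend(name_to_entities.get(name, []))
    return candidates


example_scores = {"router": 1.16, "router port": 0.88, "core router": 0.88}
example_entities = {"router": [1], "router port": [2, 3], "core router": [4]}


def prefix_rule(name, word):
    # Hypothetical rule: the name must start with the target vocabulary.
    return name.startswith(word)


start_set = starting_candidates(example_scores, "router", example_entities, 2, prefix_rule)
print(start_set)
```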
5. The method according to any one of claims 1-4, wherein the determining at least one answer path corresponding to the question sentence according to the starting point candidate set, the question sentence, and the knowledge graph comprises:
acquiring an intermediate vocabulary set and a target attribute word from the question sentence, wherein the intermediate vocabulary set comprises the vocabularies in the question sentence other than the target vocabulary and the target attribute word;
determining at least one first path from the knowledge graph according to the starting point candidate set and the intermediate vocabulary set, wherein the starting point of each first path is an entity in the starting point candidate set, and the end point of each first path is an entity in the knowledge graph whose name matches the last vocabulary in the intermediate vocabulary set;
and determining the at least one answer path according to the at least one first path and the target attribute word, wherein the attributes of the end point of the answer path include the target attribute.
6. The method according to claim 5, wherein the determining at least one first path from the knowledge graph according to the starting point candidate set and the intermediate vocabulary set comprises:
taking an entity in the starting point candidate set as a starting point, and acquiring a plurality of sub-path sets based on the intermediate vocabulary set; wherein the starting point of each sub-path in the ith sub-path set of the plurality of sub-path sets is a first entity, the end point of each sub-path is a second entity, and the depth between the first entity and the second entity in the knowledge graph is not greater than a search depth; the second entity is an entity in the knowledge graph whose name matches the ith vocabulary in the intermediate vocabulary set; when i=1, the first entity is an entity in the starting point candidate set; when i is greater than 1, the first entity is the end point of a sub-path in the (i-1)th sub-path set; the jth vocabulary in the intermediate vocabulary set is adjacent to the (j+1)th vocabulary in the question sentence, and the jth vocabulary precedes the (j+1)th vocabulary in the question sentence; j is not greater than N-1, and N is the number of vocabularies in the intermediate vocabulary set;
and acquiring the at least one first path according to the plurality of sub-path sets.
7. The method according to claim 6, wherein the acquiring a plurality of sub-path sets based on the intermediate vocabulary set, taking an entity in the starting point candidate set as a starting point, comprises:
in the knowledge graph, starting from the end point of any sub-path P in the ith sub-path set of the plurality of sub-path sets, searching within the range of the search depth, and matching the name of a found entity O against the (i+1)th vocabulary in the intermediate vocabulary set according to a second preset matching rule;
if the name of the entity O matches the (i+1)th vocabulary, acquiring a path from the end point of the sub-path P to the entity O;
wherein the (i+1)th sub-path set includes the path from the end point of the sub-path P to the entity O.
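The chaining of sub-path sets in claims 6 and 7 can be sketched as a repeated depth-limited search, one round per intermediate vocabulary. This is a simplified illustration and not the claimed implementation: it uses breadth-first search only, stops extending a branch at the first matching entity, and takes the second preset matching rule as a parameter. The graph, entity names, and function names are hypothetical.

```python
from collections import deque


def expand(graph, names, start, word, max_depth, matches):
    """Paths from start to entities whose name matches word, within max_depth hops."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if len(path) > 1 and matches(names[node], word):
            found.append(path)
            continue  # assumption: do not search past a matching entity
        if len(path) - 1 == max_depth:
            continue  # search depth reached
        for nxt in graph.get(node, []):
            if nxt not in path:  # avoid cycles within one sub-path
                queue.append(path + [nxt])
    return found


def first_paths(graph, names, start_candidates, intermediate_words, max_depth, matches):
    """Chain one sub-path set per intermediate vocabulary into first paths."""
    paths = [[s] for s in start_candidates]
    for word in intermediate_words:
        next_paths = []
        for p in paths:
            for sub in expand(graph, names, p[-1], word, max_depth, matches):
                next_paths.append(p + sub[1:])  # join at the shared entity
        paths = next_paths
    return paths


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
names = {0: "router", 1: "board", 2: "slot", 3: "port", 4: "alarm"}
result = first_paths(graph, names, [0], ["port", "alarm"], 2, lambda n, w: n == w)
print(result)
```

Each round starts from the end points produced by the previous round, so the order of the intermediate vocabularies in the question sentence fixes the order in which entities must appear along a first path.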
8. The method according to claim 7, wherein the acquiring a path from the end point of the sub-path P to the entity O comprises:
determining a downstream node of a first node according to a bit string of the first node, wherein the position, in the bit string of the first node, of a bit whose value is a first value indicates the number of a downstream node of the first node, and the path from the end point of the sub-path P to the entity O comprises the first node and the downstream node of the first node;
wherein the first node is any entity, other than the entity O, on the path from the end point of the sub-path P to the entity O.
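The bit-string encoding of downstream nodes in claim 8 can be illustrated as follows. The sketch assumes that the first value is 1 and that positions are counted from 0 at the left end of the string; neither convention is fixed by the claim. The depth-first path recovery and the names `downstream_nodes` and `path_between` are likewise hypothetical.

```python
def downstream_nodes(bit_string):
    """Numbers of the downstream nodes: positions of the bits equal to '1'."""
    return [i for i, bit in enumerate(bit_string) if bit == "1"]


def path_between(bit_strings, source, target, max_depth):
    """Depth-first search for one path from source to target within max_depth hops."""
    stack = [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            return path
        if len(path) - 1 < max_depth:
            for nxt in downstream_nodes(bit_strings[path[-1]]):
                if nxt not in path:  # avoid revisiting a node on this path
                    stack.append(path + [nxt])
    return None


# Node number -> bit string over node numbers 0..4 (sample data).
bit_strings = {
    0: "01100",  # node 0 points to nodes 1 and 2
    1: "00010",  # node 1 points to node 3
    2: "00010",  # node 2 points to node 3
    3: "00001",  # node 3 points to node 4
    4: "00000",  # node 4 has no downstream node
}
path = path_between(bit_strings, 0, 4, 3)
print(path)
```

A fixed-width bit string per node makes the downstream lookup a scan over set bits, and the same representation allows set operations such as intersection to be done with bitwise operators.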
9. The method according to claim 7 or 8, wherein the search within the range of the search depth is performed by depth-first search (DFS) combined with breadth-first search (BFS).
10. The method according to any one of claims 5-9, wherein the determining the at least one answer path according to the at least one first path and the target attribute word comprises:
acquiring the entities containing the target attribute according to a bit string of the target attribute word, wherein the position, in the bit string of the target attribute word, of a bit whose value is a second value indicates the number of an entity containing the target attribute;
in the knowledge graph, performing breadth-first search (BFS) within the range of the search depth starting from the end point of a second path, and determining the end point of the answer path according to the found entities and the entities containing the target attribute, wherein the second path is one of the at least one first path, and the end point of the answer path is in the intersection of the found entities and the entities containing the target attribute;
and acquiring a path between the end point of the second path and the end point of the answer path, and determining the at least one answer path according to the second path and the path between the end point of the second path and the end point of the answer path.
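The end-point selection in claim 10 amounts to intersecting two sets: the entities reached by a depth-limited BFS and the entities flagged in the bit string of the target attribute word. The sketch below assumes that the second value is 1, that bit positions are counted from 0 at the left, and that the end point of the second path itself counts as reached at depth 0; the function names and the sample graph are hypothetical.

```python
from collections import deque


def entities_with_attribute(bit_string):
    """Numbers of entities having the attribute: positions of the bits equal to '1'."""
    return {i for i, bit in enumerate(bit_string) if bit == "1"}


def bfs_within_depth(graph, start, max_depth):
    """Entities reachable from start in at most max_depth hops (start included)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the search depth
        for nxt in graph.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)


def answer_endpoints(graph, second_path_end, attribute_bits, max_depth):
    """Intersection of the reached entities and the entities with the target attribute."""
    reached = bfs_within_depth(graph, second_path_end, max_depth)
    return reached & entities_with_attribute(attribute_bits)


kg = {3: [4, 5], 4: [6], 5: [], 6: []}
attribute_bits = "0000101"  # entities 4 and 6 carry the target attribute
endpoints = answer_endpoints(kg, 3, attribute_bits, 1)
print(endpoints)
```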
11. A search apparatus, comprising:
the acquisition unit is used for acquiring the question sentences and the knowledge graph of the user; acquiring at least two entity names according to a target vocabulary, wherein the at least two entity names are names of entities in the knowledge graph, and the target vocabulary is the first vocabulary in the question sentence; any entity name in the at least two entity names comprises the target vocabulary or is the target vocabulary;
the calculation unit is used for calculating the score of each entity name in the at least two entity names according to the knowledge graph and the at least two entity names, and the score of each entity name is used for representing the matching degree of the entity name and the target vocabulary;
the determining unit is used for determining a starting point candidate set from the knowledge graph according to the scores of the at least two entity names, wherein the starting point corresponds to the target vocabulary; and determining at least one answer path corresponding to the question sentence according to the starting point candidate set, the question sentence and the knowledge graph.
12. The apparatus according to claim 11, wherein
the determining unit is further used for obtaining a vocabulary inverted list according to the knowledge graph; the vocabulary inverted list comprises a plurality of vocabularies and a plurality of entity names, one vocabulary corresponds to one or more entity names, each entity name in the plurality of entity names comprises a vocabulary corresponding to the entity name, or each entity name is a vocabulary corresponding to the entity name;
in the aspect of acquiring at least two entity names according to the target vocabulary, the acquiring unit is specifically configured to:
traversing the vocabulary inverted list according to the target vocabulary to obtain the at least two entity names.
13. The apparatus according to claim 11 or 12, wherein the computing unit is specifically configured to:
acquiring the weight of a target vocabulary;
and calculating the score of each entity name according to the weight of the target vocabulary, the number of the vocabulary in each entity name, the average value of the number of the vocabulary of all entity names in the knowledge graph and the occurrence frequency of the target vocabulary in each entity name.
14. The apparatus according to any one of claims 11-13, wherein in the aspect of determining a starting point candidate set from the knowledge graph according to the scores of the at least two entity names, the determining unit is specifically configured to perform:
sorting the scores of the at least two entity names in descending order;
matching each of the k highest-scoring entity names against the target vocabulary according to a first preset matching rule to obtain matched entity names;
determining that the starting point candidate set includes each entity in the knowledge graph whose name is one of the matched entity names.
15. The apparatus according to any one of claims 11-14, wherein in the aspect of determining at least one answer path corresponding to the question sentence according to the starting point candidate set, the question sentence and the knowledge graph, the determining unit is specifically configured to perform:
acquiring an intermediate vocabulary set and a target attribute word from the question sentence, wherein the intermediate vocabulary set comprises the vocabularies in the question sentence other than the target vocabulary and the target attribute word;
determining at least one first path from the knowledge graph according to the starting point candidate set and the intermediate vocabulary set, wherein the starting point of each first path is an entity in the starting point candidate set, and the end point of each first path is an entity in the knowledge graph whose name matches the last vocabulary in the intermediate vocabulary set;
and determining the at least one answer path according to the at least one first path and the target attribute word, wherein the attributes of the end point of the answer path include the target attribute.
16. The apparatus according to claim 15, wherein in the aspect of determining at least one first path from the knowledge graph according to the starting point candidate set and the intermediate vocabulary set, the determining unit is specifically configured to perform:
taking an entity in the starting point candidate set as a starting point, and acquiring a plurality of sub-path sets based on the intermediate vocabulary set; wherein the starting point of each sub-path in the ith sub-path set of the plurality of sub-path sets is a first entity, the end point of each sub-path is a second entity, and the depth between the first entity and the second entity in the knowledge graph is not greater than a search depth; the second entity is an entity in the knowledge graph whose name matches the ith vocabulary in the intermediate vocabulary set; when i=1, the first entity is an entity in the starting point candidate set; when i is greater than 1, the first entity is the end point of a sub-path in the (i-1)th sub-path set; the jth vocabulary in the intermediate vocabulary set is adjacent to the (j+1)th vocabulary in the question sentence, and the jth vocabulary precedes the (j+1)th vocabulary in the question sentence; j is not greater than N-1, and N is the number of vocabularies in the intermediate vocabulary set;
and acquiring the at least one first path according to the plurality of sub-path sets.
17. The apparatus according to claim 16, wherein in the aspect of acquiring a plurality of sub-path sets based on the intermediate vocabulary set, taking an entity in the starting point candidate set as a starting point, the determining unit is specifically configured to perform:
in the knowledge graph, starting from the end point of any sub-path P in the ith sub-path set of the plurality of sub-path sets, searching within the range of the search depth, and matching the name of a found entity O against the (i+1)th vocabulary in the intermediate vocabulary set according to a second preset matching rule;
if the name of the entity O matches the (i+1)th vocabulary, acquiring a path from the end point of the sub-path P to the entity O;
wherein the (i+1)th sub-path set includes the path from the end point of the sub-path P to the entity O.
18. The apparatus according to claim 17, wherein in the aspect of acquiring a path from the end point of the sub-path P to the entity O, the determining unit is specifically configured to perform:
determining a downstream node of a first node according to a bit string of the first node, wherein the position, in the bit string of the first node, of a bit whose value is a first value indicates the number of a downstream node of the first node, and the path from the end point of the sub-path P to the entity O comprises the first node and the downstream node of the first node;
wherein the first node is any entity, other than the entity O, on the path from the end point of the sub-path P to the entity O.
19. The apparatus according to claim 17 or 18, wherein the search within the range of the search depth is performed by depth-first search (DFS) combined with breadth-first search (BFS).
20. The apparatus according to any one of claims 15-19, wherein in the aspect of determining the at least one answer path according to the at least one first path and the target attribute word, the determining unit is specifically configured to perform:
acquiring the entities containing the target attribute according to a bit string of the target attribute word, wherein the position, in the bit string of the target attribute word, of a bit whose value is a second value indicates the number of an entity containing the target attribute;
in the knowledge graph, performing breadth-first search (BFS) within the range of the search depth starting from the end point of a second path, and determining the end point of the answer path according to the found entities and the entities containing the target attribute, wherein the second path is one of the at least one first path, and the end point of the answer path is in the intersection of the found entities and the entities containing the target attribute;
and acquiring a path between the end point of the second path and the end point of the answer path, and determining the at least one answer path according to the second path and the path between the end point of the second path and the end point of the answer path.
21. A search apparatus comprising a processor and a memory, wherein the memory is for storing program code, the processor being for executing the program code to implement the method of any of claims 1 to 10.
22. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 10.
CN202210915024.4A 2022-07-29 2022-07-29 Knowledge question-answering path searching method and related device Pending CN117520487A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210915024.4A CN117520487A (en) 2022-07-29 2022-07-29 Knowledge question-answering path searching method and related device

Publications (1)

Publication Number Publication Date
CN117520487A (en) 2024-02-06

Family

ID=89751899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210915024.4A Pending CN117520487A (en) 2022-07-29 2022-07-29 Knowledge question-answering path searching method and related device

Country Status (1)

Country Link
CN (1) CN117520487A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination