WO2018157790A1 - Related entity determining method, apparatus, computing device, and storage medium - Google Patents

Related entity determining method, apparatus, computing device, and storage medium

Info

Publication number
WO2018157790A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
target
candidate
determining
related entity
Prior art date
Application number
PCT/CN2018/077416
Other languages
English (en)
French (fr)
Inventor
李潇
张锋
王策
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018157790A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present application relates to the field of data processing technologies, and in particular, to a related entity determining method, apparatus, computing device, and storage medium.
  • A related entity can be regarded as another entity that co-occurs with the queried target entity in the same query, and it helps the user obtain information related to the queried target entity. For example, after the user inputs a query statement, the search engine displays the target entity corresponding to the query statement (such as a webpage link) to the user and also recommends the related entities that co-occur with the target entity in queries, so as to guide the user to search again and make it easier for the user to obtain relevant information.
  • A typical scenario is that, after the search engine finds the target entity corresponding to the query statement, it displays the target entity on the search result page and also displays the recommended related entities in a set area of that page (for example, the left area) so that the user can search again.
  • The applicants have found that, at present, the related entities of a target entity are determined by counting, in open text (such as news text), the other entities that co-occur with the target entity. However, the content recorded in open text is limited in scope and time-sensitive, which makes the related-entity results obtained from open-text statistics uncontrollable and leads to a lower recall rate for the related entities.
  • the recall rate indicates the ratio of the number of related entities identified to the total number of related entities.
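  • The recall ratio defined above can be written as a short Python function. This is a minimal sketch: the function name `recall` and the example entity names are assumptions made for illustration only.

```python
def recall(identified: set, all_related: set) -> float:
    """Ratio of related entities that were identified to all related entities."""
    if not all_related:
        return 0.0  # no related entities exist, so nothing could be recalled
    return len(identified & all_related) / len(all_related)


# Hypothetical example: two of four true related entities were identified.
score = recall({"A", "B"}, {"A", "B", "C", "D"})
```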
  • the embodiments of the present application provide a related entity determining method, apparatus, computing device, and storage medium to improve the recall rate of the related entity determination result.
  • a method for determining related entities including:
  • the embodiment of the present application further provides a related entity determining apparatus, including:
  • a target knowledge map acquisition module configured to acquire a target knowledge map, wherein the target knowledge map has at least a target entity
  • a candidate entity set determining module configured to determine a candidate entity set of the target entity in the target knowledge map, where the candidate entity set includes: a candidate entity corresponding to each side of the target entity;
  • a related entity determining module is configured to determine a related entity of the target entity according to the foregoing set of candidate entities.
  • the embodiment of the present application further provides a computing device, including the foregoing related entity determining apparatus.
  • The related entity determining method includes: acquiring a target knowledge map, where the target knowledge map has at least a target entity; determining, in the target knowledge map, a candidate entity set of the target entity, where the candidate entity set includes the candidate entities corresponding to each number of edges by which the target entity can be reached; and determining a related entity of the target entity according to the candidate entity set.
  • The embodiment of the present application uses a target knowledge map that has at least the target entity, mines from it the set of candidate entities that can reach the target entity, and then determines the related entities of the target entity according to that candidate entity set. Because the information about the target entity contained in the target knowledge map is relatively comprehensive, comprehensive historical information about the target entity can be mined with high probability, so the mined related-entity results are more comprehensive and the recall rate of the determined related-entity results is improved.
  • FIG. 1 is a flowchart of a method for determining a related entity according to an embodiment of the present application
  • FIG. 2 is a flowchart of a method for acquiring a target knowledge map according to an embodiment of the present application
  • FIG. 3 is another flowchart of a method for determining related entities according to an embodiment of the present application.
  • FIG. 5 is a flowchart of a method for determining a related entity of a target entity according to a set of candidate entities
  • FIG. 6 is still another flowchart of a method for determining related entities according to an embodiment of the present application.
  • FIG. 7 is a flowchart of a method for determining recommended ordering of related entities according to an embodiment of the present application.
  • FIG. 8 is a flowchart of another method for determining recommended ordering of related entities according to an embodiment of the present application.
  • FIG. 9 is a flowchart of still another method for determining recommended ordering of related entities according to an embodiment of the present application.
  • FIG. 10 is a structural block diagram of a related entity determining apparatus according to an embodiment of the present application.
  • FIG. 11 is another structural block diagram of a related entity determining apparatus according to an embodiment of the present application.
  • FIG. 12 is a block diagram showing the hardware structure of a computing device according to an embodiment of the present application.
  • FIG. 1 is a flowchart of a related entity determining method according to an embodiment of the present disclosure.
  • the method is applicable to a computing device having data computing capability, and the computing device is configured to execute a program corresponding to the method shown in FIG.
  • The computing device may be a server on the network side or an electronic device, such as a computer, on the user side.
  • the related entity determining method provided by the embodiment of the present application may include:
  • Step S100 Acquire a target knowledge map, and the target knowledge map has at least a target entity.
  • The target entity is the entity whose related entities are to be determined by the embodiment of the present application.
  • The embodiment of the present application may specify the target entity whose related entities need to be determined, and the target knowledge map contains that target entity.
  • The knowledge map (also called a knowledge graph) is intended to describe the entities and concepts that exist in the real world. Each entity or concept is identified by a globally unique ID (identity number), attribute-value pairs characterize the intrinsic properties of an entity, and a relation connects two entities to characterize the association between them. The knowledge map is therefore mainly composed of nodes and the edges connecting them, where a node represents an entity or concept and an edge represents an attribute of, or a relationship between, the connected nodes.
  • The data source of the knowledge map can be obtained by collecting structured data from encyclopedia sites and various vertical sites, which covers most common-sense knowledge; such data is generally of high quality but is updated slowly.
  • The data source of the knowledge map can also be enriched by extracting attribute-value pairs of related entities from various semi-structured data (such as HTML tables); in addition, discovering new entities or new entity attributes from query logs can continuously expand the coverage of the knowledge map.
  • embodiments of the present application may construct a target knowledge map through a data source that includes a target entity.
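  • The node-and-edge structure of a knowledge map can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `KnowledgeMap`, its methods, the entity IDs `e1` and `e2`, and the relation label `starred_in` are hypothetical and do not come from the application.

```python
from collections import defaultdict


class KnowledgeMap:
    """Entities are nodes keyed by a unique ID; relations are labelled edges."""

    def __init__(self):
        self.attributes = defaultdict(dict)  # entity ID -> {attribute: value}
        self.relations = defaultdict(list)   # entity ID -> [(relation, entity ID)]

    def add_attribute(self, entity_id, name, value):
        # An attribute-value pair describes an intrinsic property of an entity.
        self.attributes[entity_id][name] = value

    def add_relation(self, source_id, relation, target_id):
        # A relation connects two entities and forms an edge between them.
        self.relations[source_id].append((relation, target_id))

    def entities(self):
        # Every ID that carries attributes or takes part in a relation.
        ids = set(self.attributes) | set(self.relations)
        for edges in self.relations.values():
            ids.update(target for _, target in edges)
        return ids


km = KnowledgeMap()
km.add_attribute("e1", "name", "Xiaoming")
km.add_attribute("e2", "name", "Movie A")
km.add_relation("e1", "starred_in", "e2")  # hypothetical relation label
```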
  • The embodiment of the present application can also use the knowledge map constructed from the data source containing the target entity to interpret input text that includes the target entity, so that the information related to the target entity is understood more fully.
  • The embodiment of the present application can obtain input text that includes the target entity and, after constructing the knowledge map from the data source containing the target entity, map the given named entity in the input text onto the target entity in the constructed knowledge map to obtain the target knowledge map.
  • Step S110 Determine a candidate entity set of the target entity in the target knowledge map, where the candidate entity set includes the candidate entities corresponding to each number of edges by which the target entity can be reached.
  • In the knowledge map, an entity may be regarded as a node, and entities may be connected by edges. Starting from the target entity, a candidate entity may be reached through one edge or through multiple edges. The embodiment of the present application may start from the target entity, determine the entities reached through one edge to obtain the candidate entities corresponding to edge count one, then determine the entities reached through two edges to obtain the candidate entities corresponding to edge count two, and so on, obtaining the candidate entities corresponding to each edge count.
  • The embodiment of the present application may set an edge-count range that includes a plurality of edge counts. For each edge count in the range, the embodiment determines the candidate entities reached from the target entity through that number of edges,
  • thereby obtaining the candidate entities corresponding to each number of edges by which the target entity can be reached;
  • For example, for edge count one, the embodiment may determine the candidate entities reached from the target entity through one edge; for edge count two, the candidate entities reached through two edges; and for edge count three, the candidate entities reached through three edges. In this way the candidate entities corresponding to each edge count in the range are obtained.
  • Setting an edge-count range is only one optional way of determining the candidate entities corresponding to each number of edges by which the target entity can be reached.
  • The embodiment of the present application may also determine all edge counts involved for the target entity in the target knowledge map, and thereby determine the candidate entities corresponding to each of those edge counts.
  • Step S120 Determine a related entity of the target entity according to the set of candidate entities.
  • The candidate entities in the determined candidate entity set may be taken as the related entities of the target entity.
  • The candidate entities in the candidate entity set may also be deduplicated so that each repeated candidate entity is retained only once.
  • The embodiment of the present application uses a target knowledge map that has at least the target entity, mines from it the set of candidate entities that can reach the target entity, and then determines the related entities of the target entity according to that candidate entity set. Because the information about the target entity contained in the target knowledge map is relatively comprehensive, comprehensive historical information about the target entity can be mined with high probability, so the mined related-entity results are more comprehensive and the recall rate of the determined related-entity results is improved.
  • A data source containing the target entity can be obtained and the target knowledge map constructed from it. This way of acquiring the target knowledge map is relatively simple, the information it contains about the target entity is relatively comprehensive, and the related-entity results finally mined for the target entity have a higher recall rate.
  • FIG. 2 shows an optional implementation method for acquiring the target knowledge map provided by the embodiment of the present application. Referring to FIG. 2, the method may include:
  • Step S200 Acquire input text, where a plurality of named entities are pre-defined in the input text, and the named entity includes at least a target entity.
  • the input text may be a type of open text, and the input text is recorded with at least the target entity, and other entities may also be recorded.
  • The embodiment of the present application may specify in advance, in the input text, named entities that include at least the target entity. A named entity can be regarded as a person name, organization name, place name, or any other entity identified by a name in the input text.
  • Step S210 mapping a given named entity in the input text to a target entity of the knowledge map to obtain a target knowledge map; the knowledge map is constructed by a data source including the target entity.
  • the embodiment of the present application may map the given named entity in the input text to the target entity of the knowledge map to obtain the target knowledge map;
  • Mapping the given named entity in the input text to the target entity of the knowledge map can be regarded as the process of linking a given named entity in the input text to an unambiguous target entity in the knowledge map, which may include merging synonymous entities, disambiguating ambiguous entities, and so on;
  • The embodiment of the present application may use a named entity linking technology to map the given named entity in the input text to the target entity of the knowledge map, thereby linking the given named entity in the input text to the unambiguous target entity in the knowledge map;
  • the named entity link technology can mainly improve the information filtering capability of the online recommendation system and the Internet search engine.
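  • The two linking operations named above (merging synonymous entities and disambiguating ambiguous entities) can be illustrated with a deliberately simple Python sketch. It is not the linking technology used by the embodiment: the alias table `ALIASES`, the context keywords in `CONTEXT`, the entity IDs, and the keyword-overlap rule in `link` are all assumptions chosen only to show the idea.

```python
# Hypothetical alias table: several surface forms map to one entity ID
# (synonym merging); one surface form may map to several IDs (ambiguity).
ALIASES = {
    "xiaoming": ["e1"],
    "xiao ming": ["e1"],
    "movie a": ["e2", "e3"],
}

# Hypothetical context keywords describing each entity in the knowledge map.
CONTEXT = {
    "e2": {"film", "actor"},
    "e3": {"novel", "author"},
}


def link(mention, context_words):
    """Return the entity ID a mention refers to, or None if it is unknown."""
    candidates = ALIASES.get(mention.lower(), [])
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # Disambiguate by keyword overlap with the text surrounding the mention.
    return max(candidates, key=lambda e: len(CONTEXT.get(e, set()) & context_words))
```

  • With this table, the mentions "Xiaoming" and "Xiao Ming" resolve to the same entity ID, while "Movie A" resolves differently depending on the surrounding words.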
  • The embodiment of the present application may preset the range of edge counts used when mining the candidate entities of the target entity in the target knowledge map, so that, after the target knowledge map is obtained, the candidate entities corresponding to each edge count in that range are mined.
  • FIG. 3 is another flowchart of the related entity determining method provided by the embodiment of the present application. Referring to FIG. 3, the method may include:
  • Step S300 Acquire a target knowledge map, and the target knowledge map has at least a target entity.
  • step S300 may be implemented by the method shown in FIG. 2, or the target knowledge map may be constructed by using a data source that includes the target entity.
  • Step S310 Obtain a preset edge-count range, where the range includes a plurality of edge counts.
  • Step S320 Determine, according to each edge count included in the range, the candidate entities in the target knowledge map corresponding to each number of edges by which the target entity can be reached, to obtain the candidate entity set of the target entity.
  • For each edge count in the range, the embodiment of the present application may determine the candidate entities reached from the target entity through that number of edges,
  • thereby determining the candidate entities corresponding to each edge count and obtaining the candidate entity set of the target entity.
  • For example, starting from the target entity "Xiaoming", the candidate entities corresponding to edge count one include "Xiaohong", "Xiaoqiang", and "Movie A".
  • Starting from the target entity "Xiaoming", the candidate entities corresponding to edge count two include "Xiaoqiang" and "Xiaorong", and the candidate entities corresponding to edge count three include "Xiaorong". In this way the candidate entities corresponding to each number of edges by which the target entity "Xiaoming" can be reached are determined, and the candidate entity set is obtained. The candidate entity set can specifically include: edge count one: "Xiaohong", "Xiaoqiang", and "Movie A"; edge count two: "Xiaoqiang" and "Xiaorong"; edge count three: "Xiaorong".
  • The preset edge-count range is not limited to the edge counts one to three described above; the edge counts included in the range may be set according to actual conditions.
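  • The edge-count traversal in this example can be sketched in Python as follows. This is a minimal sketch rather than the embodiment's implementation: FIG. 4 is not reproduced in this text, so the directed adjacency list `GRAPH` is a hypothetical graph chosen only to be consistent with the candidate entities listed above, and the function name `candidates_by_edge_count` is likewise an assumption.

```python
# Hypothetical directed adjacency list (FIG. 4 is not reproduced in this text).
GRAPH = {
    "Xiaoming": ["Xiaohong", "Xiaoqiang", "Movie A"],
    "Xiaohong": ["Xiaoqiang"],
    "Xiaoqiang": ["Xiaorong"],
}


def candidates_by_edge_count(graph, target, edge_counts):
    """Return {edge count: entities reached from target over exactly that many edges}."""
    result = {}
    frontier = {target}
    for k in range(1, max(edge_counts) + 1):
        # Advance every entity in the frontier by one more edge.
        frontier = {nxt for node in frontier for nxt in graph.get(node, ())}
        frontier.discard(target)  # the target is not its own candidate
        if k in edge_counts:
            result[k] = set(frontier)
    return result


candidate_set = candidates_by_edge_count(GRAPH, "Xiaoming", edge_counts={1, 2, 3})
for k, entities in candidate_set.items():
    print(k, sorted(entities))
```

  • Passing a different `edge_counts` set corresponds to choosing a different preset edge-count range.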
  • Step S330 determining related entities of the target entity according to the set of candidate entities.
  • the embodiment of the present application may directly determine the candidate entity included in the candidate entity set as the related entity of the target entity to implement the determination of the related entity of the target entity.
  • However, the same candidate entity may appear under different edge counts,
  • that is, it may be among the candidate entities corresponding to more than one edge count,
  • which means a candidate entity may have more than one actual relationship with the target entity. Based on this, to improve the precision of the relationship between a related entity and the target entity,
  • the embodiment of the present application may deduplicate the candidate entities in the candidate entity set, retaining for each repeated candidate entity only the occurrence with the smallest number of edges;
  • FIG. 5 illustrates an optional method flow for determining a related entity of the target entity according to the set of candidate entities.
  • the method may include:
  • Step S400 If the candidate entity set contains repeated candidate entities that correspond to different numbers of edges, deduplicate the candidate entities in the set, retaining for each repeated candidate entity the occurrence with the smallest number of edges.
  • After determining the candidate entities corresponding to each number of edges by which the target entity can be reached, if there are repeated candidate entities that correspond to different numbers of edges, the embodiment of the present application may deduplicate them on the principle of retaining the candidate entity with the smallest number of edges, thereby obtaining the deduplicated candidate entity set;
  • For example, suppose the candidate entity set includes: edge count one: "Xiaohong", "Xiaoqiang", and "Movie A"; edge count two: "Xiaoqiang" and "Xiaorong"; edge count three: "Xiaorong".
  • The candidate entities for edge counts one and two both contain "Xiaoqiang", and those for edge counts two and three both contain "Xiaorong". Following the principle of retaining the candidate entity with the smallest number of edges, "Xiaoqiang" is removed from edge count two and retained under edge count one, and "Xiaorong" is removed from edge count three and retained under edge count two.
  • After deduplication, the candidate entity set includes: edge count one: "Xiaohong", "Xiaoqiang", and "Movie A"; edge count two: "Xiaorong".
  • Step S410 Use the candidate entities included in the deduplicated candidate entity set as the related entities of the target entity.
  • The deduplicated candidate entity set includes the non-repeating candidate entities corresponding to each number of edges by which the target entity can be reached.
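  • The deduplication rule of step S400 (keep each repeated candidate entity only under its smallest edge count) can be sketched in Python as follows. The function name `deduplicate` and the dictionary layout are assumptions; the input reproduces the "Xiaoming" example.

```python
def deduplicate(candidates_by_edges):
    """Keep every candidate entity only under its smallest edge count."""
    seen = set()
    result = {}
    for k in sorted(candidates_by_edges):          # smallest edge count first
        kept = candidates_by_edges[k] - seen       # drop entities already kept
        if kept:
            result[k] = kept
        seen |= candidates_by_edges[k]
    return result


candidate_set = {
    1: {"Xiaohong", "Xiaoqiang", "Movie A"},
    2: {"Xiaoqiang", "Xiaorong"},
    3: {"Xiaorong"},
}
related = deduplicate(candidate_set)
```

  • Edge counts whose candidates were all retained under a smaller edge count (edge count three here) no longer appear in the result.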
  • The embodiment of the present application uses a target knowledge map that has at least the target entity and mines from it the set of candidate entities that can reach the target entity. Because the information about the target entity contained in the target knowledge map is relatively comprehensive, comprehensive historical information about the target entity can be mined with high probability, so the mined candidate entity set is more comprehensive. The candidate entities repeated under different edge counts are then deduplicated to obtain the related entities of the target entity, which improves the precision of the relationship between the related entities and the target entity. The result is a set of related entities with a high recall rate and a high-precision relationship to the target entity.
  • FIG. 6 is still another flowchart of a method for determining related entities provided by an embodiment of the present application.
  • the method may include:
  • Step S500 Acquire input text, where a plurality of named entities are pre-specified in the input text, and the named entities include at least the target entity.
  • Step S510 Map the given named entity in the input text to the target entity of the knowledge map to obtain the target knowledge map; the knowledge map is constructed from a data source that includes the target entity.
  • Step S520 Obtain a preset edge-count range, where the range includes a plurality of edge counts.
  • Step S530 Determine, according to each edge count included in the range, the candidate entities in the target knowledge map corresponding to each number of edges by which the target entity can be reached, to obtain the candidate entity set of the target entity.
  • Step S540 If the candidate entity set contains repeated candidate entities that correspond to different numbers of edges, deduplicate the candidate entities in the set, retaining for each repeated candidate entity the occurrence with the smallest number of edges.
  • Step S550 Use the candidate entities included in the deduplicated candidate entity set as the related entities of the target entity.
  • As an example, the related entity determining method provided by the embodiment of the present application may be used to determine the related entities of the movie star "Xiaoming", that is, the movie star "Xiaoming" is the target entity.
  • The implementation process can be as follows:
  • The server can retrieve data sources containing the target entity "Xiaoming" from the structured data of encyclopedia sites and various vertical sites, from various semi-structured data, and from search logs;
  • The server constructs a knowledge map from the data sources containing the target entity "Xiaoming"; during construction, each entity in the data sources is used as a node and each relationship between entities as an edge, so that entities are connected by the edges corresponding to their relationships;
  • The server obtains input text that includes the target entity "Xiaoming".
  • The input text may also record other entities; specifically, a plurality of named entities are pre-specified in the input text,
  • and the named entities include at least the target entity "Xiaoming";
  • The server uses the named entity linking technology to map the given named entity in the input text to the target entity of the knowledge map to obtain the target knowledge map;
  • through named entity linking, the given named entities in the input text and the entities in the knowledge map undergo merging of synonymous entities and disambiguation of ambiguous entities;
  • The server retrieves the preset edge-count range, determines the candidate entities corresponding to each number of edges by which the target entity can be reached, and obtains the candidate entity set of the target entity. That is, for each edge count in the range, the server may determine, in the target knowledge map, the candidate entities reached from the target entity through that number of edges. As shown in FIG. 4, for edge counts one to three, the server may separately determine the candidate entities reached from the target entity through one edge, two edges, and three edges, thereby obtaining the candidate entity set;
  • The server may deduplicate the repeated candidate entities in the candidate entity set so that, for a candidate entity repeated under different edge counts, only the occurrence with the smallest edge count is retained; the server may then use the candidate entities included in the deduplicated candidate entity set as the related entities of the target entity;
  • The server may also directly use the candidate entities included in the candidate entity set as the related entities of the target entity.
  • After determining the related entities of the target entity, the embodiment of the present application may recommend them in scenarios where related entities need to be recommended. For example, when a user searches for the target entity, search entries for the related entities of the target entity may be recommended to guide the user to search again, making it more convenient for the user to obtain information related to the target entity. Correspondingly, the embodiment of the present application may determine a recommendation ranking for the related entities
  • and recommend the related entities according to that ranking, as described below.
  • A simple ranking method is to define the recommendation ranking of the related entities randomly and recommend them in that random order. Although this method is simple, the accuracy of the resulting ranking may be low,
  • so it is not suitable for some search recommendation scenarios.
  • The embodiment of the present application provides at least three schemes for determining the recommendation ranking of related entities.
  • FIG. 7 is a flowchart of a method for determining recommended ordering of related entities according to an embodiment of the present application. Referring to FIG. 7, the method may include:
  • Step S600 Compute, from open text, the relevance score of each related entity with respect to the target entity.
  • The relevance score of a related entity and the target entity in open text is obtained by computing a co-occurrence semantic network offline over the open text. It is generally considered that if two entities (such as the target entity and a related entity) frequently appear in the same sentence or passage, the two entities are strongly related.
  • The degree of correlation between a related entity and the target entity can be measured by the mutual information between them.
  • Mutual information is a useful measure in information theory. It can be regarded as the amount of information that one random variable contains about another random variable, or as the reduction in the uncertainty of one random variable due to knowledge of another;
  • The embodiment of the present application may determine the mutual information of the related entity and the target entity and use it to determine the relevance score of the related entity with respect to the target entity;
  • The embodiment of the present application may determine a first ratio, of the number of texts in which the related entity and the target entity both appear to the total number of texts; a second ratio, of the number of texts in which the related entity appears to the total number of texts; and a third ratio, of the number of texts in which the target entity appears to the total number of texts. The mutual information of the related entity and the target entity is then determined from the first, second, and third ratios, and the relevance score is represented by that mutual information;
  • Here, uppercase X can be regarded as a set and lowercase x as a specific value taken from that set; uppercase Y and lowercase y are defined similarly. p(x, y) is the ratio of the number of texts in which entities x and y appear simultaneously to the total number of texts, p(x) is the ratio of the number of texts in which x appears to the total number of texts, and p(y) is the ratio of the number of texts in which y appears to the total number of texts.
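  • For reference, the standard information-theoretic definition of mutual information, written with the symbols described above, is given below. That the embodiment uses exactly this form is an assumption; the definition itself is the textbook one.

```latex
I(X;Y) \;=\; \sum_{x \in X} \sum_{y \in Y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
```

  • The logarithm is positive when x and y co-occur more often than they would if they were independent, that is, when p(x, y) > p(x) p(y), and negative when they co-occur less often.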
  • Step S610 Determine a recommendation ranking of each related entity according to a relevance degree score of each related entity and the target entity, wherein the higher the correlation score, the higher the recommended ranking.
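  • Steps S600 and S610 can be sketched in Python as follows. This is a sketch under stated assumptions, not the embodiment's scoring code: the score used here is the pointwise term log(p(x, y) / (p(x) p(y))) computed from the three ratios, and all text counts, along with the names `relevance_score` and `COUNTS`, are invented for illustration.

```python
import math


def relevance_score(n_both, n_related, n_target, n_total):
    """Score one related entity against the target from text counts."""
    p_xy = n_both / n_total      # first ratio: texts containing both entities
    p_x = n_related / n_total    # second ratio: texts containing the related entity
    p_y = n_target / n_total     # third ratio: texts containing the target entity
    if p_xy == 0:
        return float("-inf")     # never co-occur: lowest possible score
    return math.log(p_xy / (p_x * p_y))


# Hypothetical counts: entity -> (texts with both, texts with the related entity).
N_TOTAL, N_TARGET = 1000, 100
COUNTS = {"Xiaohong": (30, 50), "Movie A": (5, 200), "Xiaoqiang": (0, 40)}

scores = {
    entity: relevance_score(both, related, N_TARGET, N_TOTAL)
    for entity, (both, related) in COUNTS.items()
}
# Higher relevance score means a higher recommendation rank.
ranking = sorted(scores, key=scores.get, reverse=True)
```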
  • However, the relationships between some related entities and the target entity are very fixed (they have become common knowledge), so they are rarely mentioned in open text; well-known movie-star couples are an example. Yet entities
  • with such fixed relationships to the target entity are highly relevant to it and should be given priority when related entities are recommended. This is difficult to achieve with the first method, which relies on the co-occurrence semantic network. Therefore, the knowledge map can be used to set larger weights for important relationships between entities, so that related entities whose relationship with the target entity is important, but which are seldom mentioned in open text, can still be recommended;
  • FIG. 8 is a flowchart of another method for determining a recommended ordering of related entities according to an embodiment of the present application.
  • the method may include:
  • Step S700 Determine, by using the set of candidate entities after the de-duplication processing and the target entity, a nearest entity that is reachable by each related entity.
  • the embodiment of the present application needs to determine the closest entity that each related entity can reach within the range;
  • In the target knowledge map, the nearest entity reachable by a related entity may be the target entity itself (when the related entity's edge count to the target entity is one), or it may be another related entity (when the related entity's
  • edge count to the target entity is greater than one, so that it must pass through other related entities closer to the target entity to reach the target entity);
  • For example, the related entities of the target entity "Xiaoming" include: edge count one: "Xiaohong", "Xiaoqiang", and "Movie A";
  • edge count two: "Xiaorong".
  • The related entities "Xiaohong", "Xiaoqiang", and "Movie A" can reach the target entity "Xiaoming" directly, so their nearest reachable entity is the target entity.
  • The related entity "Xiaorong" must pass through the related entity "Xiaoqiang" to reach the target entity, so the nearest entity reachable by "Xiaorong" is "Xiaoqiang".
  • Step S710: Determine, according to the preset relationship weight of each relationship in the target knowledge graph, the relationship weight of the relationship between each related entity and its nearest reachable entity, to obtain the relationship weight corresponding to each related entity.
  • the embodiment of the present application may use empirical knowledge to set specific relationship weights for the different relationships between entities in the knowledge graph, so that more important relationships have higher relationship weights;
  • for example, when recommending entities in the technology field, relationships in the corresponding knowledge graph such as an entity's position, company, and company shareholders can be given larger relationship weights;
  • when recommending entities in the sports field, relationships in the corresponding knowledge graph such as an entity's team or teammates can be given larger relationship weights;
  • after the relationship weight of each relationship in the target knowledge graph has been preset, the embodiment of the present application may determine the relationship weight of each related entity according to the relationship between that related entity and its nearest reachable entity;
  • as in the example above, the related entities "Xiaohong", "Xiaoqiang" and "Movie A" can reach the target entity "Xiaoming" directly, so their nearest reachable entity is the target entity; the relationship weight of "Xiaohong" is therefore that of the relationship between "Xiaohong" and "Xiaoming", the relationship weight of "Xiaoqiang" is that of the relationship between "Xiaoqiang" and "Xiaoming", and the relationship weight of "Movie A" is that of the relationship between "Movie A" and "Xiaoming";
  • the related entity "Xiaorong" has to reach the target entity through the related entity "Xiaoqiang", so its nearest reachable entity is "Xiaoqiang", and the relationship weight of "Xiaorong" is that of the relationship between "Xiaorong" and "Xiaoqiang";
  • that is, for a related entity, the embodiment of the present application may determine the nearest entity reachable by that related entity, and take the relationship weight of the relationship between the two as the relationship weight of that related entity.
  • Step S720: For each related entity, combine the edge-count weight of the edge count corresponding to the related entity with the corresponding relationship weight, to obtain the weight score of each related entity; the larger the edge count, the smaller the edge-count weight.
  • after the relationship weight of each related entity is determined, the embodiment of the present application may determine the weight score of each related entity in combination with the edge-count weight of the edge count between that related entity and the target entity; in general, the larger the edge count, the smaller the edge-count weight, which down-weights related entities that are many edges away from the target entity, so that related entities that should not be expanded can be removed.
  • if the edge count between a related entity and the target entity is one, the edge-count weight of that related entity is taken to be 1; if the edge count is greater than one, the edge-count weight has to be reduced so that it is less than 1.
  • between adjacent edge counts, the edge-count weight of the smaller edge count may be set to twice that of the larger edge count; for example, if the edge count corresponding to a related entity is one, the edge-count weight is 1; if the edge count is two, the edge-count weight is 1/2 = 0.5; if the edge count is three, the edge-count weight is 0.5/2 = 0.25; and so on.
  • after the edge-count weights are determined, for a related entity the embodiment of the present application may multiply the edge-count weight of the edge count corresponding to that related entity by the corresponding relationship weight, to obtain the weight score of that related entity; processing every related entity in this way yields the weight score of each related entity.
  • as shown in FIG. 4, after the candidate entity set is deduplicated, the related entities of the target entity include:
  • edge count one: "Xiaohong", "Xiaoqiang" and "Movie A";
  • edge count two: "Xiaorong".
  • for example, the relationship between "Xiaohong" and the nearest reachable entity "Xiaoming" is wife, and the corresponding relationship weight can be set to 1;
  • the relationship between "Xiaoqiang" and the nearest reachable entity "Xiaoming" is partner, and the corresponding relationship weight can be set to 0.5;
  • the relationship between "Movie A" and the nearest reachable entity "Xiaoming" is starring role, and the corresponding relationship weight can be set to 0.7;
  • the relationship between "Xiaorong" and the nearest reachable entity "Xiaoqiang" is wife, and the corresponding relationship weight can be set to 1;
  • the edge count corresponding to "Xiaohong", "Xiaoqiang" and "Movie A" is one, so their edge-count weight can be set to 1; the edge count corresponding to "Xiaorong" is two, so its edge-count weight can be set to 0.5;
  • weight scores of related entities can be as shown in Table 1 below.
  • Step S730: Determine the recommended ranking of each related entity according to its weight score, where a higher weight score gives a higher recommended ranking.
  • by constraining the ranking with the relationship weights of the target knowledge graph itself, the embodiment of the present application can raise the recommended ranking of related entities whose relationship with the target entity is important but which are rarely mentioned because the relationship is common knowledge, so that the ranking of the recommended related entities is more precise.
  • FIG. 9 is a flowchart of still another method for determining a recommended ordering of related entities according to an embodiment of the present application.
  • the method may include:
  • Step S800: Compute, over open text, the relevance score between each related entity and the target entity.
  • Step S810: Determine the nearest entity reachable by each related entity, within the range formed by the deduplicated candidate entity set and the target entity.
  • Step S820: Determine, according to the preset relationship weight of each relationship in the target knowledge graph, the relationship weight of the relationship between each related entity and its nearest reachable entity, to obtain the relationship weight corresponding to each related entity.
  • Step S830: For each related entity, combine the edge-count weight of the edge count corresponding to the related entity with the corresponding relationship weight, to obtain the weight score of each related entity; the larger the edge count, the smaller the edge-count weight.
  • Step S840: For each related entity, add the relevance score of the related entity to its weight score, to obtain the ranking score of each related entity.
  • Step S850: Determine the recommended ranking of each related entity according to its ranking score, where a higher ranking score gives a higher recommended ranking.
  • FIG. 9 can be regarded as a combination of the schemes of FIG. 7 and FIG. 8: for each related entity, the embodiment of the present application may determine the relevance score between the related entity and the target entity and the weight score of the related entity, add the two to obtain the ranking score of the related entity, and rank the related entities for recommendation accordingly.
  • the embodiment of the present application mines the related entities of the target entity from the target knowledge graph containing the target entity. Because the target knowledge graph records more comprehensive information about the target entity, comprehensive historical information related to the target entity can be mined with high probability, so that the mined related entities are more comprehensive and the recall rate of the determined related entities of the target entity is improved;
  • determining the recommended ranking of the mined related entities according to the co-occurrence semantic network and/or the relationship weights of the target knowledge graph itself gives the recommended related entities a more precise ranking, raises the probability that users make use of information related to the target entity, and makes that information more convenient to obtain.
  • the related entity determining apparatus provided by the embodiment of the present application is introduced below; the apparatus described below and the related entity determining method described above may be cross-referenced.
  • the related entity determining apparatus described below may be considered as a functional module architecture required by the computing device to implement the related entity determining method provided by the embodiment of the present application.
  • FIG. 10 is a structural block diagram of a related entity determining apparatus according to an embodiment of the present disclosure.
  • the apparatus may be applied to a computing device with data computing capability; the computing device may be a server on the network side or an electronic device such as a computer on the user side;
  • the related entity determining apparatus provided in this embodiment of the present application may include:
  • a target knowledge graph acquiring module 100, configured to acquire a target knowledge graph, where the target knowledge graph has at least a target entity;
  • a candidate entity set determining module 200, configured to determine a candidate entity set of the target entity in the target knowledge graph; the candidate entity set includes the candidate entities corresponding to each edge count by which the target entity can be reached;
  • a related entity determining module 300, configured to determine the related entities of the target entity according to the candidate entity set.
  • the target knowledge graph acquiring module 100 is configured to acquire the target knowledge graph, which specifically includes:
  • acquiring an input text in which a plurality of named entities are given in advance, the named entities including at least the target entity;
  • mapping the named entities given in the input text onto the target entity of a knowledge graph, to obtain the target knowledge graph; the knowledge graph is constructed from a data source containing the target entity.
  • the candidate entity set determining module 200 is configured to determine the candidate entity set of the target entity in the target knowledge graph, which specifically includes:
  • determining the candidate entities corresponding to each edge count by which the target entity can be reached in the target knowledge graph, to obtain the candidate entity set of the target entity.
  • the related entity determining module 300 is configured to determine the related entities of the target entity according to the candidate entity set, which specifically includes:
  • deduplicating the duplicate candidate entities in the candidate entity set, retaining for each duplicated candidate entity the occurrence with the smallest edge count;
  • taking the candidate entities included in the deduplicated candidate entity set as the related entities of the target entity.
  • FIG. 11 is another structural block diagram of the related entity determining apparatus provided by the embodiment of the present application.
  • the related entity determining apparatus may further include:
  • a recommended ranking determining module 400, configured to determine the recommended ranking of each related entity, so that related entities are recommended according to that ranking.
  • the recommended ranking determining module 400 is configured to determine the recommended ranking of each related entity, which specifically includes:
  • adding the relevance score of a related entity to its weight score, to obtain the ranking score of each related entity;
  • determining the recommended ranking of each related entity, where a higher ranking score gives a higher recommended ranking.
  • the recommended ranking determining module 400 is configured to determine the weight score of each related entity in the target knowledge graph, which specifically includes:
  • combining the edge-count weight of the edge count corresponding to the related entity with the corresponding relationship weight, to obtain the weight score of each related entity; the larger the edge count, the smaller the edge-count weight.
  • the recommended ranking determining module 400 is configured to compute, over open text, the relevance score between each related entity and the target entity, including:
  • the recommended ranking determining module 400 is configured to determine the recommended ranking of each related entity, including:
  • computing, over open text, the relevance score between each related entity and the target entity, and determining the recommended ranking of each related entity according to that score, where a higher relevance score gives a higher recommended ranking.
  • the recommended ranking determination module 400 is configured to determine a recommended ranking of each related entity, including:
  • the related entity determining apparatus can improve the recall rate of the determined related entities of the target entity, gives the recommended related entities a more precise ranking, and can raise the probability that users make use of information related to the target entity.
  • the embodiment of the present invention further provides a computing device, where the computing device may include the related entity determining device described above.
  • FIG. 12 is a block diagram showing the hardware structure of the computing device.
  • the computing device may include: a processor 1, a communication interface 2, a memory 3, and a communication bus 4;
  • the processor 1, the communication interface 2, and the memory 3 complete communication with each other through the communication bus 4;
  • the communication interface 2 can be an interface of the communication module, such as an interface of the GSM module;
  • the processor 1 may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
  • the memory 3 may include a high speed RAM memory and may also include a non-volatile memory such as at least one disk memory.
  • the processor 1 is specifically configured to:
  • a storage medium is further provided.
  • the foregoing storage medium may be located in at least one of a plurality of network devices in a network.
  • the storage medium is arranged to store a computer program for performing the following steps:
  • determining a candidate entity set of the target entity in the target knowledge graph, where the candidate entity set includes the candidate entities corresponding to each edge count by which the target entity can be reached;
  • the storage medium is arranged to store a computer program for performing the following steps:
  • acquiring an input text in which a plurality of named entities are given in advance, the named entities including at least the target entity;
  • the storage medium is arranged to store a computer program for performing the following steps:
  • S2: Determine, according to the edge counts included in the edge count range, the candidate entities corresponding to each edge count by which the target entity can be reached in the target knowledge graph, to obtain the candidate entity set of the target entity.
  • the storage medium is arranged to store a computer program for performing the following steps:
  • deduplicating the duplicate candidate entities in the candidate entity set, retaining for each duplicated candidate entity the occurrence with the smallest edge count;
  • taking the candidate entities included in the deduplicated candidate entity set as the related entities of the target entity.
  • the storage medium is arranged to store a computer program for performing the following steps:
  • the storage medium is arranged to store a computer program for performing the following steps:
  • S4: Determine the recommended ranking of each related entity according to its ranking score, where a higher ranking score gives a higher recommended ranking.
  • the storage medium is arranged to store a computer program for performing the following steps:
  • the storage medium is arranged to store a computer program for performing the following steps:
  • combining the edge-count weight of the edge count corresponding to the related entity with the corresponding relationship weight, to obtain the weight score of each related entity; the larger the edge count, the smaller the edge-count weight.
  • the storage medium is arranged to store a computer program for performing the following steps:
  • S1: For a related entity, determine a first ratio of the number of texts in which the related entity and the target entity both appear to the total number of texts, a second ratio of the number of texts in which the related entity appears to the total number of texts, and a third ratio of the number of texts in which the target entity appears to the total number of texts;
  • the foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a mobile hard disk, and a magnetic memory.
  • the specific examples in this embodiment may refer to the examples described in Embodiment 1 and Embodiment 2, and details are not described herein again.
  • the steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, a software module executed by a processor, or a combination of both.
  • the software module can be placed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
  • the method acquires a target knowledge graph that has at least a target entity; determines a candidate entity set of the target entity in the target knowledge graph, the candidate entity set including the candidate entities corresponding to each edge count by which the target entity can be reached; and determines the related entities of the target entity according to the candidate entity set.
  • the embodiment of the present application thus uses a target knowledge graph that has at least the target entity, mines from it the set of candidate entities that can reach the target entity, and determines the related entities of the target entity from that set; because the target knowledge graph records more comprehensive information about the target entity, comprehensive historical information related to the target entity can be mined with high probability, so that the mined related entities are more comprehensive and the recall rate of the determined related entities of the target entity is improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

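An embodiment of the present application provides a related entity determining method, apparatus, computing device, and storage medium. The method includes: acquiring a target knowledge graph that has at least a target entity; determining a candidate entity set of the target entity in the target knowledge graph, where the candidate entity set includes the candidate entities corresponding to each edge count by which the target entity can be reached; and determining the related entities of the target entity according to the candidate entity set. The embodiment of the present application can improve the recall rate of the related entity determination result.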

Description

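Related entity determining method, apparatus, computing device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on March 2, 2017, with priority number 2017101208369 and entitled "Related entity determining method, apparatus, and computing device", which is incorporated herein by reference in its entirety.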
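Technical Field
The present application relates to the field of data processing technologies, and in particular to a related entity determining method, apparatus, computing device, and storage medium.
Background
A related entity can be regarded as another entity that co-occurs, in the same query, with the target entity that was found. Related entities matter to users who want information related to that target entity. For example, after a user enters a query, the search engine not only shows the user the target entity found for the query (such as a web page link), but also recommends related entities that co-occurred with the target entity during the query, to guide the user to search again and make related information more convenient to obtain. In a typical scenario, after finding the target entity for a query, the search engine displays the target entity on the search results page and also displays the recommended related entities in a set area of that page (for example, the left-hand area), so that the user can search again.
The applicant of the present application has found that, at present, the related entities of a target entity are mainly determined by counting the other entities that co-occur with it in open text (such as news text). However, the content recorded in open text is limited in coverage and in timeliness, so the related entities determined from open-text statistics are uncontrollable and the recall rate of the result is low (the recall rate is the ratio of the number of related entities determined to the total number of related entities, and reflects how comprehensive the result is).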
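Summary
In view of this, embodiments of the present application provide a related entity determining method, apparatus, computing device, and storage medium, to improve the recall rate of the related entity determination result.
To achieve the above objective, the embodiments of the present application provide the following technical solutions:
A related entity determining method, including:
acquiring a target knowledge graph, where the target knowledge graph has at least a target entity;
determining a candidate entity set of the target entity in the target knowledge graph, where the candidate entity set includes the candidate entities corresponding to each edge count by which the target entity can be reached;
determining the related entities of the target entity according to the candidate entity set.
An embodiment of the present application further provides a related entity determining apparatus, including:
a target knowledge graph acquiring module, configured to acquire a target knowledge graph, where the target knowledge graph has at least a target entity;
a candidate entity set determining module, configured to determine a candidate entity set of the target entity in the target knowledge graph, where the candidate entity set includes the candidate entities corresponding to each edge count by which the target entity can be reached;
a related entity determining module, configured to determine the related entities of the target entity according to the candidate entity set.
An embodiment of the present application further provides a computing device, including the related entity determining apparatus described above.
Based on the above technical solutions, the related entity determining method provided by the embodiments of the present application includes: acquiring a target knowledge graph that has at least a target entity; determining a candidate entity set of the target entity in the target knowledge graph, where the candidate entity set includes the candidate entities corresponding to each edge count by which the target entity can be reached; and determining the related entities of the target entity according to the candidate entity set. It can be seen that the embodiments of the present application use a target knowledge graph that has at least the target entity, mine from it the set of candidate entities that can reach the target entity, and determine the related entities of the target entity from that set. Because the target knowledge graph records more comprehensive information about the target entity, comprehensive historical information related to the target entity can be mined with high probability, so that the mined related entities are more comprehensive and the recall rate of the determined related entities of the target entity is improved.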
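Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. Evidently, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the related entity determining method provided by an embodiment of the present application;
FIG. 2 is a flowchart of the method for acquiring a target knowledge graph provided by an embodiment of the present application;
FIG. 3 is another flowchart of the related entity determining method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the relationships between entities in a target knowledge graph;
FIG. 5 is a flowchart of the method for determining the related entities of the target entity according to the candidate entity set;
FIG. 6 is yet another flowchart of the related entity determining method provided by an embodiment of the present application;
FIG. 7 is a flowchart of a method for determining the recommended ranking of related entities provided by an embodiment of the present application;
FIG. 8 is a flowchart of another method for determining the recommended ranking of related entities provided by an embodiment of the present application;
FIG. 9 is a flowchart of yet another method for determining the recommended ranking of related entities provided by an embodiment of the present application;
FIG. 10 is a structural block diagram of the related entity determining apparatus provided by an embodiment of the present application;
FIG. 11 is another structural block diagram of the related entity determining apparatus provided by an embodiment of the present application;
FIG. 12 is a block diagram of the hardware structure of the computing device provided by an embodiment of the present application.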
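Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
FIG. 1 is a flowchart of the related entity determining method provided by an embodiment of the present application. The method can be applied to a computing device with data computing capability, and related entities can be determined by having the computing device execute the program corresponding to the method shown in FIG. 1; the computing device may be a server on the network side or an electronic device such as a computer on the user side.
Referring to FIG. 1, the related entity determining method provided by the embodiment of the present application may include:
Step S100: Acquire a target knowledge graph, where the target knowledge graph has at least a target entity.
The target entity is the entity whose related entities are to be determined. The embodiment of the present application may designate the target entity whose related entities need to be determined, and the target knowledge graph contains that target entity.
A knowledge graph aims to describe the various entities or concepts that exist in the real world. Each entity or concept can be identified by a globally unique ID (identifier); each attribute-value pair can be used to characterize an intrinsic property of an entity; and a relation connects two entities and characterizes the association between them. A knowledge graph therefore consists mainly of nodes and the edges connecting them, where a node can represent an entity or concept, and an edge connecting nodes can be formed by an attribute or relation between the connected nodes.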
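The node-and-edge structure described above can be pictured with a small data structure. The following is only an illustrative sketch, not the representation used by the application: the class names `Entity` and `KnowledgeGraph`, their methods, and the `occupation` attribute are all hypothetical, and edges are assumed to be directed and labeled with a relation name.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node: one entity or concept with a globally unique ID."""
    entity_id: str
    attributes: dict = field(default_factory=dict)  # attribute-value pairs


@dataclass
class KnowledgeGraph:
    """Nodes keyed by ID, plus directed, labeled edges (relations)."""
    entities: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)  # source ID -> [(relation, target ID)]

    def add_entity(self, entity_id, **attributes):
        self.entities[entity_id] = Entity(entity_id, dict(attributes))

    def add_relation(self, source_id, relation, target_id):
        # Both endpoints must already exist as nodes.
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both entities must be added before relating them")
        self.edges.setdefault(source_id, []).append((relation, target_id))

    def neighbors(self, entity_id):
        # Entities reachable from entity_id through exactly one edge.
        return [target for _, target in self.edges.get(entity_id, [])]


graph = KnowledgeGraph()
graph.add_entity("Xiaoming", occupation="actor")  # illustrative attribute
graph.add_entity("Xiaohong")
graph.add_relation("Xiaoming", "wife", "Xiaohong")
```

Keeping the relation label on each edge matters later in the description, where relationship weights are looked up per relation type.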
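In the embodiments of the present application, the data sources of the knowledge graph can be obtained by collecting structured data from encyclopedia sites and various vertical sites, so as to cover most common-sense knowledge; such data is generally of high quality but is updated slowly. The knowledge graph can also draw on attribute-value pairs of related entities extracted from various semi-structured data (such as HTML tables), which enriches the description of entities. In addition, discovering new entities or new entity attributes from search logs (query logs) can continuously expand the coverage of the knowledge graph.
In one possible implementation, the embodiment of the present application may construct the target knowledge graph from a data source containing the target entity.
To make the subsequent related entity determination result more comprehensive, the embodiment of the present application may also use a knowledge graph constructed from a data source containing the target entity to understand the meaning of an input text containing the target entity, so that information related to the target entity is understood more fully. In implementation, the embodiment may acquire an input text containing the target entity and, after constructing the knowledge graph from the data source containing the target entity, map the named entities given in the input text onto the target entity of the constructed knowledge graph, to obtain the target knowledge graph.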
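Step S110: Determine a candidate entity set of the target entity in the target knowledge graph; the candidate entity set includes the candidate entities corresponding to each edge count by which the target entity can be reached.
In the target knowledge graph, an entity can be regarded as a node, and entities can be connected by edges. The target entity may reach a candidate entity through one edge or through several edges. Starting from the target entity, the embodiment of the present application may determine the entities reached through one edge, to obtain the candidate entities corresponding to edge count one; determine the entities reached through two edges, to obtain the candidate entities corresponding to edge count two; and so on, to obtain the candidate entities corresponding to each edge count.
Optionally, in one implementation, the embodiment of the present application may set an edge count range that includes multiple edge counts; for each edge count in the range, the embodiment may determine the candidate entities reached from the target entity with that number of edges, to obtain the candidate entities corresponding to each edge count by which the target entity can be reached.
For example, if the edge count range is set to include edge counts one to three, then for edge count one the embodiment determines the candidate entities that reach the target entity through one edge, to obtain the candidate entities corresponding to edge count one; for edge count two, it determines the candidate entities that reach the target entity through two edges; for edge count three, it determines the candidate entities that reach the target entity through three edges; the candidate entities corresponding to each edge count in the range are thereby obtained.
It should be noted that setting an edge count range is only one optional way of determining the candidate entities corresponding to each edge count by which the target entity can be reached; the embodiment of the present application may also determine all the edge counts by which other entities in the target knowledge graph reach the target entity, and determine the candidate entities for each edge count on that basis.
Step S120: Determine the related entities of the target entity according to the candidate entity set.
Optionally, on the one hand, the embodiment of the present application may take the determined candidate entity set as the related entities of the target entity.
Optionally, on the other hand, the candidate entity set may contain duplicate candidate entities corresponding to different edge counts; the embodiment of the present application may deduplicate them, retaining for each duplicated candidate entity the occurrence with the smallest edge count, and take the candidate entities in the deduplicated candidate entity set as the related entities of the target entity.
The embodiment of the present application uses a target knowledge graph that has at least the target entity, mines from it the set of candidate entities that can reach the target entity, and determines the related entities of the target entity from that set. Because the target knowledge graph records more comprehensive information about the target entity, comprehensive historical information related to the target entity can be mined with high probability, so that the mined related entities are more comprehensive and the recall rate of the determined related entities of the target entity is improved.
As described above, when acquiring the target knowledge graph, a data source containing the target entity can be obtained and the target knowledge graph constructed from it. This way of acquiring the target knowledge graph is relatively simple, and the information it records about the target entity is relatively comprehensive, so the related entities finally mined for the target entity can have a high recall rate.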
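In another implementation, the embodiment of the present application may use a knowledge graph constructed from a data source (containing the target entity) to understand the meaning of an input text containing the target entity, and thereby obtain the target knowledge graph, so that the target knowledge graph captures information related to the target entity more fully. Accordingly, FIG. 2 shows an optional flow of the method for acquiring a target knowledge graph provided by the embodiment of the present application. Referring to FIG. 2, the method may include:
Step S200: Acquire an input text in which a plurality of named entities are given in advance, the named entities including at least the target entity.
Optionally, the input text may be a kind of open text. The input text records at least the target entity and may also record other entities. The embodiment of the present application may give, in advance, named entities in the input text that include at least the target entity; a named entity can be regarded as a person name, organization name, place name, or any other entity identified by a name that is given in the input text.
Step S210: Map the named entities given in the input text onto the target entity of a knowledge graph, to obtain the target knowledge graph; the knowledge graph is constructed from a data source containing the target entity.
After acquiring the input text and determining the knowledge graph constructed from the data source containing the target entity, the embodiment of the present application may map the named entities given in the input text onto the target entity of that knowledge graph, to obtain the target knowledge graph.
Mapping the named entities given in the input text onto the target entity of the knowledge graph can be regarded as a process of linking those named entities to unambiguous target entities in the knowledge graph; this process may include merging synonymous entities, disambiguating ambiguous entities, and similar processing.
Optionally, in a specific implementation, the embodiment of the present application may use named entity linking technology to map the named entities given in the input text onto the target entity of the knowledge graph, thereby linking them to unambiguous target entities in the knowledge graph. Named entity linking mainly serves to improve the information filtering capability of systems such as online recommendation systems and Internet search engines.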
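The text names named entity linking but does not specify an algorithm. As a toy sketch only, the two sub-tasks it mentions (merging synonymous entities and disambiguating ambiguous ones) can be illustrated with an alias table and a keyword-overlap heuristic. Everything here is an assumption made for illustration: the function `link_entities`, the alias table, the entity IDs, and the overlap heuristic are hypothetical and are not the technique used by the application.

```python
def link_entities(mentions, alias_table, context_words, entity_keywords):
    """Map each mention in the input text to one knowledge-graph entity ID.

    alias_table: surface form -> candidate entity IDs (merges synonyms).
    entity_keywords: entity ID -> descriptive words (used to disambiguate).
    """
    linked = {}
    for mention in mentions:
        candidates = alias_table.get(mention, [])
        if not candidates:
            continue  # mention is not covered by the knowledge graph
        # Choose the candidate whose keywords overlap most with the context;
        # on a tie, max() keeps the first candidate listed.
        linked[mention] = max(
            candidates,
            key=lambda eid: len(entity_keywords.get(eid, set()) & context_words),
        )
    return linked


# Hypothetical data: two surface forms, one of them ambiguous.
alias_table = {
    "Xiaoming": ["Xiaoming_actor", "Xiaoming_athlete"],
    "Ming": ["Xiaoming_actor"],
}
entity_keywords = {
    "Xiaoming_actor": {"movie", "actor"},
    "Xiaoming_athlete": {"team", "match"},
}
context_words = {"movie", "premiere"}
linked = link_entities(["Xiaoming", "Ming", "Unknown"], alias_table,
                       context_words, entity_keywords)
```

In this toy run both "Xiaoming" and "Ming" resolve to the same entity ID (synonym merging), and the movie-related context selects the actor over the athlete (disambiguation); production entity linkers use far richer signals.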
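Optionally, the embodiment of the present application may set the edge count range used for mining candidate entities of the target entity in the target knowledge graph, so that, after the target knowledge graph is acquired, the candidate entities corresponding to each edge count by which the target entity can be reached are mined using the edge counts in that range. Optionally, FIG. 3 shows another flowchart of the related entity determining method provided by the embodiment of the present application. Referring to FIG. 3, the method may include:
Step S300: Acquire a target knowledge graph, where the target knowledge graph has at least a target entity.
Optionally, step S300 may be implemented by the method shown in FIG. 2, or by constructing the target knowledge graph from a data source containing the target entity.
Step S310: Acquire a preset edge count range, where the edge count range includes multiple edge counts.
Step S320: Determine, according to the edge counts included in the edge count range, the candidate entities corresponding to each edge count by which the target entity can be reached in the target knowledge graph, to obtain the candidate entity set of the target entity.
Optionally, after the edge count range is set, for each edge count in the range the embodiment of the present application may determine the candidate entities reached from the target entity with that number of edges, thereby determining the candidate entities corresponding to each edge count by which the target entity can be reached and obtaining the candidate entity set of the target entity.
For ease of understanding, as shown in FIG. 4, "Xiaoming" is the target entity in the target knowledge graph; starting from the target entity "Xiaoming", multiple candidate entities can be reached through various relationships, and connected entities have a certain relationship with each other.
Taking an edge count range of one to three as an example, as shown in FIG. 4, starting from the target entity "Xiaoming", the candidate entities corresponding to edge count one include "Xiaohong", "Xiaoqiang" and "Movie A"; the candidate entities corresponding to edge count two include "Xiaoqiang" and "Xiaorong"; and the candidate entities corresponding to edge count three include "Xiaorong". The candidate entities corresponding to each edge count by which the target entity "Xiaoming" can be reached are thus determined, giving the candidate entity set of the target entity in the target knowledge graph. The candidate entity set may specifically include:
Edge count one: "Xiaohong", "Xiaoqiang" and "Movie A";
Edge count two: "Xiaoqiang" and "Xiaorong";
Edge count three: "Xiaorong".
It should be noted that the edge count range is not limited to the one to three described above; the edge counts included in the range can be set according to the actual situation.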
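A minimal sketch of step S320 follows, expanding the frontier one edge at a time from the target entity. FIG. 4 is not reproduced in this text, so the directed edge list below is an assumption chosen so that the walk yields the candidate sets listed above; the function name `candidates_by_edge_count` is likewise hypothetical.

```python
def candidates_by_edge_count(graph, target, max_edges):
    """Return {edge count: set of entities reached with exactly that many edges}."""
    result = {}
    frontier = {target}
    for count in range(1, max_edges + 1):
        reached = set()
        for node in frontier:
            reached.update(graph.get(node, ()))
        reached.discard(target)  # the target is not its own candidate
        result[count] = reached
        frontier = reached
    return result


# Assumed directed edges (FIG. 4 itself is not available in this text).
graph = {
    "Xiaoming": ["Xiaohong", "Xiaoqiang", "Movie A"],
    "Movie A": ["Xiaoqiang"],
    "Xiaoqiang": ["Xiaorong"],
}
candidate_set = candidates_by_edge_count(graph, "Xiaoming", max_edges=3)
```

Because each pass follows exactly one more edge, the same entity can appear under several edge counts ("Xiaoqiang" under one and two here), which is what makes the later deduplication step necessary.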
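Step S330: Determine the related entities of the target entity according to the candidate entity set.
Optionally, after the candidate entity set is obtained, the embodiment of the present application may directly take the candidate entities in the set as the related entities of the target entity.
Optionally, on the other hand, the candidate entity set may contain duplicate candidate entities corresponding to different edge counts; that is, a candidate entity may appear under several edge counts, for example among the candidate entities for edge count one and also among those for edge count two. In that case the candidate entity may have several relationships with the target entity. To make the relationship between the mined related entities and the target entity more precise, the embodiment of the present application may therefore deduplicate the duplicate candidate entities in the candidate entity set, retaining for each duplicated candidate entity the occurrence with the smallest edge count.
Optionally, FIG. 5 shows an optional flow of the method for determining the related entities of the target entity according to the candidate entity set. Referring to FIG. 5, the method may include:
Step S400: If the candidate entity set contains duplicate candidate entities corresponding to different edge counts, deduplicate them, retaining for each duplicated candidate entity the occurrence with the smallest edge count.
After the candidate entity set of the target entity is determined, the embodiment of the present application can identify the candidate entities corresponding to each edge count; if there are duplicate candidate entities and the duplicates correspond to different edge counts, the duplicates are removed on the principle of retaining the occurrence with the smallest edge count, which yields the deduplicated candidate entity set.
For example, if the candidate entities for edge count one and those for edge count two contain a duplicate, the duplicate under edge count two can be removed, so that the candidate entities for edge count one and edge count two no longer overlap.
Taking FIG. 4 as an example, the candidate entity set includes:
Edge count one: "Xiaohong", "Xiaoqiang" and "Movie A";
Edge count two: "Xiaoqiang" and "Xiaorong";
Edge count three: "Xiaorong";
It can be seen that "Xiaoqiang" is duplicated between edge counts one and two, and "Xiaorong" is duplicated between edge counts two and three. On the principle of retaining the occurrence with the smallest edge count, "Xiaoqiang" is removed from edge count two and kept under edge count one, and "Xiaorong" is removed from edge count three and kept under edge count two, giving the following deduplicated candidate entity set:
Edge count one: "Xiaohong", "Xiaoqiang" and "Movie A";
Edge count two: "Xiaorong".
Step S410: Take the candidate entities included in the deduplicated candidate entity set as the related entities of the target entity.
The deduplicated candidate entity set includes the non-duplicated candidate entities corresponding to each edge count by which the target entity can be reached.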
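The deduplication rule of step S400 can be sketched in a few lines, as an illustration only: walk the edge counts in ascending order and keep an entity only the first time it is seen. The function name `deduplicate` is hypothetical, and dropping edge counts that end up empty is a choice made here to match the two-row result shown above.

```python
def deduplicate(candidates_by_count):
    """Keep each candidate only under its smallest edge count."""
    seen = set()
    deduplicated = {}
    for count in sorted(candidates_by_count):
        fresh = set(candidates_by_count[count]) - seen
        if fresh:  # drop edge counts left with no candidates
            deduplicated[count] = fresh
        seen |= fresh
    return deduplicated


candidate_set = {
    1: {"Xiaohong", "Xiaoqiang", "Movie A"},
    2: {"Xiaoqiang", "Xiaorong"},
    3: {"Xiaorong"},
}
related = deduplicate(candidate_set)
```

On the FIG. 4 example this leaves "Xiaohong", "Xiaoqiang" and "Movie A" under edge count one and "Xiaorong" under edge count two; edge count three disappears because its only candidate was a duplicate.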
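The embodiment of the present application uses a target knowledge graph that has at least the target entity and mines from it the set of candidate entities that can reach the target entity. Because the target knowledge graph records more comprehensive information about the target entity, comprehensive historical information related to the target entity can be mined with high probability, so the mined candidate entity set is relatively comprehensive. Deduplicating the candidate entities that appear under different edge counts then yields the related entities of the target entity and makes the relationship between the mined related entities and the target entity more precise, so that the final result has both a high recall rate and a precise relationship to the target entity.
Optionally, FIG. 6 shows yet another flowchart of the related entity determining method provided by the embodiment of the present application. Referring to FIG. 6, the method may include:
Step S500: Acquire an input text in which a plurality of named entities are given in advance, the named entities including at least the target entity.
Step S510: Map the named entities given in the input text onto the target entity of a knowledge graph, to obtain the target knowledge graph; the knowledge graph is constructed from a data source containing the target entity.
Step S520: Acquire a preset edge count range, where the edge count range includes multiple edge counts.
Step S530: Determine, according to the edge counts included in the edge count range, the candidate entities corresponding to each edge count by which the target entity can be reached in the target knowledge graph, to obtain the candidate entity set of the target entity.
Step S540: If the candidate entity set contains duplicate candidate entities corresponding to different edge counts, deduplicate them, retaining for each duplicated candidate entity the occurrence with the smallest edge count.
Step S550: Take the candidate entities included in the deduplicated candidate entity set as the related entities of the target entity.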
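In one possible implementation scenario, the related entity determining method provided by the embodiment of the present application can be used to determine the related entities of the movie star "Xiaoming"; that is, with the movie star "Xiaoming" as the target entity, the process of determining the related entities may be as follows:
The server may crawl data sources containing the target entity "Xiaoming" from the structured data of encyclopedia sites and various vertical sites, as well as from various semi-structured data and search logs;
The server constructs a knowledge graph from the data sources containing the target entity "Xiaoming"; specifically, each entity in the data sources can be taken as a node and each relationship between entities as an edge, with entities connected by the edges corresponding to their relationships;
The server acquires an input text containing the target entity "Xiaoming"; besides the target entity "Xiaoming", the input text may also record other entities. Specifically, a plurality of named entities are given in advance in the input text, and these named entities include at least the target entity "Xiaoming";
The server uses named entity linking technology to map the named entities given in the input text onto the target entity of the knowledge graph, to obtain the target knowledge graph; specifically, named entity linking can be used to merge synonymous entities and disambiguate ambiguous entities between the named entities given in the input text and the entities in the target knowledge graph;
The server retrieves the preset edge count range and determines the candidate entities of the target entity for each edge count in the range, to obtain the candidate entity set of the target entity; that is, for each edge count in the range, the server determines the candidate entities reached from the target entity with that number of edges in the target knowledge graph. As shown in FIG. 4, with an edge count range of one to three, the server separately determines the candidate entities that reach the target entity through one, two, and three edges, thereby determining the candidate entities corresponding to each edge count and obtaining the candidate entity set;
If the candidate entity set contains duplicate candidate entities corresponding to different edge counts, the server deduplicates them, so that for a candidate entity duplicated across edge counts only the occurrence with the smallest edge count is retained; the server then takes the candidate entities in the deduplicated candidate entity set as the related entities of the target entity;
If the candidate entity set contains no duplicate candidate entities, the candidate entities in the candidate entity set are taken as the related entities of the target entity.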
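After the related entities of the target entity are determined by the scheme described above, the embodiment of the present application may recommend them in scenarios that call for related entity recommendation, such as search recommendation. For example, when a user's search finds the target entity, search entry points for its related entities can be recommended, to guide the user to search again and make information related to the target entity more convenient to obtain. Accordingly, the embodiment may determine a recommended ranking of the related entities and recommend them in that order, as described below.
A relatively simple approach is to define the recommended ranking of the related entities at random and recommend them in that order. This is simple, but the resulting ranking may be imprecise and unsuitable for some search recommendation scenarios. As optional solutions, the embodiment of the present application therefore provides at least the following three schemes for determining the recommended ranking of related entities.
1. Compute, over open text, the relevance score between each related entity and the target entity, and determine the recommended ranking by relevance score, where a higher relevance gives a higher recommended ranking; an optional implementation is shown in FIG. 7.
FIG. 7 shows the flow of a method for determining the recommended ranking of related entities provided by the embodiment of the present application. Referring to FIG. 7, the method may include:
Step S600: Compute, over open text, the relevance score between each related entity and the target entity.
Computing the relevance score between a related entity and the target entity over open text is an application of a co-occurrence semantic network computed offline over open text: it is generally considered that if two entities (such as the target entity and a related entity) frequently appear in the same sentence or document, the two entities are strongly related.
The relevance score between a related entity and the target entity can be measured by their mutual information. Mutual information is a useful measure in information theory: it can be seen as the amount of information that one random variable contains about another, or as the reduction in the uncertainty of one random variable given knowledge of the other.
To determine the relevance score between a related entity and the target entity, the embodiment of the present application may determine their mutual information and derive the relevance score from it. Specifically, the embodiment may determine a first ratio of the number of texts in which the related entity and the target entity both appear to the total number of texts, a second ratio of the number of texts in which the related entity appears to the total number of texts, and a third ratio of the number of texts in which the target entity appears to the total number of texts; the mutual information of the related entity and the target entity is then determined from the first, second, and third ratios and used to represent their relevance score.
In the specific calculation, the following formula can be used: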
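Figure PCTCN2018077416-appb-000001 (formula image not reproduced; the surrounding text identifies it as mutual information, whose standard form is):
$$I(X;Y)=\sum_{x\in X}\sum_{y\in Y}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$$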
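Here, capital X can be regarded as a set and lowercase x as a specific item of data taken from that set; capital Y and lowercase y are defined similarly. p(x,y) is the ratio of the number of texts in which entities x and y both appear to the total number of texts, p(x) is the ratio of the number of texts in which x appears to the total number of texts, and p(y) is the ratio of the number of texts in which y appears to the total number of texts.
Step S610: Determine the recommended ranking of each related entity according to the relevance score between each related entity and the target entity, where a higher relevance score gives a higher recommended ranking.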
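Steps S600 and S610 can be sketched as follows, under stated assumptions. The score below is the pointwise quantity log(p(x,y) / (p(x) p(y))) built from the three ratios the text defines; treating that single term as the relevance score is one possible reading, not necessarily the exact computation in the application. The function name, the representation of each text as a set of entity names, the sample texts, and the use of negative infinity for entities that never co-occur with the target are all hypothetical.

```python
import math


def relevance_score(texts, related, target):
    """Pointwise mutual-information style score from the three ratios.

    texts: list of sets, each holding the entity names found in one text.
    """
    total = len(texts)
    if total == 0:
        return float("-inf")
    p_xy = sum(1 for t in texts if related in t and target in t) / total  # first ratio
    p_x = sum(1 for t in texts if related in t) / total                   # second ratio
    p_y = sum(1 for t in texts if target in t) / total                    # third ratio
    if p_xy == 0:
        return float("-inf")  # never co-occur: rank last (assumption)
    return math.log(p_xy / (p_x * p_y))


# Hypothetical open-text corpus, already reduced to entity mentions.
texts = [
    {"Xiaoming", "Xiaohong"},
    {"Xiaoming", "Movie A"},
    {"Movie A"},
    {"Xiaoming"},
    {"Xiaoqiang"},
]
scores = {e: relevance_score(texts, e, "Xiaoming")
          for e in ["Xiaohong", "Xiaoqiang", "Movie A"]}
# Step S610: a higher relevance score gives a higher recommended ranking.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus "Xiaoqiang" never co-occurs with the target and falls to the bottom, which is the weakness of a purely text-based score that the second scheme in the description is meant to address.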
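2. Determine, according to the relationship weights in the target knowledge graph, the weight score of each related entity in the target knowledge graph, and determine the recommended ranking by weight score.
The relationships between some related entities are so fixed that they have become common knowledge, so they are rarely mentioned in open text; well-known movie-star couples are one example. Related entities with such a fixed relationship to the target entity are nevertheless highly relevant to it and should be recommended, which is difficult to achieve with the first scheme based on the co-occurrence semantic network. The knowledge graph is therefore used to set larger weights for important relationships between entities, so that related entities whose relationship with the target entity is important but seldom mentioned in open text can still be recommended.
Optionally, FIG. 8 shows the flow of another method for determining the recommended ranking of related entities provided by the embodiment of the present application. Referring to FIG. 8, the method may include:
Step S700: Determine the nearest entity reachable by each related entity, within the range formed by the deduplicated candidate entity set and the target entity.
Taking the related entities of the target entity (the deduplicated candidate entity set) and the target entity itself as the range, the embodiment of the present application needs to determine the nearest entity that each related entity can reach within that range.
Optionally, the nearest entity reachable by a related entity in the target knowledge graph may be the target entity (when the edge count between the related entity and the target entity is one), or may be another related entity (when that edge count is greater than one, so that the related entity has to transition to the target entity through other related entities closer to the target entity).
As shown in FIG. 4, after the candidate entity set is deduplicated, the related entities of the target entity include:
Edge count one: "Xiaohong", "Xiaoqiang" and "Movie A";
Edge count two: "Xiaorong".
The related entities "Xiaohong", "Xiaoqiang" and "Movie A" can reach the target entity "Xiaoming" directly, so their nearest reachable entity is the target entity,
whereas the related entity "Xiaorong" has to reach the target entity through the related entity "Xiaoqiang", so the nearest entity reachable by "Xiaorong" is "Xiaoqiang".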
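One way to picture step S700, offered only as a sketch: for a related entity with edge count k, look one step closer to the target (the target itself when k is one, otherwise the related entities with edge count k - 1) and pick a node that has an edge to it. The function name `nearest_reachable`, the directed edge list standing in for FIG. 4, and the alphabetical tie-break are assumptions.

```python
def nearest_reachable(graph, target, related_by_count):
    """Map each related entity to the nearest entity through which it reaches the target."""
    nearest = {}
    for count, entities in related_by_count.items():
        # Scope one step closer to the target.
        closer = {target} if count == 1 else related_by_count.get(count - 1, set())
        for entity in entities:
            for node in sorted(closer):  # sorted only to make ties deterministic
                if entity in graph.get(node, ()):
                    nearest[entity] = node
                    break
    return nearest


# Assumed directed edges (FIG. 4 itself is not available in this text).
graph = {
    "Xiaoming": ["Xiaohong", "Xiaoqiang", "Movie A"],
    "Movie A": ["Xiaoqiang"],
    "Xiaoqiang": ["Xiaorong"],
}
related = {1: {"Xiaohong", "Xiaoqiang", "Movie A"}, 2: {"Xiaorong"}}
nearest = nearest_reachable(graph, "Xiaoming", related)
```

The result reproduces the example in the text: the three edge-count-one entities map to "Xiaoming", and "Xiaorong" maps to "Xiaoqiang".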
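Step S710: Determine, according to the preset relationship weight of each relationship in the target knowledge graph, the relationship weight of the relationship between each related entity and its nearest reachable entity, to obtain the relationship weight corresponding to each related entity.
Optionally, the embodiment of the present application may use empirical knowledge to set specific relationship weights for the different relationships between entities in the knowledge graph, so that more important relationships have higher relationship weights.
For example, when recommending entities in the technology field, relationships in the corresponding knowledge graph such as an entity's position, company, and company shareholders can be given larger relationship weights; when recommending entities in the sports field, relationships such as an entity's team or teammates can be given larger relationship weights.
After the relationship weight of each relationship in the target knowledge graph has been preset, for each related entity of the target entity the embodiment of the present application may determine its relationship weight according to the relationship between that related entity and its nearest reachable entity.
As in the example above, the related entities "Xiaohong", "Xiaoqiang" and "Movie A" can reach the target entity "Xiaoming" directly, so their nearest reachable entity is the target entity; the relationship weight of "Xiaohong" is therefore that of the relationship between "Xiaohong" and "Xiaoming", the relationship weight of "Xiaoqiang" is that of the relationship between "Xiaoqiang" and "Xiaoming", and the relationship weight of "Movie A" is that of the relationship between "Movie A" and "Xiaoming".
The related entity "Xiaorong" has to reach the target entity through the related entity "Xiaoqiang", so its nearest reachable entity is "Xiaoqiang", and the relationship weight of "Xiaorong" is that of the relationship between "Xiaorong" and "Xiaoqiang".
That is, for a related entity, the embodiment of the present application may determine the nearest entity reachable by that related entity, and take the relationship weight of the relationship between the two as the relationship weight of that related entity.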
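Step S720: For each related entity, combine the edge-count weight of the edge count corresponding to the related entity with the corresponding relationship weight, to obtain the weight score of each related entity; the larger the edge count, the smaller the edge-count weight.
After the relationship weight of each related entity is determined, the embodiment of the present application may determine the weight score of each related entity in combination with the edge-count weight of the edge count between that related entity and the target entity. In general, the larger the edge count, the smaller the edge-count weight; this down-weights related entities that are many edges away from the target entity, so that related entities that should not be expanded can be removed.
If the edge count between a related entity and the target entity is one (for example, the related entity reaches the target entity through one edge in the deduplicated candidate entity set or in the candidate entity set), the edge-count weight of that related entity is taken to be 1; if the edge count is greater than one, the edge-count weight has to be reduced so that it is less than 1.
Optionally, between adjacent edge counts, the embodiment of the present application may set the edge-count weight of the smaller edge count to twice that of the larger edge count. For example, if the edge count corresponding to a related entity is one, the edge-count weight is 1; if the edge count is two, the edge-count weight is 1/2 = 0.5; if the edge count is three, the edge-count weight is 0.5/2 = 0.25; and so on.
After the edge-count weights are determined, for a related entity the embodiment of the present application may multiply the edge-count weight of the edge count corresponding to that related entity by the corresponding relationship weight, to obtain the weight score of that related entity; processing every related entity in this way yields the weight score of each related entity.
As shown in FIG. 4, after the candidate entity set is deduplicated, the related entities of the target entity include:
Edge count one: "Xiaohong", "Xiaoqiang" and "Movie A";
Edge count two: "Xiaorong".
For example, the relationship between "Xiaohong" and the nearest reachable entity "Xiaoming" is wife, and the corresponding relationship weight can be set to 1; the relationship between "Xiaoqiang" and the nearest reachable entity "Xiaoming" is partner, and the corresponding relationship weight can be set to 0.5; the relationship between "Movie A" and the nearest reachable entity "Xiaoming" is starring role, and the corresponding relationship weight can be set to 0.7; the relationship between "Xiaorong" and the nearest reachable entity "Xiaoqiang" is wife, and the corresponding relationship weight can be set to 1.
The edge count corresponding to "Xiaohong", "Xiaoqiang" and "Movie A" is one, so their edge-count weight can be set to 1; the edge count corresponding to "Xiaorong" is two, so its edge-count weight can be set to 0.5.
Accordingly, the weight score of each related entity is its relationship weight multiplied by its edge-count weight: for "Xiaohong", 1*1=1; for "Xiaoqiang", 0.5*1=0.5; for "Movie A", 0.7*1=0.7; for "Xiaorong", 1*0.5=0.5.
Accordingly, the weight scores of the related entities can be as shown in Table 1 below.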
相关实体 权重分数
小红 1
小强 0.5
电影A 0.7
小容 0.5
表1
可以看出,虽然小容具有较高的关系权重,但由于与目标实体的边数较远,因此整体的权重分数被降权。
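The arithmetic behind Table 1 can be reproduced in a few lines. The relation weights and edge counts are the example values given in the text; the dictionary layout, the function name `weight_score`, and the halving form of the edge-count weight are illustrative assumptions.

```python
# Example values from the text: relation weight to the nearest reachable entity,
# and edge count to the target entity "小明".
relation_weight = {"小红": 1.0, "小强": 0.5, "电影A": 0.7, "小容": 1.0}
edge_count = {"小红": 1, "小强": 1, "电影A": 1, "小容": 2}

def weight_score(entity: str) -> float:
    """Relation weight multiplied by the edge-count weight 0.5 ** (edge count - 1)."""
    return relation_weight[entity] * 0.5 ** (edge_count[entity] - 1)

scores = {entity: weight_score(entity) for entity in relation_weight}
```

Here `scores` holds 1.0, 0.5, 0.7 and 0.5 for “小红”, “小强”, “电影A” and “小容” respectively, matching Table 1.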
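Step S730: Determine the recommendation ranking of the related entities according to their weight scores; the higher the weight score, the higher the recommendation ranking.
By applying the constraint of the relation weights of the target knowledge graph itself, the embodiment of the present application can raise the recommendation ranking of related entities whose relation to the target entity is important but which are seldom mentioned because the relation is common knowledge, so that the ranking of the recommended related entities is more accurate.
3. Combine approaches one and two above for determining the recommendation ranking. That is, approach one computes the relevance score between each related entity and the target entity on open text, and approach two determines the weight score of each related entity from the relation weights in the target knowledge graph; the relevance score and the weight score of the same related entity are then added to obtain its ranking score, and the recommendation ranking of the related entities is determined by their ranking scores;
Optionally, FIG. 9 shows yet another method flow for determining the recommendation ranking of related entities provided by an embodiment of the present application. Referring to FIG. 9, the method may include:
Step S800: Compute, from open text, the relevance score between each related entity and the target entity.
Step S810: Within the scope formed by the deduplicated candidate entity set and the target entity, determine the nearest entity that each related entity can reach.
Step S820: According to the preset relation weight of each relation in the target knowledge graph, determine the relation weight of the relation between each related entity and its nearest reachable entity, thereby obtaining the relation weight of each related entity.
Step S830: For each related entity, combine the edge-count weight of the related entity's edge count with its relation weight to obtain the weight score of the related entity; the larger the edge count, the smaller the edge-count weight.
Step S840: For each related entity, add its relevance score and its weight score to obtain its ranking score.
Step S850: Determine the recommendation ranking of the related entities according to their ranking scores; the higher the ranking score, the higher the recommendation ranking.
The flow shown in FIG. 9 can be regarded as a combination of the approaches of FIG. 7 and FIG. 8. For each related entity, the embodiment of the present application may determine its relevance score with respect to the target entity and its weight score, and add the two to obtain its ranking score, which is then used to rank the related entities for recommendation.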
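Steps S840 and S850 amount to adding two per-entity scores and sorting in descending order. In the sketch below, the weight scores are those of Table 1, while the relevance scores are made-up placeholder numbers used purely for illustration; the function name `rank_entities` is likewise hypothetical.

```python
def rank_entities(relevance: dict, weight: dict) -> list:
    """Return related entities ordered by ranking score = relevance score + weight score."""
    ranking_score = {entity: relevance[entity] + weight[entity] for entity in weight}
    # Higher ranking score means an earlier position in the recommendation list.
    return sorted(ranking_score, key=ranking_score.get, reverse=True)

weight = {"小红": 1.0, "小强": 0.5, "电影A": 0.7, "小容": 0.5}      # Table 1
relevance = {"小红": 0.2, "小强": 0.9, "电影A": 0.6, "小容": 0.1}   # placeholder values
ranking = rank_entities(relevance, weight)
```

With these placeholder relevance scores, “小红” is lifted by its knowledge-graph weight even though it co-occurs rarely with the target entity in open text, which is the effect the combined approach aims for.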
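The embodiment of the present application mines the related entities of the target entity on the basis of a target knowledge graph that contains the target entity. Because the target knowledge graph records more comprehensive information about the target entity, comprehensive historical information about the target entity can be mined with high probability, so that the mined related-entity results are more complete and the recall of the determined related entities of the target entity is improved;
Determining the recommendation ranking of the mined related entities according to the co-occurrence semantic network and/or the relation weights of the target knowledge graph itself gives the recommended related entities a more accurate ranking, raises the probability that information related to the target entity is used by the user, and makes that information easier to obtain.
The related entity determining apparatus provided by the embodiments of the present application is described below; the apparatus described below and the related entity determination method described above may be referred to in correspondence with each other. The apparatus described below can be regarded as the functional module architecture that a computing device needs in order to implement the related entity determination method provided by the embodiments of the present application.
FIG. 10 is a structural block diagram of the related entity determining apparatus provided by an embodiment of the present application. The apparatus may be applied to a computing device with data processing capability; the computing device may be a network-side server, or user-side electronic equipment such as a computer;
Referring to FIG. 10, the related entity determining apparatus provided by the embodiment of the present application may include:
a target knowledge graph obtaining module 100, configured to obtain a target knowledge graph, the target knowledge graph having at least a target entity;
a candidate entity set determining module 200, configured to determine a candidate entity set of the target entity in the target knowledge graph, the candidate entity set including: for each edge count, the candidate entities that can reach the target entity through that number of edges;
a related entity determining module 300, configured to determine the related entities of the target entity according to the candidate entity set.
Optionally, the target knowledge graph obtaining module 100, when obtaining the target knowledge graph, is specifically configured to:
obtain input text in which a plurality of named entities are given in advance, the named entities including at least the target entity;
map the named entities given in the input text onto the target entity of a knowledge graph to obtain the target knowledge graph, the knowledge graph being built from a data source that contains the target entity.
Optionally, the candidate entity set determining module 200, when determining the candidate entity set of the target entity in the target knowledge graph, is specifically configured to:
obtain a preset edge count range, the edge count range including a plurality of edge counts;
according to each edge count included in the edge count range, determine, in the target knowledge graph, the candidate entities that can reach the target entity through that number of edges, to obtain the candidate entity set of the target entity.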
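One way to read the candidate-set step is as a level-by-level expansion from the target entity, collecting for each edge count k the entities at the end of a k-edge walk. The sketch below follows that reading; the adjacency-list `graph`, the function name `candidate_set`, and the decision to drop walks that pass back through the target entity are all assumptions, since the text does not fix these details.

```python
def candidate_set(graph, target, edge_counts):
    """Map each requested edge count k to the entities reachable from the target in k edges."""
    wanted = set(edge_counts)
    candidates = {}
    frontier = {target}
    for k in range(1, max(wanted) + 1):
        # Advance every walk by one edge; walks returning to the target are discarded.
        frontier = {n for node in frontier for n in graph.get(node, ())} - {target}
        if k in wanted:
            candidates[k] = frontier
    return candidates

# Hypothetical adjacency list consistent with the "小明" example in the text.
graph = {
    "小明": ["小红", "小强", "电影A"],
    "小红": ["小明"],
    "小强": ["小明", "小容"],
    "电影A": ["小明"],
    "小容": ["小强"],
}
candidates = candidate_set(graph, "小明", range(1, 4))
```

Under this reading “小强” appears at edge count one and again at edge count three, which is the kind of repetition across edge counts that the deduplication step is meant to resolve.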
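Optionally, the related entity determining module 300, when determining the related entities of the target entity according to the candidate entity set, is specifically configured to:
if the candidate entity set contains duplicate candidate entities that correspond to different edge counts, deduplicate them so that, among the duplicates, only the candidate entity with the smallest edge count is kept;
take the candidate entities in the deduplicated candidate entity set as the related entities of the target entity.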
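The deduplication rule (keep each repeated candidate only at its smallest edge count) can be sketched as follows. The input layout, a mapping from edge count to a set of candidate entities, and the sample data are illustrative assumptions rather than a format prescribed by the application.

```python
def deduplicate(candidates: dict) -> dict:
    """Return entity -> smallest edge count at which the entity appears."""
    min_edges = {}
    for k in sorted(candidates):            # visit smaller edge counts first
        for entity in candidates[k]:
            min_edges.setdefault(entity, k)  # later (larger) edge counts are ignored
    return min_edges

# Hypothetical candidate set in which "小强" is found at edge counts 1 and 3.
candidates = {1: {"小红", "小强", "电影A"}, 2: {"小容"}, 3: {"小强"}}
related = deduplicate(candidates)
```

The keys of `related` are then the related entities of the target entity, each tagged with the edge count that later determines its edge-count weight.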
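Optionally, FIG. 11 shows another structural block diagram of the related entity determining apparatus provided by an embodiment of the present application. With reference to FIG. 10 and FIG. 11, the apparatus may further include:
a recommendation ranking determining module 400, configured to determine the recommendation ranking of the related entities, so that related entities are recommended according to that ranking.
Optionally, in one aspect, the recommendation ranking determining module 400, when determining the recommendation ranking of the related entities, is specifically configured to:
compute, from open text, the relevance score between each related entity and the target entity;
determine the weight score of each related entity in the target knowledge graph;
for each related entity, add its relevance score and its weight score to obtain its ranking score;
determine the recommendation ranking of the related entities according to their ranking scores; the higher the ranking score, the higher the recommendation ranking.
The recommendation ranking determining module 400, when determining the weight score of each related entity in the target knowledge graph, is specifically configured to:
within the scope formed by the deduplicated candidate entity set and the target entity, determine the nearest entity that each related entity can reach;
according to the preset relation weight of each relation in the target knowledge graph, determine the relation weight of the relation between each related entity and its nearest reachable entity, thereby obtaining the relation weight of each related entity;
for each related entity, combine the edge-count weight of the related entity's edge count with its relation weight to obtain the weight score of the related entity; the larger the edge count, the smaller the edge-count weight.
The recommendation ranking determining module 400, when computing from open text the relevance score between each related entity and the target entity, is specifically configured to:
for a given related entity, determine a first ratio of the number of texts in which both the related entity and the target entity appear to the total number of texts, a second ratio of the number of texts in which the related entity appears to the total number of texts, and a third ratio of the number of texts in which the target entity appears to the total number of texts;
determine the mutual information between the related entity and the target entity according to the first ratio, the second ratio and the third ratio, and use the determined mutual information as the relevance score between the related entity and the target entity.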
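The three ratios can be combined into a co-occurrence score as sketched below. The passage above only states that mutual information is determined from the three ratios; the pointwise form log(p(x, y) / (p(x) p(y))) and the natural logarithm used here are assumptions, and the function name and document counts are hypothetical.

```python
import math

def relevance_score(n_both: int, n_related: int, n_target: int, n_total: int) -> float:
    """Pointwise mutual information computed from document counts over an open-text corpus."""
    first = n_both / n_total       # texts containing both entities
    second = n_related / n_total   # texts containing the related entity
    third = n_target / n_total     # texts containing the target entity
    if first == 0:
        return float("-inf")       # the two entities never co-occur
    return math.log(first / (second * third))

# Hypothetical counts: 10,000 texts, 50 of which mention both entities.
score = relevance_score(n_both=50, n_related=100, n_target=200, n_total=10_000)
```

A score near zero indicates that the two entities co-occur about as often as chance would predict, while a larger positive score indicates a stronger association.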
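In another aspect, the recommendation ranking determining module 400, when determining the recommendation ranking of the related entities, is specifically configured to:
compute, from open text, the relevance score between each related entity and the target entity, and determine the recommendation ranking of the related entities according to those relevance scores; the higher the relevance score, the higher the recommendation ranking.
In yet another aspect, the recommendation ranking determining module 400, when determining the recommendation ranking of the related entities, is specifically configured to:
determine the weight score of each related entity in the target knowledge graph, and determine the recommendation ranking of the related entities according to those weight scores; the higher the weight score, the higher the recommendation ranking.
The related entity determining apparatus provided by the embodiments of the present application can improve the recall of the determined related entities of the target entity, and the recommended related entities have a more accurate ranking, which raises the probability that information related to the target entity is used by the user.
Optionally, an embodiment of the present application further provides a computing device, which may include the related entity determining apparatus described above.
Optionally, FIG. 12 shows a block diagram of the hardware structure of the computing device. Referring to FIG. 12, the computing device may include: a processor 1, a communication interface 2, a memory 3 and a communication bus 4;
The processor 1, the communication interface 2 and the memory 3 communicate with one another through the communication bus 4;
Optionally, the communication interface 2 may be an interface of a communication module, such as an interface of a GSM module;
The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
The memory 3 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory.
The processor 1 is specifically configured to:
obtain a target knowledge graph, the target knowledge graph having at least a target entity;
determine a candidate entity set of the target entity in the target knowledge graph, the candidate entity set including: for each edge count, the candidate entities that can reach the target entity through that number of edges;
determine the related entities of the target entity according to the candidate entity set.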
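According to another aspect of the embodiments of the present application, a storage medium is further provided. Optionally, in this embodiment, the storage medium may be located in at least one of a plurality of network devices in a network.
Optionally, in this embodiment, the storage medium is configured to store a computer program for performing the following steps:
S1, obtain a target knowledge graph, the target knowledge graph having at least a target entity;
S2, determine a candidate entity set of the target entity in the target knowledge graph, the candidate entity set including: for each edge count, the candidate entities that can reach the target entity through that number of edges;
S3, determine the related entities of the target entity according to the candidate entity set.
Optionally, in this embodiment, the storage medium is configured to store a computer program for performing the following steps:
S1, obtain input text in which a plurality of named entities are given in advance, the named entities including at least the target entity;
S2, map the named entities given in the input text onto the target entity of a knowledge graph to obtain the target knowledge graph, the knowledge graph being built from a data source that contains the target entity.
Optionally, in this embodiment, the storage medium is configured to store a computer program for performing the following steps:
S1, obtain a preset edge count range, the edge count range including a plurality of edge counts;
S2, according to each edge count included in the edge count range, determine, in the target knowledge graph, the candidate entities that can reach the target entity through that number of edges, to obtain the candidate entity set of the target entity.
Optionally, in this embodiment, the storage medium is configured to store a computer program for performing the following steps:
S1, if the candidate entity set contains duplicate candidate entities that correspond to different edge counts, deduplicate them so that, among the duplicates, only the candidate entity with the smallest edge count is kept;
S2, take the candidate entities in the deduplicated candidate entity set as the related entities of the target entity.
Optionally, in this embodiment, the storage medium is configured to store a computer program for performing the following steps:
S1, determine the recommendation ranking of the related entities, so that related entities are recommended according to that ranking.
Optionally, in this embodiment, the storage medium is configured to store a computer program for performing the following steps:
S1, compute, from open text, the relevance score between each related entity and the target entity;
S2, determine the weight score of each related entity in the target knowledge graph;
S3, for each related entity, add its relevance score and its weight score to obtain its ranking score;
S4, determine the recommendation ranking of the related entities according to their ranking scores; the higher the ranking score, the higher the recommendation ranking.
Optionally, in this embodiment, the storage medium is configured to store a computer program for performing the following steps:
S1, compute, from open text, the relevance score between each related entity and the target entity, and determine the recommendation ranking of the related entities according to those relevance scores; the higher the relevance score, the higher the recommendation ranking;
S2, determine the weight score of each related entity in the target knowledge graph, and determine the recommendation ranking of the related entities according to those weight scores; the higher the weight score, the higher the recommendation ranking.
Optionally, in this embodiment, the storage medium is configured to store a computer program for performing the following steps:
S1, within the scope formed by the deduplicated candidate entity set and the target entity, determine the nearest entity that each related entity can reach;
S2, according to the preset relation weight of each relation in the target knowledge graph, determine the relation weight of the relation between each related entity and its nearest reachable entity, thereby obtaining the relation weight of each related entity;
S3, for each related entity, combine the edge-count weight of the related entity's edge count with its relation weight to obtain the weight score of the related entity; the larger the edge count, the smaller the edge-count weight.
Optionally, in this embodiment, the storage medium is configured to store a computer program for performing the following steps:
S1, for a given related entity, determine a first ratio of the number of texts in which both the related entity and the target entity appear to the total number of texts, a second ratio of the number of texts in which the related entity appears to the total number of texts, and a third ratio of the number of texts in which the target entity appears to the total number of texts;
S2, determine the mutual information between the related entity and the target entity according to the first ratio, the second ratio and the third ratio, and use the determined mutual information as the relevance score between the related entity and the target entity.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc. Optionally, for specific examples in this embodiment, reference may be made to the examples described in Embodiment 1 and Embodiment 2 above, which are not repeated here.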
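The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present application.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the core idea or scope of the present application. The present application is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Industrial Applicability
In the embodiments of the present application, a target knowledge graph having at least a target entity is obtained; a candidate entity set of the target entity in the target knowledge graph is determined, the candidate entity set including, for each edge count, the candidate entities that can reach the target entity through that number of edges; and the related entities of the target entity are determined according to the candidate entity set. It can be seen that the embodiments of the present application use a target knowledge graph having at least the target entity, mine from it the candidate entity set that can reach the target entity, and then determine the related entities of the target entity according to the candidate entity set. Because the target knowledge graph records more comprehensive information about the target entity, comprehensive historical information about the target entity can be mined with high probability, so that the mined related-entity results are more complete and the recall of the determined related entities of the target entity is improved.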

Claims (21)

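  1. A related entity determination method, comprising:
    obtaining a target knowledge graph, the target knowledge graph having at least a target entity;
    determining a candidate entity set of the target entity in the target knowledge graph, the candidate entity set comprising: for each edge count, the candidate entities that can reach the target entity through that number of edges;
    determining related entities of the target entity according to the candidate entity set.
  2. The related entity determination method according to claim 1, wherein obtaining the target knowledge graph comprises:
    obtaining input text in which a plurality of named entities are given in advance, the named entities including at least the target entity;
    mapping the named entities given in the input text onto the target entity of a knowledge graph to obtain the target knowledge graph, the knowledge graph being built from a data source that contains the target entity.
  3. The related entity determination method according to claim 1 or 2, wherein determining the candidate entity set of the target entity in the target knowledge graph comprises:
    obtaining a preset edge count range, the edge count range including a plurality of edge counts;
    according to each edge count included in the edge count range, determining, in the target knowledge graph, the candidate entities that can reach the target entity through that number of edges, to obtain the candidate entity set of the target entity.
  4. The related entity determination method according to claim 1, wherein determining the related entities of the target entity according to the candidate entity set comprises:
    if the candidate entity set contains duplicate candidate entities that correspond to different edge counts, deduplicating them so that, among the duplicates, only the candidate entity with the smallest edge count is kept;
    taking the candidate entities in the deduplicated candidate entity set as the related entities of the target entity.
  5. The related entity determination method according to claim 4, further comprising:
    determining a recommendation ranking of the related entities, so that related entities are recommended according to that ranking.
  6. The related entity determination method according to claim 5, wherein determining the recommendation ranking of the related entities comprises:
    computing, from open text, a relevance score between each related entity and the target entity;
    determining a weight score of each related entity in the target knowledge graph;
    for each related entity, adding its relevance score and its weight score to obtain its ranking score;
    determining the recommendation ranking of the related entities according to their ranking scores, wherein the higher the ranking score, the higher the recommendation ranking.
  7. The related entity determination method according to claim 5, wherein determining the recommendation ranking of the related entities comprises:
    computing, from open text, a relevance score between each related entity and the target entity, and determining the recommendation ranking of the related entities according to those relevance scores, wherein the higher the relevance score, the higher the recommendation ranking;
    or, determining a weight score of each related entity in the target knowledge graph, and determining the recommendation ranking of the related entities according to those weight scores, wherein the higher the weight score, the higher the recommendation ranking.
  8. The related entity determination method according to claim 6 or 7, wherein determining the weight score of each related entity in the target knowledge graph comprises:
    within the scope formed by the deduplicated candidate entity set and the target entity, determining the nearest entity that each related entity can reach;
    according to a preset relation weight of each relation in the target knowledge graph, determining the relation weight of the relation between each related entity and its nearest reachable entity, thereby obtaining the relation weight of each related entity;
    for each related entity, combining the edge-count weight of the related entity's edge count with its relation weight to obtain the weight score of the related entity, wherein the larger the edge count, the smaller the edge-count weight.
  9. The related entity determination method according to claim 6 or 7, wherein computing, from open text, the relevance score between each related entity and the target entity comprises:
    for a given related entity, determining a first ratio of the number of texts in which both the related entity and the target entity appear to the total number of texts, a second ratio of the number of texts in which the related entity appears to the total number of texts, and a third ratio of the number of texts in which the target entity appears to the total number of texts;
    determining the mutual information between the related entity and the target entity according to the first ratio, the second ratio and the third ratio, and using the determined mutual information as the relevance score between the related entity and the target entity.
  10. A related entity determining apparatus, comprising:
    a target knowledge graph obtaining module, configured to obtain a target knowledge graph, the target knowledge graph having at least a target entity;
    a candidate entity set determining module, configured to determine a candidate entity set of the target entity in the target knowledge graph, the candidate entity set comprising: for each edge count, the candidate entities that can reach the target entity through that number of edges;
    a related entity determining module, configured to determine related entities of the target entity according to the candidate entity set.
  11. The related entity determining apparatus according to claim 10, wherein the target knowledge graph obtaining module, when obtaining the target knowledge graph, is specifically configured to:
    obtain input text in which a plurality of named entities are given in advance, the named entities including at least the target entity;
    map the named entities given in the input text onto the target entity of a knowledge graph to obtain the target knowledge graph, the knowledge graph being built from a data source that contains the target entity.
  12. The related entity determining apparatus according to claim 10, wherein the related entity determining module, when determining the related entities of the target entity according to the candidate entity set, is specifically configured to:
    if the candidate entity set contains duplicate candidate entities that correspond to different edge counts, deduplicate them so that, among the duplicates, only the candidate entity with the smallest edge count is kept;
    take the candidate entities in the deduplicated candidate entity set as the related entities of the target entity.
  13. The related entity determining apparatus according to claim 12, further comprising:
    a recommendation ranking determining module, configured to determine a recommendation ranking of the related entities, so that related entities are recommended according to that ranking.
  14. The related entity determining apparatus according to claim 13, wherein the recommendation ranking determining module, when determining the recommendation ranking of the related entities, is specifically configured to:
    compute, from open text, a relevance score between each related entity and the target entity;
    determine a weight score of each related entity in the target knowledge graph;
    for each related entity, add its relevance score and its weight score to obtain its ranking score;
    determine the recommendation ranking of the related entities according to their ranking scores, wherein the higher the ranking score, the higher the recommendation ranking.
  15. The related entity determining apparatus according to claim 14, wherein the recommendation ranking determining module, when determining the weight score of each related entity in the target knowledge graph, is specifically configured to:
    within the scope formed by the deduplicated candidate entity set and the target entity, determine the nearest entity that each related entity can reach;
    according to a preset relation weight of each relation in the target knowledge graph, determine the relation weight of the relation between each related entity and its nearest reachable entity, thereby obtaining the relation weight of each related entity;
    for each related entity, combine the edge-count weight of the related entity's edge count with its relation weight to obtain the weight score of the related entity, wherein the larger the edge count, the smaller the edge-count weight;
    and the recommendation ranking determining module, when computing from open text the relevance score between each related entity and the target entity, is specifically configured to:
    for a given related entity, determine a first ratio of the number of texts in which both the related entity and the target entity appear to the total number of texts, a second ratio of the number of texts in which the related entity appears to the total number of texts, and a third ratio of the number of texts in which the target entity appears to the total number of texts;
    determine the mutual information between the related entity and the target entity according to the first ratio, the second ratio and the third ratio, and use the determined mutual information as the relevance score between the related entity and the target entity.
  16. A computing device, comprising the related entity determining apparatus according to any one of claims 10 to 15.
  17. A related entity determination method, comprising:
    obtaining, by a computing device, a target knowledge graph, the target knowledge graph having at least a target entity;
    determining, by the computing device, a candidate entity set of the target entity in the target knowledge graph, the candidate entity set comprising: for each edge count, the candidate entities that can reach the target entity through that number of edges;
    determining, by the computing device, related entities of the target entity according to the candidate entity set.
  18. The related entity determination method according to claim 17, wherein obtaining, by the computing device, the target knowledge graph comprises:
    obtaining, by the computing device, input text in which a plurality of named entities are given in advance, the named entities including at least the target entity;
    mapping, by the computing device, the named entities given in the input text onto the target entity of a knowledge graph to obtain the target knowledge graph, the knowledge graph being built from a data source that contains the target entity.
  19. The related entity determination method according to claim 17 or 18, wherein determining, by the computing device, the candidate entity set of the target entity in the target knowledge graph comprises:
    obtaining, by the computing device, a preset edge count range, the edge count range including a plurality of edge counts;
    according to each edge count included in the edge count range, determining, by the computing device, in the target knowledge graph, the candidate entities that can reach the target entity through that number of edges, to obtain the candidate entity set of the target entity.
  20. The related entity determination method according to claim 17, wherein determining, by the computing device, the related entities of the target entity according to the candidate entity set comprises:
    if the candidate entity set contains duplicate candidate entities that correspond to different edge counts, deduplicating them, by the computing device, so that, among the duplicates, only the candidate entity with the smallest edge count is kept;
    taking, by the computing device, the candidate entities in the deduplicated candidate entity set as the related entities of the target entity.
  21. A storage medium comprising a stored computer program, wherein the computer program, when run, performs the method according to any one of claims 1 to 9 or 17 to 20.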
PCT/CN2018/077416 2017-03-02 2018-02-27 Related entity determination method, apparatus, computing device, and storage medium WO2018157790A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710120836.9 2017-03-02
CN201710120836.9A CN108536702B (zh) 2017-03-02 Related entity determination method, apparatus, and computing device

Publications (1)

Publication Number Publication Date
WO2018157790A1 (zh) 2018-09-07

Family

ID=63369790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/077416 WO2018157790A1 (zh) 2017-03-02 2018-02-27 Related entity determination method, apparatus, computing device, and storage medium

Country Status (2)

Country Link
CN (1) CN108536702B (zh)
WO (1) WO2018157790A1 (zh)

Also Published As

Publication number Publication date
CN108536702B (zh) 2022-12-02
CN108536702A (zh) 2018-09-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18760664

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18760664

Country of ref document: EP

Kind code of ref document: A1