CN110309316B - Method and device for determining knowledge graph vector, terminal equipment and medium

Info

Publication number: CN110309316B
Application number: CN201810587003.8A
Authority: CN
Inventors: 曹洋; 卢菁; 冯亚伟; 李彪; 范欣
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-06-08
Filing date: 2018-06-08
Publication date: 2022-10-25
Anticipated expiration: 2038-06-08
Also published as: CN110309316A

Abstract

The method comprises the steps of obtaining text information of each entity to be processed, determining topic distribution probability of each entity to be processed corresponding to each set topic based on the text information of each entity to be processed, obtaining a topic knowledge graph based on the topic distribution probability, and respectively determining knowledge graph vectors of each entity to be processed based on an extended knowledge graph obtained by combining the knowledge graph and the topic knowledge graph. Therefore, the topic knowledge graph is obtained based on the topic distribution probability determined by the text information, the knowledge graph is expanded based on the topic knowledge graph, the text information and graph structure information are effectively fused, and the expression significance of knowledge graph vectors is enriched.

Description

Method and device for determining knowledge graph vector, terminal equipment and medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a terminal device, and a medium for determining a knowledge graph vector.

Background

A knowledge graph, also called a scientific knowledge graph, is known in the library and information science community as knowledge domain visualization or knowledge domain mapping. It is a series of different graphs that display the development process and structural relationships of knowledge. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelations among these resources and carriers. Knowledge graphs are typically represented using a triple structure, i.e., entity-relationship-entity. For example, one triple is [lily] - (phylum) - [angiosperm phylum].

In the prior art, in order to improve the application efficiency of the knowledge graph, the knowledge graph vector of an entity is generally obtained based on the triple structure of the knowledge graph.

However, the triple structure information is only a part of the knowledge information, and a large amount of other information is not effectively utilized. Therefore, a knowledge graph vector scheme is needed, which can fuse related information of multi-source heterogeneity to perform knowledge representation on a knowledge graph.

Disclosure of Invention

The embodiment of the application provides a knowledge graph vector determination method, a knowledge graph vector determination device, terminal equipment and a medium, which are used for enriching the expression significance of knowledge graph vectors when knowledge graph vectors of entities are obtained based on knowledge graphs.

In a first aspect, a method for determining a knowledge-graph vector is provided, including:

acquiring text information of each entity to be processed, and determining, based on the text information of each entity to be processed, the topic distribution probability of each entity to be processed corresponding to each set topic;

determining, based on the determined topic distribution probabilities, the association relationship between each entity to be processed and each topic to obtain a topic knowledge graph;

acquiring a stored knowledge graph between the entities to be processed, and acquiring an extended knowledge graph between each entity to be processed and each topic based on the topic knowledge graph and the knowledge graph;

and respectively determining the knowledge graph vector of each entity to be processed based on the determined extended knowledge graph.

Therefore, text information is fully fused, the knowledge graph is expanded, the information coverage of the expanded knowledge graph is high, and the expression significance of knowledge graph vectors is enriched.

Preferably, determining the association relationship between each entity to be processed and each topic based on the determined topic distribution probabilities to obtain a topic knowledge graph specifically includes:

for the topic distribution probability of each entity to be processed corresponding to each topic, performing the following step: when the topic distribution probability is determined to be higher than a preset distribution probability threshold, associating the entity to be processed corresponding to that topic distribution probability with the topic;

and obtaining the topic knowledge graph based on the determined association relationships between the entities to be processed and the topics.

In this way, the associated entities to be processed and topics are screened out by means of the preset distribution probability threshold.

Preferably, before obtaining the extended knowledge graph between each entity to be processed and each topic based on the topic knowledge graph and the knowledge graph, the method further comprises:

for each entity to be processed, performing the following step: determining a topic vector corresponding to the entity to be processed according to the topic distribution probabilities of the entity to be processed for each topic;

for every two entities to be processed, performing the following step: determining the distance between the topic vectors of the two entities to be processed, and establishing an association between the two entities to be processed if the distance between their topic vectors is higher than a preset distance threshold;

and updating the topic knowledge graph based on the entities to be processed for which associations have been established.

Thus, the association between the entities to be processed is expanded through the text information.

Preferably, the determining the knowledge-graph vector of each entity to be processed based on the determined extended knowledge-graph specifically includes:

for each node in the extended knowledge graph, performing the following step: determining, based on a preset control return parameter and a preset depth parameter, the random walk probability of the node jumping to each adjacent node in the extended knowledge graph; wherein the nodes comprise the entities to be processed and the topics, the control return parameter is used for determining the random walk probability when a node returns to the previous node, and the depth parameter is used for determining the random walk probability when a node jumps to a node that is not adjacent to the previous node;

obtaining each random walk sequence of the extended knowledge graph based on the random walk probability among each node in the extended knowledge graph;

and determining the knowledge graph vector of each entity to be processed based on the determined random walk sequences.

Therefore, because the random walk sequences are obtained from the extended knowledge graph, whose information coverage is high, the expression significance of the knowledge graph vectors is enriched.

Preferably, after determining the knowledge-graph vector of each entity to be processed based on the determined extended knowledge-graph, the method further includes:

determining the distances between the knowledge graph vectors of the entities to be processed;

and respectively determining the similarity between every two to-be-processed entities based on the distance between the knowledge graph vectors of the to-be-processed entities.

Thus, the similarity between the entities to be processed can be determined according to the distance between the knowledge-graph vectors.
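As a minimal sketch of this step, the snippet below computes a distance between every pair of knowledge graph vectors and converts it to a similarity. The text does not specify the distance metric or how distance maps to similarity, so the Euclidean distance, the mapping 1 / (1 + d), the function names, and the sample vectors are all assumptions made for illustration.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity_from_distance(d):
    """Assumed mapping: a non-negative distance becomes a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)


def pairwise_similarity(vectors):
    """Return {(entity_i, entity_j): similarity} for every unordered pair."""
    names = sorted(vectors)
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = euclidean_distance(vectors[a], vectors[b])
            result[(a, b)] = similarity_from_distance(d)
    return result


# Hypothetical knowledge graph vectors for three entities.
kg_vectors = {"A": [0.0, 0.0], "B": [3.0, 4.0], "C": [0.0, 0.0]}
sims = pairwise_similarity(kg_vectors)
```

Any monotonically decreasing mapping would serve the same purpose; the only property relied on here is that a smaller distance yields a larger similarity.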

In a second aspect, an apparatus for determining a knowledge-graph vector comprises:

the first determining unit is used for acquiring the text information of each entity to be processed and determining, based on the text information of each entity to be processed, the topic distribution probability of each entity to be processed corresponding to each set topic;

the second determining unit is used for determining the association relationship between each entity to be processed and each topic based on the determined topic distribution probabilities to obtain a topic knowledge graph;

the acquisition unit is used for acquiring the stored knowledge graph between the entities to be processed and acquiring the expanded knowledge graph between the entities to be processed and each topic based on the topic knowledge graph and the knowledge graph;

and the third determining unit is used for respectively determining the knowledge graph vector of each entity to be processed based on the determined extended knowledge graph.

Preferably, when determining the association relationship between each entity to be processed and each topic based on the determined distribution probability of each topic to obtain the topic knowledge graph, the second determining unit is specifically configured to:

Preferably, before the obtaining of the extended knowledge-graph between each entity to be processed and each topic based on the topic knowledge-graph and the knowledge-graph, the obtaining unit is further configured to:

Preferably, when determining the knowledge-map vector of each entity to be processed based on the determined extended knowledge-map, the third determining unit is specifically configured to:

Preferably, after determining the knowledge-map vector of each entity to be processed based on the determined extended knowledge-map, the third determining unit is further configured to:

In a third aspect, a terminal device is provided, comprising at least one processing unit, and at least one storage unit, wherein the storage unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the steps of any one of the above-described methods for knowledge-graph vector determination.

In a fourth aspect, there is provided a computer readable medium storing a computer program executable by a terminal device, the program, when run on the terminal device, causing the terminal device to perform the steps of any one of the above-described knowledge-graph vector determination methods.

In the method, the device, the terminal equipment and the medium for determining the knowledge graph vector, the text information of each entity to be processed is obtained, the topic distribution probability of each entity to be processed corresponding to each set topic is determined based on the text information of each entity to be processed, the topic knowledge graph is obtained based on the topic distribution probability, and the knowledge graph vector of each entity to be processed is determined based on the extended knowledge graph obtained by combining the knowledge graph and the topic knowledge graph. Therefore, the topic knowledge graph is obtained based on the topic distribution probability determined from the text information, the knowledge graph is expanded based on the topic knowledge graph, the text information and graph structure information are effectively fused, and the expression significance of knowledge graph vectors is further enriched.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;

FIG. 2a is a schematic flow chart illustrating the determination and application of knowledge-graph vectors according to an embodiment of the present disclosure;

FIG. 2b is a flowchart of an implementation of a method for knowledge-graph vector determination according to an embodiment of the present disclosure;

FIG. 3a is a diagram illustrating an example of graphs provided in an embodiment of the present application;

FIG. 3b is a schematic diagram of a node jump provided in an embodiment of the present application;

FIG. 3c is a schematic diagram of a random walk provided in an embodiment of the present application;

FIG. 3d is a schematic diagram of knowledge-graph vector generation provided in an embodiment of the present application;

FIG. 3e is an example diagram of a user portrait extension display provided in an embodiment of the present application;

FIG. 3f is a diagram of a topic recommendation example provided in an embodiment of the present application;

FIG. 3g is a first example diagram of related reading provided in an embodiment of the present application;

FIG. 3h is a diagram of a topic recommendation example provided in an embodiment of the present application;

FIG. 3i is a second example diagram of related reading provided in an embodiment of the present application;

FIG. 3j is a diagram illustrating an example of entity identification provided in an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an apparatus for determining knowledge graph vectors according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a terminal device in an embodiment of the present application.

Detailed Description

In order to enrich the expression significance of a knowledge graph vector when the knowledge graph vector of an entity is obtained based on a knowledge graph, the embodiments of the application provide a method, a device, terminal equipment and a medium for determining the knowledge graph vector.

First, some terms referred to in the embodiments of the present application are explained so as to be easily understood by those skilled in the art.

1. Terminal device: an electronic device, mobile or fixed, on which various applications can be installed and which can display entities provided in the installed applications. Examples include a mobile phone, a tablet computer, a vehicle-mounted device, a Personal Digital Assistant (PDA), or another electronic device capable of implementing the above functions.

2. Knowledge graph: also called a scientific knowledge graph, and known in the library and information science community as knowledge domain visualization or knowledge domain mapping. It is a series of different graphs that display the development process and structural relationships of knowledge. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelations among these resources and carriers. A knowledge graph is essentially a semantic network: its nodes represent entities or concepts, and its edges represent various semantic relationships between entities/concepts. Knowledge graphs are typically represented using a triple structure, i.e., entity-relationship-entity.

In the prior art, a text vector of an entity is trained based on text information, a structure vector of the entity is trained based on a knowledge graph, and the training process pulls the text vector and the structure vector of the entity as close together as possible, thereby expressing the fusion of the text and the knowledge base. However, when the text information and the knowledge graph present characteristics of different aspects of the entity, this method is not applicable, so its application range is small.

In view of this, embodiments of the present application provide a method, an apparatus, a terminal device, and a medium for determining a knowledge graph vector, where text information of each to-be-processed entity is obtained, a topic distribution probability that each to-be-processed entity corresponds to each set topic is determined based on the text information of each to-be-processed entity, a topic knowledge graph is obtained based on the topic distribution probability, and a knowledge graph vector of each to-be-processed entity is determined based on an extended knowledge graph obtained by combining the knowledge graph and the topic knowledge graph. Therefore, the topic knowledge graph is obtained based on the topic distribution probability determined by the text information, the knowledge graph is expanded based on the topic knowledge graph, the text information and graph structure information are effectively fused, and the expression significance of knowledge graph vectors is enriched.

The method for determining knowledge graph vectors provided by the embodiment of the application can be applied to terminal equipment, and the terminal equipment can be a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), and the like.

Fig. 1 shows a schematic structural diagram of a terminal device 100. Referring to fig. 1, the terminal device 100 includes: a processor 110, a memory 120, a power supply 130, a display unit 140, an input unit 150.

The processor 110 is a control center of the terminal device 100, connects various components using various interfaces and lines, and performs various functions of the terminal device 100 by running or executing software programs and/or data stored in the memory 120, thereby performing overall monitoring of the terminal device.

Optionally, the processor 110 may include one or more processing units. Preferably, the processor 110 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may alternatively not be integrated into the processor 110. In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented on separate chips.

The memory 120 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, various application programs, and the like; the data storage area may store data created according to the use of the terminal device 100, and the like. Further, the memory 120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The terminal device 100 further includes a power supply 130 (e.g., a battery) for supplying power to various components, which may be logically connected to the processor 110 via a power management system, thereby performing functions of managing charging, discharging, and power consumption via the power management system.

The display unit 140 may be configured to display information input by a user or information provided to the user, and various menus of the terminal device 100, and is mainly configured to display a display interface of each application program in the terminal device 100 and entities such as texts and pictures displayed in the display interface in the embodiment of the present application. The display unit 140 may include a display panel 141. The Display panel 141 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.

The input unit 150 may be used to receive information such as numbers or characters input by a user. The input unit 150 may include a touch panel 151 and other input devices 152. Among other things, the touch panel 151, also referred to as a touch screen, may collect touch operations by a user thereon or nearby (e.g., operations by a user on or near the touch panel 151 using any suitable object or accessory such as a finger, a stylus, etc.).

Specifically, the touch panel 151 may detect a touch operation of a user, detect signals generated by the touch operation, convert the signals into touch point coordinates, transmit the touch point coordinates to the processor 110, receive a command from the processor 110, and execute the command. In addition, the touch panel 151 may be implemented in various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. Other input devices 152 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, power on/off keys, etc.), a trackball, a mouse, a joystick, and the like.

Of course, the touch panel 151 may cover the display panel 141. When the touch panel 151 detects a touch operation on or near it, the touch operation is transmitted to the processor 110 to determine the type of the touch event, and the processor 110 then provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although in FIG. 1 the touch panel 151 and the display panel 141 are two separate components that implement the input and output functions of the terminal device 100, in some embodiments the touch panel 151 and the display panel 141 may be integrated to implement the input and output functions of the terminal device 100.

The terminal device 100 may also include one or more sensors, such as pressure sensors, gravitational acceleration sensors, proximity light sensors, and the like. Of course, the terminal device 100 may further include other components such as a camera according to the requirements of a specific application, and these components are not shown in fig. 1 and are not described in detail since they are not components used in the embodiment of the present application.

Those skilled in the art will appreciate that FIG. 1 is merely an example of a terminal device and does not limit the terminal device, which may include more or fewer components than those shown, combine some of the components, or use different components.

FIG. 2a is a schematic diagram illustrating a process of determining and applying a knowledge graph vector according to an embodiment of the present application. First, the terminal device acquires text information of each entity to be processed and, based on that text information, obtains a topic vector corresponding to each entity to be processed, where the topic vector of an entity to be processed contains the topic distribution probability of that entity for each set topic. Then, the terminal device determines the association relationship between each entity to be processed and each topic based on the topic vectors, and obtains a topic knowledge graph. Further, the terminal device combines the topic knowledge graph with the stored knowledge graph to obtain an extended knowledge graph, and determines the knowledge graph vector of each entity to be processed based on the extended knowledge graph. The terminal device can then determine the similarity between the entities to be processed based on the distance between their knowledge graph vectors, and apply the similarity to similar news topic recommendation, user portrait extension, and the like.

Referring to fig. 2b, a flowchart of an implementation of the method for determining a knowledge-graph vector according to the present application is shown. In the following description, the method is described in detail with reference to the schematic structural diagram of the terminal device shown in fig. 1, and the specific implementation flow of the method is as follows:

Step 200: The terminal device determines, based on the acquired text information of each entity to be processed, the topic distribution probability of each entity to be processed corresponding to each set topic.

Specifically, first, the terminal device 100 obtains text information of each entity to be processed, and determines a first probability that each word in each text information corresponds to each set topic, and a second probability that each topic corresponds to each word.

Then, the terminal device 100 determines the topic distribution probability of each entity to be processed corresponding to each set topic respectively based on the obtained first probability and the second probability.

Optionally, when step 200 is executed, the terminal device 100 may obtain text information of each entity to be processed and, based on that text information, determine the topic distribution probability of each entity to be processed corresponding to each set topic by using a preset document topic generation model. The document topic generation model assumes that each word of an article is produced by selecting a topic with a certain probability and then selecting a word from that topic with a certain probability; on this basis it converts each piece of text information into a topic vector of a specified dimension. The document-to-topic distribution and the topic-to-word distribution are both multinomial. The topic vector of an entity to be processed contains the topic distribution probabilities of that entity for each topic. Optionally, the document topic generation model may be Latent Dirichlet Allocation (LDA), a three-layer Bayesian probability model.
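To make step 200 concrete, the following is a small sketch of how a per-entity topic distribution could be estimated with LDA using collapsed Gibbs sampling. It is not presented as the model used in the embodiment: the sampler, the hyperparameters `alpha` and `beta`, the iteration count, and the toy texts are all assumptions chosen for illustration, and a production system would more likely rely on an existing LDA library.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA (illustrative only).

    docs: list of token lists, one per entity's text information.
    Returns one topic distribution (num_topics probabilities) per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                               # n(k)
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Weight = (document prefers topic) * (topic prefers word).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][wid] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed, normalized topic distribution for each document.
    result = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        result.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return result


# Hypothetical tokenized text information for three entities.
entity_texts = {
    "A": "film actor director film award".split(),
    "B": "river mountain lake river forest".split(),
    "D": "mountain forest lake trail river".split(),
}
names = list(entity_texts)
distributions = lda_gibbs([entity_texts[n] for n in names], num_topics=2)
topic_vectors = dict(zip(names, distributions))
```

Each value in `topic_vectors` plays the role of the topic vector described above: one probability per set topic, summing to 1.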

The entity to be processed is an entity in the knowledge graph, and may be a person, a work, a place, a numerical value, a height, and the like. A knowledge graph is essentially a semantic network: its nodes represent entities or concepts, and its edges represent various semantic relationships between entities/concepts. Knowledge graphs are typically represented using a triple structure, i.e., entity-relationship-entity. An entity may have several relationship types. For example, a person entity has relationship types such as birthday, height, and wife, and a movie entity has relationship types such as director, actors, country of production, and release date. Associations between different entities can be established through the relationship types of the entities.

For example, Liu XX (entity) - wife (relationship type) - Zhu XX (entity).

As another example, Liu XX (entity) - film works (relationship type) - Infernal Affairs (entity).

As a further example, Infernal Affairs (entity) - country/region of production (relationship type) - Hong Kong, China (entity).
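The triple structure described above can be held in a very small data structure. The sketch below is one possible representation, not one prescribed by the embodiment: the `Triple` type, the `build_adjacency` helper, and the placeholder entity names (`person_1`, `film_1`, and so on) are hypothetical. The adjacency view ignores relationship types and treats edges as undirected, which is an assumption made for simplicity.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One entity-relationship-entity fact."""
    head: str
    relation: str
    tail: str


# Placeholder triples mirroring the shape of the examples in the text.
triples = [
    Triple("lily", "phylum", "angiosperm phylum"),
    Triple("person_1", "wife", "person_2"),
    Triple("person_1", "film work", "film_1"),
    Triple("film_1", "country/region of production", "region_1"),
]


def build_adjacency(triples):
    """Undirected adjacency sets; the relationship type is dropped."""
    adjacency = defaultdict(set)
    for t in triples:
        adjacency[t.head].add(t.tail)
        adjacency[t.tail].add(t.head)
    return adjacency


adjacency = build_adjacency(triples)
```

Here `adjacency["person_1"]` contains both `person_2` and `film_1`, showing how shared entities link separate triples into one graph.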

Step 210: The terminal device determines, based on the determined topic distribution probabilities, the association relationship between each entity to be processed and each topic to obtain a topic knowledge graph.

Specifically, the terminal device 100 executes the following steps for the topic distribution probability of each entity to be processed corresponding to each topic, respectively:

first, when the terminal device 100 determines that a topic distribution probability is higher than a preset distribution probability threshold value, the to-be-processed entity corresponding to the topic distribution probability is associated with the topic.

For example, the preset distribution probability threshold value is 0.8.

Then, the terminal device 100 obtains a topic knowledge graph based on the determined association relationship between each entity to be processed and the topic.

That is, when a topic distribution probability is determined to be higher than the preset distribution probability threshold, the corresponding entity to be processed is connected to the topic, which yields the topic knowledge graph. For example, FIG. 3a shows an example graph: in the topic graph shown in FIG. 3a, A is associated with topic 2, and B and D are each associated with topic 1.

Therefore, the topic knowledge graph containing the association relationships between the entities to be processed and the topics can be obtained from the text information. The topic knowledge graph is itself a knowledge graph and adopts a triple structure. The nodes in the topic knowledge graph comprise the entities to be processed and the topics.
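The thresholding in step 210 can be sketched as follows. The function name `build_topic_graph`, the edge representation as `(entity, topic)` pairs, and the sample probabilities are assumptions; the probabilities were chosen so that the result matches the associations described for FIG. 3a, with the example threshold of 0.8.

```python
def build_topic_graph(topic_distributions, threshold=0.8):
    """Connect an entity to a topic when its probability exceeds the threshold."""
    edges = set()
    for entity, probs in topic_distributions.items():
        for topic_index, prob in enumerate(probs, start=1):
            if prob > threshold:
                edges.add((entity, f"topic {topic_index}"))
    return edges


# Hypothetical topic distribution probabilities over two set topics.
topic_distributions = {
    "A": [0.05, 0.95],
    "B": [0.90, 0.10],
    "C": [0.50, 0.50],
    "D": [0.85, 0.15],
}
topic_edges = build_topic_graph(topic_distributions)
```

In this sample, C has no probability above the threshold, so it gains no topic edge and appears in the extended graph only through its existing knowledge graph relations.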

Further, the terminal device 100 may also update the topic knowledge graph, and the specific flow is as follows:

first, the terminal device 100 performs the following steps for each entity to be processed, respectively:

Determine the topic vector corresponding to the entity to be processed according to the topic distribution probabilities of the entity to be processed for each topic. Optionally, the document topic generation model may be used to directly determine the topic vector of each entity to be processed.

Then, the terminal device 100 performs the following steps for each two entities to be processed, respectively:

Determine the distance between the topic vectors of the two entities to be processed, and if the distance between the topic vectors of the two entities to be processed is higher than a preset distance threshold, establish an association between the two entities to be processed.

Finally, the terminal device updates the topic knowledge graph based on the entities to be processed for which associations have been established.

The smaller the distance between the topic vectors of two entities to be processed, the higher the similarity of the two entities; entities to be processed with higher similarity can therefore be associated, which expands the topic knowledge graph.

Clearly, the topic knowledge graph carries the text information of the entities to be processed, so the text information is fully fused.
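The update of the topic knowledge graph with entity-to-entity associations can be sketched as below. One point needs care: the step is worded as associating two entities when the distance is "higher than" the threshold, while the explanation states that a smaller distance means higher similarity. This sketch follows the explanation and associates entities whose Euclidean distance is below a threshold; the wording of the step would be consistent if the measure were a similarity score such as cosine similarity. The metric, the threshold value 0.2, the function names, and the sample vectors are all assumptions.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length topic vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def associate_similar_entities(topic_vectors, max_distance=0.2):
    """Return entity pairs whose topic vectors are closer than max_distance."""
    pairs = set()
    for a, b in combinations(sorted(topic_vectors), 2):
        if euclidean(topic_vectors[a], topic_vectors[b]) < max_distance:
            pairs.add((a, b))
    return pairs


# Hypothetical topic vectors over two set topics.
topic_vectors = {
    "A": [0.05, 0.95],
    "B": [0.90, 0.10],
    "C": [0.50, 0.50],
    "D": [0.85, 0.15],
}
entity_pairs = associate_similar_entities(topic_vectors)
```

With these sample vectors only B and D are close enough to be associated, so the update would add a single B-D edge to the topic knowledge graph.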

The embodiment of the application describes only the example in which, after the topic knowledge graph is determined, the text information of each entity to be processed is obtained and the topic knowledge graph is updated based on that text information. In practical applications, the topic knowledge graph may be updated at any step before the extended knowledge graph is obtained, which is not described again here.

Step 220: the terminal equipment acquires the stored knowledge graph between the entities to be processed and acquires the extended knowledge graph between the entities to be processed and the topics based on the topic knowledge graph and the knowledge graph.

Specifically, the terminal device 100 acquires the stored knowledge graph between the entities to be processed, and fuses (i.e., merges) the topic knowledge graph and the knowledge graph to obtain the extended knowledge graph between each entity to be processed and each topic.

For example, referring to FIG. 3a, the knowledge graph includes A, B, C, and D, where B and D are each associated with C. In the topic knowledge graph, A is associated with topic 2, and B and D are each associated with topic 1. The terminal device 100 fuses the knowledge graph and the topic knowledge graph to obtain the extended knowledge graph, in which A is associated with topic 2, and B and D are each associated with both topic 1 and C.

Therefore, the knowledge graph can be expanded on the basis of the knowledge graph and the text information of the entities to be processed. The resulting extended knowledge graph both contains the triple structure of the knowledge graph and fully fuses the text information of the entities to be processed, which improves the information coverage of the extended knowledge graph.
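The fusion in the FIG. 3a example can be illustrated with a short sketch. It assumes, for simplicity, that both graphs are given as lists of undirected, untyped edges and that the extended knowledge graph is stored as an adjacency dictionary; the function name `merge_graphs` is hypothetical.

```python
def merge_graphs(knowledge_edges, topic_edges):
    """Merge two undirected edge lists into one adjacency dictionary."""
    adjacency = {}
    for u, v in list(knowledge_edges) + list(topic_edges):
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

# Edges taken from the FIG. 3a example in the text.
knowledge_edges = [("B", "C"), ("D", "C")]
topic_edges = [("A", "topic 2"), ("B", "topic 1"), ("D", "topic 1")]

extended = merge_graphs(knowledge_edges, topic_edges)
```

In the merged result, entities and topics are both ordinary nodes, which is what allows a single walk over the graph to pass through either kind.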

In the embodiment of the present application, it is not necessary to distinguish the types of the relationships between the entities, and optionally, the relationships between the entities may be set to be of the same type, so as to facilitate determination of the subsequent knowledge graph vector.

Step 230: the terminal equipment determines the knowledge graph vector of each entity to be processed based on the determined extended knowledge graph.

Specifically, first, the terminal device 100 performs the following steps for each node in the extended knowledge graph:

Based on the preset control return parameter and the preset depth parameter, respectively determining the random walk probability that the node, taken as the current node, jumps to each of its adjacent nodes in the extended knowledge graph.

The nodes of the extended knowledge graph include the entities to be processed and the topics. The control return parameter is used for determining the random walk probability when the current node returns to the previous node, and the depth parameter is used for determining the random walk probability when the current node jumps to a node that is not adjacent to the previous node.

Optionally, when the terminal device 100 determines the random walk probability, the following formula may be first adopted:

wherein t is the previous node of the current node, x is a neighboring node of the current node, A(t, x) is the random walk probability that the current node jumps to the next neighboring node x, p is the control return parameter, q is the depth parameter, and d_tx is the relationship between node x and node t: d_tx = 0 denotes that node x is node t, d_tx = 1 denotes that node x is adjacent to node t, and d_tx = 2 denotes that node x is not adjacent to node t. p and q are positive numbers.
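The formula itself is not reproduced in this text. From the symbol definitions and the FIG. 3b example, it presumably has the following piecewise form. This is a reconstruction offered as an assumption, not a quotation. The middle value is written m in the FIG. 3b example; in the node2vec random walk scheme, which these definitions closely resemble, the corresponding value is 1.

```latex
A(t, x) =
\begin{cases}
\dfrac{1}{p}, & d_{tx} = 0 \\[4pt]
m, & d_{tx} = 1 \\[4pt]
\dfrac{1}{q}, & d_{tx} = 2
\end{cases}
```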

Optionally, in order to facilitate subsequent data processing, the terminal device 100 may further normalize each A(t, x) to obtain the normalized random walk probability that the current node jumps to each neighboring node.

For example, referring to FIG. 3b, which is a schematic diagram of node jumping, suppose that the current node is node v, the previous node of node v is node t, the control return parameter is p, and the depth parameter is q. The random walk probability of node v jumping back to node t is A(t, x) = 1/p. If node x1 is adjacent to node t, the random walk probability of node v jumping to node x1 is A(t, x) = m. Neither node x2 nor node x3 is adjacent to node t, so the random walk probability of node v jumping to node x2 or node x3 is A(t, x) = 1/q. Each random walk probability is then normalized to obtain the normalized random walk probabilities.
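The FIG. 3b example can be worked through in a small sketch. The graph layout below (t, v, x1, x2, x3 and their edges) is an assumption consistent with the description, and the values p = 2 and q = 4 are made up. The weight for a neighbor that is adjacent to t is exposed as the hypothetical parameter `neighbor_weight` and defaults to 1, which is the value used by node2vec; the text's example writes this value as m.

```python
def walk_probabilities(adjacency, prev, current, p, q, neighbor_weight=1.0):
    """Normalized probabilities of jumping from `current` to each neighbor.

    prev is the node visited just before `current` (node t in the text).
    neighbor_weight is an assumed value for the d_tx = 1 case.
    """
    raw = {}
    for x in adjacency[current]:
        if x == prev:                 # d_tx = 0: return to the previous node
            raw[x] = 1.0 / p
        elif x in adjacency[prev]:    # d_tx = 1: x is adjacent to the previous node
            raw[x] = neighbor_weight
        else:                         # d_tx = 2: x is not adjacent to the previous node
            raw[x] = 1.0 / q
    total = sum(raw.values())
    return {x: w / total for x, w in raw.items()}

# Assumed layout: v is linked to t, x1, x2, x3; x1 is also linked to t.
adjacency = {
    "t": {"v", "x1"},
    "v": {"t", "x1", "x2", "x3"},
    "x1": {"t", "v"},
    "x2": {"v"},
    "x3": {"v"},
}
probs = walk_probabilities(adjacency, prev="t", current="v", p=2.0, q=4.0)
```

With these values the raw weights are 1/2, 1, 1/4 and 1/4, which sum to 2, so the normalized probabilities are 0.25 for t, 0.5 for x1, and 0.125 each for x2 and x3.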

Then, the terminal device 100 obtains a specified number of random walk sequences, each of a specified step length, based on the determined random walk probabilities between the nodes.

The specified number of sequences is the total number of random walk sequences, and equals the product of the total number of nodes and the preset number of rounds per node. The number of rounds per node is the number of random walk sequences that take the same node as their starting point. The specified step length is the total number of nodes in one random walk sequence.

In this way, by introducing two parameters, namely the control return parameter and the depth parameter, breadth-first search and depth-first search are brought into the generation process of the random walk sequences. With the above formula, the jump probabilities of a random walk sequence are controlled by the control return parameter and the depth parameter, and each random walk probability is calculated accordingly.

For example, see FIG. 3c, which is a schematic diagram of a random walk. If q is small, the random walk sequence is biased toward walking deeper, for example, u-s4-s5-s6-s9-s8-s; if p is small, the random walk sequence is biased toward returning to the node it has just passed through (for example, the last node s1, s2 or s3 of the u node).
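The generation of the random walk sequences can be sketched as follows. This is an illustration under stated assumptions, not the method of the application: the first step of each walk is chosen uniformly because no previous node exists yet, the weight for a neighbor adjacent to the previous node is taken as 1, and the names `generate_walks`, `rounds_per_node` and `walk_length` are hypothetical. The sample graph reuses the FIG. 3a extended knowledge graph.

```python
import random

def biased_weight(adjacency, prev, x, p, q):
    """Unnormalized weight for jumping to x when the previous node was prev."""
    if x == prev:
        return 1.0 / p          # return to the previous node
    if x in adjacency[prev]:
        return 1.0              # assumed value for a neighbor of the previous node
    return 1.0 / q              # move further away from the previous node

def random_walk(adjacency, start, walk_length, p, q, rng):
    """One walk containing at most walk_length nodes, starting at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = sorted(adjacency[walk[-1]])
        if not neighbors:
            break
        if len(walk) == 1:
            nxt = rng.choice(neighbors)  # no previous node yet: uniform (assumption)
        else:
            weights = [biased_weight(adjacency, walk[-2], x, p, q) for x in neighbors]
            nxt = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(nxt)
    return walk

def generate_walks(adjacency, rounds_per_node, walk_length, p, q, seed=0):
    """Total number of walks = number of nodes * rounds_per_node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(rounds_per_node):
        for node in sorted(adjacency):
            walks.append(random_walk(adjacency, node, walk_length, p, q, rng))
    return walks

adjacency = {
    "A": {"topic 2"}, "topic 2": {"A"},
    "B": {"C", "topic 1"}, "D": {"C", "topic 1"},
    "C": {"B", "D"}, "topic 1": {"B", "D"},
}
walks = generate_walks(adjacency, rounds_per_node=2, walk_length=5, p=1.0, q=0.5)
```

Here the graph has 6 nodes and 2 rounds per node, so 12 sequences of 5 nodes each are produced.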

Next, the terminal device 100 determines a knowledge-map vector of each entity to be processed, respectively, based on each random walk sequence.

For example, referring to FIG. 3d, which is a schematic diagram of knowledge graph vector generation, the extended knowledge graph includes nodes 1 to 6. The random walk sequences are obtained based on the random walk probabilities between the nodes in the extended knowledge graph, and the knowledge graph vectors, such as u_{i-1}, u_{i+1} and u_{i+2}, are obtained through a Skip-Gram model based on the random walk sequences, where u denotes a vector and i denotes a node.

Optionally, the knowledge graph vector of each entity to be processed can be obtained by the Skip-Gram algorithm. That is, by analogy with the word2vec algorithm, the nodes are treated as words and each random walk sequence is treated as a sentence, and the knowledge graph vectors of the nodes are obtained by the Skip-Gram algorithm.
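The analogy with word2vec can be made concrete by showing how a random walk sequence is turned into Skip-Gram training pairs. The sketch below only builds the (center node, context node) pairs; the window size and the function name `skipgram_pairs` are assumptions, and the training of the vectors themselves is left to whatever Skip-Gram implementation is used.

```python
def skipgram_pairs(walks, window):
    """Build (center, context) pairs from walks, as Skip-Gram does for sentences."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            start = max(0, i - window)
            stop = min(len(walk), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

# One made-up walk over entity and topic nodes.
walks = [["B", "topic 1", "D", "C"]]
pairs = skipgram_pairs(walks, window=1)
```

Because topic nodes appear in the walks alongside entity nodes, two entities that share a topic end up with overlapping contexts even when the original knowledge graph has no edge between them.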

Step 240: the terminal equipment determines the similarity between the entities to be processed based on the distance between their knowledge graph vectors, and applies the entities to be processed according to the similarity.

Applications based on knowledge graph vectors are further illustrated below through three specific application scenarios.

Application scenario one: expanding a user portrait. The specific application is as follows:

Referring to FIG. 3e, which is an example diagram of a user portrait, assume that the original portrait of the user includes the following labels: car accidents, Wang Xing Ren (dogs), traffic accidents, and cute pets. The terminal device obtains the labels of the original portrait and the stored knowledge graph vectors of the labels, and calculates, based on the knowledge graph vectors, the distance between each stored label and each label of the original portrait to obtain the corresponding similarities. Then, the terminal device performs the following step for each stored label: determining the weighted sum of the similarities between the stored label and the labels of the original portrait, using the corresponding weights, to obtain the weighted similarity between the stored label and the original portrait. Finally, the terminal device screens out the labels with higher similarity to the original portrait according to the weighted sums, and expands the original portrait with the screened labels.

In this way, the similarity between each stored label and the original portrait can be determined, and the original portrait can be expanded according to the similarity.
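The weighted-sum screening in scenario one can be sketched as follows. Several details are assumptions made only for illustration: the mapping from distance to similarity (1 / (1 + distance)), the per-label weights, the two-dimensional vectors, and the label names, none of which are given in the text.

```python
import math

def similarity(u, v):
    """Map Euclidean distance to a similarity in (0, 1]; an assumed mapping."""
    return 1.0 / (1.0 + math.dist(u, v))

def expand_portrait(portrait_weights, vectors, top_k):
    """Rank stored labels by weighted similarity to the portrait's labels."""
    scores = {}
    for label, vec in vectors.items():
        if label in portrait_weights:
            continue  # already part of the original portrait
        scores[label] = sum(
            weight * similarity(vec, vectors[original])
            for original, weight in portrait_weights.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Made-up knowledge graph vectors for stored labels.
vectors = {
    "car accident": [1.0, 0.0],
    "cute pet": [0.0, 1.0],
    "road safety": [0.9, 0.1],
    "cat": [0.1, 0.9],
    "stock market": [-1.0, -1.0],
}
# Hypothetical weights of the labels in the original portrait.
portrait_weights = {"car accident": 0.7, "cute pet": 0.3}

expanded = expand_portrait(portrait_weights, vectors, top_k=2)
```

With these sample values the label closest to the more heavily weighted portrait label ranks first, and the unrelated label is screened out.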

Application scenario two: recommending related topics. The specific application is as follows:

Referring to FIG. 3f, which is topic recommendation example diagram one, assume that the current news topic is the topic shown in FIG. 3f. The terminal device takes each news topic as an entity to be processed (i.e., a node), and takes the topic content as the text information of the news topic. The terminal device determines the knowledge graph vector of each news topic based on the current news topic and the stored topic content of each news topic, and calculates the distance between the knowledge graph vector of the current news topic and that of each other news topic to obtain the corresponding similarities. Finally, based on the obtained similarities, the news topics associated with the current news topic are screened out and recommended to the user as associated reading, as shown in FIG. 3g, which is associated reading example diagram one. As another example, FIG. 3h is topic recommendation example diagram two, and FIG. 3i is associated reading example diagram two. Using the knowledge graph vector determination method, the terminal device determines each news topic shown in FIG. 3i as an associated topic of the news topic shown in FIG. 3h, and recommends these topics to the user.

In this way, the similarity between the news topics can be determined, and thus the associated news topics can be recommended.

Application scenario three: entity identification. The specific application is as follows:

FIG. 3j is an example diagram of entity identification. The terminal device takes a person's name in the text as the entity to be processed and takes the text as the corresponding text information. Taking the name "Cao XX" as an example, the terminal device uses the knowledge graph vector determination method to obtain, based on "Cao XX" and the text, the knowledge graph vector of "Cao XX" and the knowledge graph vectors of each "Cao XX" stored in the database (different people may share the same name). Based on the distances between these knowledge graph vectors, the terminal device selects the stored target "Cao XX" that is most similar to the "Cao XX" in the text, and obtains the persons stored in the database in association with the target "Cao XX" (for example, a parent or a child); the persons related to each person are already stored in the database.

Therefore, the specific identity of the person in the text can be identified through the knowledge graph vector, and the stored associated persons can be accurately obtained.
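The selection of the target person in scenario three amounts to a nearest-neighbor lookup among stored candidates that share the same name. The sketch below assumes a hypothetical in-memory store in which each candidate has an identifier, a knowledge graph vector, and a list of associated persons; the identifiers, vectors, and the function name `resolve_person` are illustrative only.

```python
import math

def resolve_person(mention_vector, candidates):
    """Return the id of the stored candidate closest to the mention's vector."""
    return min(
        candidates,
        key=lambda cid: math.dist(mention_vector, candidates[cid]["vector"]),
    )

# Hypothetical stored persons who share the same name.
candidates = {
    "person_1": {"vector": [0.9, 0.1], "related": ["parent of person_1"]},
    "person_2": {"vector": [0.1, 0.8], "related": ["child of person_2"]},
}
# Made-up vector obtained for the name as it appears in the text.
mention_vector = [0.85, 0.2]

target = resolve_person(mention_vector, candidates)
related_persons = candidates[target]["related"]
```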

Based on the same inventive concept, the embodiment of the present application further provides a device for determining a knowledge graph vector. Because the principle by which the device solves the problem is similar to that of the method for determining a knowledge graph vector, the implementation of the device can refer to the implementation of the method, and repeated details are not described again.

FIG. 4 is a schematic structural diagram of an apparatus for determining a knowledge graph vector according to an embodiment of the present application. The apparatus includes:

the first determining unit 40 is configured to obtain text information of each entity to be processed, and determine, based on the text information of each entity to be processed, a topic distribution probability that each entity to be processed corresponds to each set topic;

a second determining unit 41, configured to determine, based on the determined distribution probability of each topic, an association relationship between each entity to be processed and each topic to obtain a topic knowledge graph;

an obtaining unit 42, configured to obtain a stored knowledge graph between the entities to be processed, and obtain an extended knowledge graph between each entity to be processed and each topic based on the topic knowledge graph and the knowledge graph;

and a third determining unit 43, configured to determine, based on the determined extended knowledge graph, a knowledge graph vector of each entity to be processed, respectively.

Preferably, when determining, based on the determined distribution probability of each topic, an association relationship between each entity to be processed and each topic to obtain a topic knowledge graph, the second determining unit 41 is specifically configured to:

for the topic distribution probability of each entity to be processed corresponding to each topic, respectively perform the following step: when the topic distribution probability is determined to be higher than a preset distribution probability threshold, associate the entity to be processed corresponding to the topic distribution probability with the topic.
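The threshold rule applied by the second determining unit can be sketched as follows. The probabilities and the threshold of 0.5 are made-up values chosen so that the result matches the topic knowledge graph of the FIG. 3a example, and the function name `build_topic_graph` is hypothetical.

```python
def build_topic_graph(topic_probs, threshold):
    """Associate an entity with a topic when its probability exceeds threshold."""
    edges = set()
    for entity, probs in topic_probs.items():
        for topic, prob in probs.items():
            if prob > threshold:
                edges.add((entity, topic))
    return edges

# Made-up topic distribution probabilities for three entities.
topic_probs = {
    "A": {"topic 1": 0.1, "topic 2": 0.9},
    "B": {"topic 1": 0.8, "topic 2": 0.2},
    "D": {"topic 1": 0.7, "topic 2": 0.3},
}
topic_edges = build_topic_graph(topic_probs, threshold=0.5)
```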

Preferably, before obtaining the extended knowledge-graph between each entity to be processed and each topic based on the topic knowledge-graph and the knowledge-graph, the obtaining unit 42 is further configured to:

Preferably, when determining the knowledge-map vector of each entity to be processed based on the determined extended knowledge-map, the third determining unit 43 is specifically configured to:

Preferably, after determining the knowledge-map vector of each entity to be processed based on the determined extended knowledge-map, the third determining unit 43 is further configured to:

and respectively determining the similarity between every two entities to be processed based on the distance between the knowledge graph vectors of the entities to be processed.

In the method, the device, the terminal equipment and the medium for determining the knowledge graph vector, the text information of each entity to be processed is obtained, the topic distribution probability of each entity to be processed corresponding to each set topic is determined based on the text information of each entity to be processed, the topic knowledge graph is obtained based on the topic distribution probability, and the knowledge graph vector of each entity to be processed is determined based on the extended knowledge graph obtained by combining the knowledge graph and the topic knowledge graph. Therefore, the topic knowledge graph is obtained based on the topic distribution probability determined by the text information, the knowledge graph is expanded based on the topic knowledge graph, the text information and graph structure information are effectively fused, and the expression significance of knowledge graph vectors is enriched. For convenience of description, the above parts are described separately as modules (or units) according to functions. Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware when implementing the present application.

Based on the same technical concept, the present application further provides a terminal device 500, and referring to fig. 5, the terminal device 500 is configured to implement the methods described in the above various method embodiments, for example, implement the embodiment shown in fig. 2b, and the terminal device 500 may include a memory 501, a processor 502, an input unit 503, and a display panel 504.

The memory 501 is used for storing computer programs executed by the processor 502. The memory 501 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the terminal device 500, and the like. The processor 502 may be a Central Processing Unit (CPU), a digital processing unit, or the like. The input unit 503 may be used to obtain a user instruction input by a user. The display panel 504 is configured to display information input by a user or information provided to the user, and in this embodiment, the display panel 504 is mainly configured to display interfaces of application programs in the terminal device and control entities displayed in the display interfaces. Alternatively, the display panel 504 may be configured in the form of a Liquid Crystal Display (LCD) or an organic light-emitting diode (OLED), and the like.

The embodiment of the present application does not limit the specific connection medium among the memory 501, the processor 502, the input unit 503, and the display panel 504. In the embodiment of the present application, the memory 501, the processor 502, the input unit 503, and the display panel 504 are connected by the bus 505 in fig. 5, the bus 505 is represented by a thick line in fig. 5, and the connection manner between other components is merely illustrative and not limited thereto. The bus 505 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.

The memory 501 may be a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 501 may also be a non-volatile memory (non-volatile memory) such as, but not limited to, a read-only memory (rom), a flash memory (flash memory), a Hard Disk Drive (HDD) or a solid-state drive (SSD), or any other medium which can be used to carry or store desired program code in the form of instructions or data structures and which can be accessed by a computer. The memory 501 may be a combination of the above memories.

The processor 502 is configured to implement the embodiment shown in FIG. 2b. Specifically, the processor 502 is configured to call the computer program stored in the memory 501 to execute the embodiment shown in FIG. 2b.

The embodiment of the present application further provides a computer-readable storage medium, which stores computer-executable instructions required to be executed by the processor, and includes a program required to be executed by the processor.

In some possible embodiments, the aspects of a method for determining a knowledge-graph vector provided by the present application may also be implemented in the form of a program product, which includes program code for causing a terminal device to perform the steps of a method for determining a knowledge-graph vector according to various exemplary embodiments of the present application described above in this specification when the program product runs on the terminal device. For example, the terminal device may perform the embodiment as shown in fig. 2 b.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A program product for determination of a knowledge-graph vector of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).

It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more of the units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided so as to be embodied by a plurality of units.

Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method for determining a knowledge-graph vector, comprising:

obtaining a topic knowledge graph based on the determined association relationship between each entity to be processed and each topic;

acquiring a stored knowledge graph between the entities to be processed, and acquiring an extended knowledge graph between the entities to be processed and the topics based on the topic knowledge graph and the knowledge graph;

and respectively determining the knowledge map vector of each entity to be processed based on the determined extended knowledge map.

2. The method of claim 1, prior to obtaining an extended knowledge-graph between each entity to be processed and each topic based on the topic knowledge-graph and the knowledge-graph, further comprising:

and updating the topic knowledge graph based on the entities to be processed between which associations have been established.

3. The method of claim 2, wherein after determining the knowledge-graph vector for each entity to be processed based on the determined extended knowledge-graph, respectively, further comprising:

4. The method of claim 1, wherein determining a knowledge-graph vector for each entity to be processed based on the determined extended knowledge-graph comprises:

for a node in the extended knowledge-graph, performing the steps of: respectively determining the random walk probability of the node jumping to each adjacent node in the extended knowledge graph based on a preset control return parameter and a preset depth parameter; the nodes comprise entities to be processed and topics, the control return parameter is used for determining the random walk probability when one node returns to the previous node, and the depth parameter is used for determining the random walk probability when one node jumps to a node which is not adjacent to the previous node;

5. An apparatus for knowledge-graph vector determination, comprising:

a second determining unit, configured to perform the following step for the topic distribution probability of each entity to be processed corresponding to each topic, respectively: when the topic distribution probability is determined to be higher than a preset distribution probability threshold, associating the entity to be processed corresponding to the topic distribution probability with the topic;

the acquiring unit is used for acquiring the stored knowledge graph between the entities to be processed and acquiring the extended knowledge graph between the entities to be processed and the topics based on the topic knowledge graph and the knowledge graph;

and the third determining unit is used for respectively determining the knowledge map vector of each entity to be processed based on the determined extended knowledge map.

6. The apparatus of claim 5, wherein prior to obtaining an extended knowledge-graph between each entity to be processed and each topic based on the topic knowledge-graph and the knowledge-graph, the obtaining unit is further to:

7. The apparatus of claim 6, wherein after determining the knowledge-graph vector for each entity to be processed based on the determined extended knowledge-graph, the third determining unit is further configured to:

8. The apparatus according to claim 5, wherein, when determining the knowledge-graph vector of each entity to be processed based on the determined extended knowledge-graph, the third determining unit is specifically configured to:

9. A terminal device comprising at least one processing unit and at least one memory unit, wherein the memory unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the steps of the method of any one of claims 1 to 4.

10. A computer-readable medium, characterized in that it stores a computer program executable by a terminal device, which program, when run on the terminal device, causes the terminal device to perform the steps of the method of any one of claims 1 to 4.