CN114461813A

CN114461813A - Data pushing method, system and storage medium based on knowledge graph

Info

Publication number: CN114461813A
Application number: CN202210049886.3A
Authority: CN
Inventors: 李美子; 卢淑怡; 许多; 张波
Original assignee: Shanghai Normal University
Current assignee: Shanghai Normal University
Priority date: 2022-01-17
Filing date: 2022-01-17
Publication date: 2022-05-10

Abstract

The invention relates to a data pushing method, system and storage medium based on a knowledge graph. The method comprises the following steps: obtaining a question searched by a user, wherein the question comprises at least two knowledge keywords; extracting the knowledge keywords from the question, and obtaining the corresponding minimum knowledge subtree in a knowledge graph according to the knowledge keywords; establishing candidate knowledge subtrees according to all edge information related to the knowledge keywords in the minimum knowledge subtree, scoring and ranking all the candidate knowledge subtrees according to a scoring rule, and screening the candidate knowledge subtrees; and establishing a personalized model according to the candidate knowledge subtrees obtained in each of the user's historical retrievals, and pushing the knowledge node with the highest personalized model function value. Compared with the prior art, the invention has advantages such as high pushing efficiency.

Description

Data pushing method, system and storage medium based on knowledge graph

Technical Field

The invention relates to the field of knowledge graphs, and in particular to a data pushing method, system and storage medium based on a knowledge graph.

Background

In the era of knowledge explosion, higher education has entered a stage marked by a large volume of knowledge, strong interdisciplinarity and rapid knowledge updating, which places higher demands on the cross-disciplinary integration of professional knowledge. To grow into versatile, interdisciplinary talents, college students need the ability to build a comprehensive knowledge system. Faced with educational knowledge resources that are growing explosively in the current big data era, college students find it difficult to accurately acquire cross-disciplinary knowledge, and this difficulty has also become a challenge for the cultivation of interdisciplinary talents in colleges and universities. However, the barriers between traditional disciplines and majors in colleges and universities still exist, and teaching practice is characterized by disciplines developing independently and by few comprehensive cross-disciplinary courses. Students lack a systematic understanding of the knowledge points of courses within or across their majors, and the knowledge points lack systematic connections; the teaching contents of different courses differ greatly and lack continuity of knowledge; and theoretical courses and practical skills teaching cannot be connected quickly. These factors make teaching incoherent, so that, viewed at a macroscopic level, the knowledge points learned by students remain scattered and isolated.

Because students' accumulated knowledge is limited, accuracy and professionalism are lacking, and implicit associations and long-path associations between knowledge points are difficult to discover. Existing knowledge pushing methods mainly extract features of the knowledge data, fuse the knowledge data based on these features, and push the fused result to the user. However, because all knowledge data must be updated and processed in real time, such methods cannot adjust dynamically in a timely manner; when a student searches, the amount of data to be processed is large, acquisition and pushing take a long time, and efficiency is low, so these methods cannot meet the need of a very large population of college students to acquire knowledge instantly.

Disclosure of Invention

The purpose of the present invention is to provide a data pushing method, system and storage medium based on a knowledge graph that overcome the above-mentioned shortcomings of the prior art.

The purpose of the invention can be realized by the following technical scheme:

A data pushing method based on a knowledge graph includes the following steps:

S1, obtaining a question searched by a user, wherein the question comprises at least two knowledge keywords, extracting the knowledge keywords from the question, and obtaining the corresponding minimum knowledge subtree in a knowledge graph according to the knowledge keywords;

S2, establishing candidate knowledge subtrees according to all node information and edge information related to the knowledge keywords in the minimum knowledge subtree, scoring and ranking all the candidate knowledge subtrees according to a scoring rule, and screening the candidate knowledge subtrees;

S3, establishing a personalized model according to the candidate knowledge subtree obtained in each of the user's historical retrievals, and pushing the knowledge node with the highest personalized model function value.

Further, the scoring rule specifically includes:

scoring the candidate knowledge subtrees according to the following expression:

where score(target) denotes the score, t denotes the root node of the candidate knowledge subtree, α denotes compactness, β denotes precision, num(e^k) and n each denote the number of nodes in the candidate knowledge subtree, num(r^k) denotes the number of edges in the candidate knowledge subtree, and the distance term denotes the shortest distance from each node to the root node.

Further, the method for establishing the personalized model comprises the following steps:

A1, obtaining the distance from each node to the other nodes in the candidate knowledge subtree, summing the distances, and determining a central knowledge node according to the centrality;

A2, obtaining the central knowledge node of the candidate knowledge subtree of each retrieval according to the user's historical retrieval records, and establishing a central knowledge point set;

A3, calculating the correlation between the central knowledge point of the central knowledge point set and the central knowledge point of the next retrieval;

A4, establishing a personalized model T(o_{i+1}) according to the correlation and the centrality of the central knowledge node, expressed as follows:

T(o_{i+1}) = α·sim(o_i, o_{i+1}) + β·core(o_{i+1})

where α denotes compactness, β denotes precision, sim(o_i, o_{i+1}) denotes the correlation, and core(o_{i+1}) denotes the centrality.

Further, the calculation expression of the centrality is as follows:

where e_x denotes the current node, N(e_x) denotes the nodes adjacent to e_x, l(e_j, e_k) = 0 indicates that there is no connecting edge between node e_j and node e_k, and l(e_j, e_k) = 1 indicates that there is a connecting edge between node e_j and node e_k.

Further, the correlation sim(o_k, o_{k+1}) is calculated by the following expression:

where dist denotes the average path length, o_k denotes the central knowledge point of the k-th retrieval in the central knowledge point set, p(e_i, e_{i+1}) denotes a relation weight, sp(o_i, o_{i+1}) denotes the shortest path length between o_i and o_{i+1}, num(r) denotes the number of times the relation r appears in the path, e denotes a knowledge point on the shortest path between two central knowledge points, and w denotes the support.

Further, after the central knowledge node is obtained, the central knowledge node is fed back to the user.

Further, when the user has no history retrieval record, randomly pushing knowledge nodes from the minimum knowledge subtree.

Further, after step S2 is executed, the knowledge information corresponding to the top-k candidate knowledge subtrees by score is transmitted to the user side.

A data pushing system based on a knowledge graph comprises a memory and a processor; the memory is used for storing a computer program; the processor is configured to implement the above knowledge-graph-based data pushing method when executing the computer program.

A computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the knowledge-graph-based data pushing method described above.

Compared with the prior art, the invention has the following advantages:

1. The invention establishes the minimum knowledge subtree based on the knowledge graph, establishes candidate knowledge subtrees according to the edge information of the subtree, that is, the relations between knowledge points, and establishes a personalized model by combining the user's historical retrieval records in order to push content to the user. Moreover, because the knowledge graph can be updated in real time, the user is ensured of obtaining up-to-date knowledge information.

2. In the invention, the personalized model combines the correlation and the centrality of the central knowledge nodes of historical retrievals, so that the model better reflects the user's real retrieval needs and ensures that the pushed content is reasonable.

Drawings

FIG. 1 is a schematic flow chart of the present invention.

FIG. 2 is a schematic diagram of the complete retrieval and recommendation process of the present invention.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.

This embodiment provides a data pushing method based on a knowledge graph, as shown in FIG. 1, which specifically includes the following steps:

Step S1, obtaining the question searched by the user, wherein the question comprises at least two knowledge keywords, extracting the knowledge keywords from the question, and obtaining the corresponding minimum knowledge subtree in the knowledge graph according to the knowledge keywords.

The knowledge graph is derived from teaching-related content such as expert systems, domain databases, learning networks, course selection systems and documents, and consists of nodes and edges. The nodes store information about knowledge points; for example, "matching analysis method" and "conditional probability" are each one node. The edges store the association information between different knowledge points; for example, in "a mixed factor is the evaluation of the matching analysis method", "evaluation" is the edge information. According to the knowledge keywords in the retrieval question, the part of the knowledge graph that contains the knowledge keywords is extracted as the minimum knowledge subtree.
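The node and edge structure described above can be sketched as a small set of labelled triples. The following Python sketch is only an illustration under stated assumptions: the first two triples follow the examples in the text, the third triple is invented so that the two keywords are connected, and using a breadth-first shortest path as the "minimum" connecting part is an assumption rather than the procedure defined by the invention. The names `TRIPLES`, `build_adjacency` and `connecting_path` are hypothetical.

```python
from collections import deque

# Hypothetical toy graph of (head, relation, tail) triples.
# The first two follow the examples in the text; the third is invented for illustration.
TRIPLES = [
    ("conditional probability", "predecessor", "matching analysis method"),
    ("mixed factor", "evaluation", "matching analysis method"),
    ("conditional probability", "predecessor", "naive Bayes classifier"),
]


def build_adjacency(triples):
    """Index triples as an undirected map: node -> {neighbor: relation}."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, {})[tail] = rel
        adj.setdefault(tail, {})[head] = rel
    return adj


def connecting_path(adj, start, goal):
    """Breadth-first search for a shortest node path between two keyword nodes."""
    if start not in adj or goal not in adj:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


adj = build_adjacency(TRIPLES)
path = connecting_path(adj, "matching analysis method", "naive Bayes classifier")
print(path)
```

In this toy graph the two keywords are joined through "conditional probability", so the extracted part contains three nodes and two "predecessor" edges.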

Step S2, establishing candidate knowledge subtrees according to all the node information and edge information related to the knowledge keywords in the minimum knowledge subtree, scoring and ranking all the candidate knowledge subtrees according to a scoring rule, and screening the candidate knowledge subtrees.

The process of establishing a candidate knowledge subtree is a process of associating knowledge nodes with edges. For example, when the knowledge point "matching analysis method" is retrieved, a candidate knowledge subtree that may be obtained is expressed as complete knowledge point information comprising node information and edge information, such as "conditional probability is a predecessor knowledge point of the matching analysis method".

After all candidate knowledge subtrees are obtained, the candidate knowledge subtrees are scored according to the following expression:

where the distance term of the expression represents the shortest distance from each node to the root node.

Finally, the candidate knowledge subtree with the highest score is retained.
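As a rough illustration of the quantities named for the scoring rule (number of nodes, number of edges, and the shortest distance from each node to the root) and of the final screening, the Python sketch below may help. It does not implement the scoring expression itself: how these quantities are combined with α and β is not assumed here, so the scores are passed in as plain numbers. The function names `subtree_features` and `top_k`, the toy tree and the score values are all hypothetical.

```python
from collections import deque


def subtree_features(adj, root):
    """Return (node count, edge count, {node: shortest hop distance to root})."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    # Each undirected edge is stored twice in the adjacency map.
    num_edges = sum(len(neighbors) for neighbors in adj.values()) // 2
    return len(adj), num_edges, dist


def top_k(candidates, score_fn, k=1):
    """Sort candidates by descending score and keep the first k."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]


# Hypothetical candidate subtree rooted at "t" (node -> set of neighbors).
tree = {"t": {"a", "b"}, "a": {"t", "c"}, "b": {"t"}, "c": {"a"}}
n_nodes, n_edges, depths = subtree_features(tree, "t")

# Hypothetical, already computed scores for three candidate subtrees.
scores = {"a": 0.2, "b": 0.5, "c": 0.9}
best = top_k(scores, scores.get, k=1)
best_two = top_k(scores, scores.get, k=2)
```

With `k=1` only the highest-scoring candidate is retained, as in this step; a larger `k` keeps several candidates for transmission to the user side.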

Step S3, establishing a personalized model according to the candidate knowledge subtree obtained in each of the user's historical retrievals, and pushing the knowledge node with the highest personalized model function value.

The pushing is performed on the basis of the user's historical retrieval records: following a big-data approach, the content that best matches the user's interests is pushed to the user. If the user has never performed a retrieval, that is, there is no historical retrieval content, a knowledge node is randomly selected from the minimum knowledge subtree obtained in step S1 and used as the pushed content.
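The fallback for users without retrieval history can be sketched in a few lines of Python. This is a minimal sketch under assumptions: a uniform random choice is assumed because the text only says "randomly", and the names `push_without_history`, `choose_push` and `personalized_push` are hypothetical.

```python
import random


def push_without_history(min_subtree_nodes, rng=random):
    """Pick one knowledge node uniformly at random from the minimum knowledge subtree."""
    nodes = sorted(min_subtree_nodes)  # fixed order so a seeded RNG is reproducible
    if not nodes:
        return None
    return rng.choice(nodes)


def choose_push(history, min_subtree_nodes, personalized_push):
    """Use the personalized model when history exists, otherwise fall back to random."""
    if not history:
        return push_without_history(min_subtree_nodes)
    return personalized_push(history)


nodes = {"matching analysis method", "conditional probability", "mixed factor"}
cold_start_push = choose_push([], nodes, personalized_push=lambda h: None)
```

Passing a seeded `random.Random` instance as `rng` makes the fallback reproducible in tests.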

Each historical retrieval record corresponds to one screened candidate knowledge subtree, and the personalized model is established from these candidate knowledge subtrees as follows:

Step A1, obtaining the distances from each node to the other nodes in the candidate knowledge subtree, summing the distances, determining the central knowledge node according to the centrality, and feeding it back to the user.

The centrality is calculated by the following expression:

Step A2, obtaining the central knowledge node of the candidate knowledge subtree of each retrieval according to the user's historical retrieval records, and establishing a central knowledge point set O = {o_1, o_2, ..., o_k}.

Step A3, calculating the correlation between the central knowledge point of the central knowledge point set and the central knowledge point of the next retrieval.

The correlation sim(o_k, o_{k+1}) is calculated by the following expression:

where o_k denotes the central knowledge point of the k-th retrieval in the central knowledge point set, p(e_i, e_{i+1}) denotes a relation weight, sp(o_i, o_{i+1}) denotes the shortest path length between o_i and o_{i+1}, num(r) denotes the number of times the relation r appears in the path, e denotes a knowledge point on the shortest path between the two central knowledge points, w denotes the support, and dist denotes the average path length, which uses the length sp.len(o_i, o_{i+1}) of the shortest path between o_i and o_{i+1}. The edge information scores used to determine the shortest path are defined manually; for example, the scores of "predecessor" and "evaluation" are both 0.5 and those of all other relations are 0. The average path length is specifically expressed as:

Step A4, establishing a personalized model T(o_{i+1}) according to the correlation and the centrality of the central knowledge node, expressed as follows:

T(o_{i+1}) = α·sim(o_i, o_{i+1}) + β·core(o_{i+1})

This expression shows that the personalized model reflects the correlation and the centrality of the central knowledge points of the historical retrieval records, including the current retrieval; the central knowledge point with the highest personalized model function value is selected and pushed to the user.
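Steps A1 to A4 can be sketched in Python as follows. This is a hedged illustration, not the method's definitive implementation: the central node is taken to be the node with the smallest sum of hop distances to all other nodes, which is only one reading of step A1, and the correlation and centrality values are supplied by the caller instead of being computed from the expressions of the invention. The names `hop_distances`, `central_node` and `best_push`, the default weights and all numeric values are hypothetical.

```python
from collections import deque


def hop_distances(adj, source):
    """Shortest hop distance from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def central_node(adj):
    """Node with the smallest distance sum; assumes a connected subtree, ties broken by name."""
    return min(sorted(adj), key=lambda n: sum(hop_distances(adj, n).values()))


def best_push(prev_center, candidates, sim, core, alpha=0.5, beta=0.5):
    """Return the candidate o maximizing T(o) = alpha * sim(prev_center, o) + beta * core(o)."""
    return max(candidates, key=lambda o: alpha * sim(prev_center, o) + beta * core(o))


# Hypothetical chain a - b - c: "b" has the smallest distance sum.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
center = central_node(chain)

# Hypothetical correlation and centrality values for two candidate knowledge points.
sim_values = {("b", "x"): 0.2, ("b", "y"): 0.8}
core_values = {"x": 0.9, "y": 0.4}
pushed = best_push(center, ["x", "y"], lambda p, o: sim_values[(p, o)], core_values.get)
```

Here candidate "y" is pushed because 0.5·0.8 + 0.5·0.4 exceeds 0.5·0.2 + 0.5·0.9; changing α and β shifts the balance between correlation with the previous central knowledge point and centrality.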

The usage flow on the user side is shown in FIG. 2 and specifically includes the following steps:

First, the user asks a question, for example, what relationship exists between the "matching analysis method" and the "naive Bayes classifier".

Then, the knowledge keywords "matching analysis method" and "naive Bayes classifier" are extracted and located in the knowledge graph, and candidate knowledge subtrees are established according to the edge information; in this example, the knowledge keywords can be located in three knowledge trees.

The scores of the candidate knowledge subtrees are calculated according to the method in step S2 above; the scores of the three candidate knowledge subtrees are ranked as a < b < c.

Assuming that the top 2 candidate knowledge subtrees by score are selected as subsequent push content, the user will receive the knowledge information and knowledge node information corresponding to these two candidate knowledge subtrees at the same time.

The candidate knowledge subtree b is selected as the content for establishing the personalized model: a central knowledge point is determined according to the centrality, the personalized model is established in combination with the correlation, and finally the knowledge point with the highest function value of the personalized model is selected and pushed, namely the "Bernoulli model".

This embodiment also provides a data pushing system based on a knowledge graph, which comprises a memory and a processor; the memory is used for storing a computer program, and the processor executes the knowledge-graph-based data pushing method described above.

This embodiment further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the knowledge-graph-based data pushing method described in the embodiments of the present invention. Any combination of one or more computer-readable media may be adopted. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

1. A data pushing method based on knowledge graph is characterized by comprising the following steps:

2. The data pushing method based on the knowledge-graph as claimed in claim 1, wherein the scoring rules specifically include:

scoring the candidate knowledge subtrees according to the following expression:

where the distance term of the expression represents the shortest distance from each node to the root node.

3. The data pushing method based on the knowledge graph according to claim 1, wherein the personalized model is established by the following method:

T(o_{i+1}) = α·sim(o_i, o_{i+1}) + β·core(o_{i+1})

4. The knowledge-graph-based data pushing method according to claim 3, wherein the centrality is calculated by the following expression:

5. The knowledge-graph-based data pushing method according to claim 3, wherein the correlation sim(o_k, o_{k+1}) is calculated by the following expression:

6. The data pushing method based on the knowledge-graph of claim 3, wherein after the central knowledge node is obtained, the central knowledge node is fed back to the user.

7. The knowledge-graph-based data pushing method according to claim 1, wherein when the user has no history retrieval record, the knowledge nodes are randomly pushed from the smallest knowledge sub-tree.

8. The knowledge-graph-based data pushing method according to claim 1, wherein after step S2 is executed, the knowledge information corresponding to the top-k candidate knowledge subtrees by score is transmitted to the user side.

9. A data push system based on knowledge graph is characterized by comprising a memory and a processor; the memory for storing a computer program; the processor, when executing the computer program, is configured to implement the following method:

S1, obtaining the question searched by the user, extracting the knowledge keywords from the question, and obtaining the corresponding minimum knowledge subtree in the knowledge graph according to the knowledge keywords;

S2, establishing candidate knowledge subtrees according to all the edge information related to the knowledge keywords in the minimum knowledge subtree, scoring and ranking all the candidate knowledge subtrees according to a scoring rule, and screening the candidate knowledge subtrees;

10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of a method for knowledge-graph based data push according to any one of claims 1 to 8.