CN114861052A - News importance calculation method based on industrial knowledge graph - Google Patents

Publication number
CN114861052A
Authority
CN
China
Prior art keywords
news
node
entity
news entity
entity node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210468961.XA
Other languages
Chinese (zh)
Inventor
张伟文
陈星宇
叶海明
程良伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202210468961.XA
Publication of CN114861052A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a news importance calculation method based on an industrial knowledge graph. It relates to the technical field of industrial news importance calculation and addresses the problem that current news importance acquisition methods cannot calculate news importance when user data is lacking. The method specifies the industrial field whose news importance is to be calculated, determines the news entities and the news entity relations, and constructs an industrial knowledge graph from them. Four indexes are introduced to evaluate the industrial knowledge graph: news entity node betweenness, news entity node degree, news entity node closeness, and the PageRank index. An entropy weight method is applied to the obtained index values to obtain the weight of each index. The weighted indexes are combined into a news importance score, and industrial news with high importance scores is recommended to users, which improves the quality and accuracy of news recommendation.

Description

News importance calculation method based on industrial knowledge graph
Technical Field
The invention relates to the technical field of industrial news importance calculation, in particular to a news importance calculation method based on an industrial knowledge graph.
Background
Knowledge graph technology is an important component of artificial intelligence. By establishing a knowledge base with semantic processing and open interconnection capabilities, it can create application value in news recommendation. With the rapid development of information technology, the amount of information on the network has grown explosively. Users can look for the content they need through a search engine, which searches the network for web pages similar to the user's input and returns the pages that reach a certain similarity with that input. The search engine can also order the returned list by combining factors such as the click rate, repost rate, and browsing time of all online users for each page. Recommendation systems arose from this.
News recommendation usually models a user by analyzing the user's reading interests and preferences, helping the user efficiently obtain the news the user needs. Current online news content is complex, and many users cannot obtain news that is authentic, important, and timely. To address this, the prior art discloses a method for acquiring the importance of news: M news items about the same event are acquired; the similarity corresponding to each of N news groups formed from the M news items is obtained; the initial importance and the source authority score of each news item are obtained from these similarities; and the two are combined to evaluate the final importance of each news item. This method can obtain the importance of each item from a large number of news items. When industrial news data is sparse, however, the method cannot obtain the importance of each news item from the sparse data. Collected user data therefore cannot be used to calculate news importance; that is, news importance cannot be calculated when user data is lacking.
Disclosure of Invention
To solve the problem that current news importance acquisition methods cannot calculate news importance when user data is lacking, the invention provides a news importance calculation method based on an industrial knowledge graph, so that news importance can be calculated when user information is scarce and the quality and accuracy of news recommendation are improved.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
A news importance calculation method based on an industrial knowledge graph comprises the following steps:
S1, specifying the industrial field whose news importance is to be calculated, determining the news entities and the news entity relations, and constructing an industrial knowledge graph of the industrial field based on the news entities and the news entity relations;
S2, based on the constructed industrial knowledge graph, introducing the news entity node betweenness, news entity node degree, news entity node closeness, and PageRank indexes to evaluate the industrial knowledge graph;
S3, applying an entropy weight method to the obtained news entity node betweenness, news entity node degree, news entity node closeness, and PageRank index values to obtain the weight of each index;
S4, combining the indexes according to their weights to obtain a news importance score, and recommending the industrial news with high importance scores to the user.
In this technical scheme, because news covers many industries, the industrial field must first be specified, that is, the industrial field whose news importance is to be calculated. The news entities and news entity relations are then determined, and an industrial knowledge graph of that field is constructed from them. The constructed graph is evaluated with four indexes. The news entity node betweenness measures the number of shortest paths in the industrial knowledge graph that pass through a node. The news entity node degree reflects the news entity relations. The news entity node closeness measures the ability of a news node to exert influence on other news nodes through the industrial knowledge graph. The PageRank algorithm can accurately locate the importance of a news node in the industrial knowledge graph according to the degree of match with the user query. Each of the four can therefore serve as one reference index of importance in the industrial knowledge graph. The entropy weight method is applied to all indexes to obtain news importance scores, and industrial news with high importance scores is recommended to users, which yields a method for obtaining news importance and improves the quality and accuracy of news recommendation.
Preferably, in step S1, a news entity is an entity with practical meaning in the news content of the industrial field, a news entity relation is a relation associated with news entities, and the industrial knowledge graph is constructed as follows:
S11, acquiring news entity data and news entity relation data of the industrial field whose news importance is to be calculated from open news link data;
S12, performing knowledge fusion on the news entity data and news entity relation data acquired in step S11 to achieve entity alignment and entity disambiguation;
S13, storing the news entity data and news entity relation data fused in step S12 in a relational database to form a knowledge base;
S14, converting the news entity data and news entity relation data in the knowledge base of step S13 into link data in the knowledge graph, thereby constructing the industrial knowledge graph.
News importance is calculated from the data in the industrial knowledge graph. News entity data and news entity relation data are obtained from open news link data, and knowledge fusion is performed on them. The same news entity may appear under different names in different industrial data sources. Because the same news entity has the same features, a feature matching method is used to align the different names of one news entity, unify them into a unique name, and assign an identifier. Where one news entity name in the industrial field points to several news entities, entity disambiguation is performed on that name to eliminate the ambiguity. The fused news entity data and news entity relation data are stored in a relational database to form a knowledge base. Finally, the data in the knowledge base is converted into link data in the knowledge graph, thereby constructing the industrial knowledge graph.
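The feature-matching alignment described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which features or which similarity measure are used, so the Jaccard similarity, the `threshold` value, the function names, and the sample records are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature collections (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def align(records, threshold=0.6):
    """Greedy entity alignment.

    records is a list of (name, features). Each name is mapped to the first
    already-seen name whose features are similar enough, else to itself.
    """
    canonical = []   # (unified name, feature set) pairs seen so far
    mapping = {}
    for name, feats in records:
        for cname, cfeats in canonical:
            if jaccard(feats, cfeats) >= threshold:
                mapping[name] = cname          # same entity, unify the name
                break
        else:
            canonical.append((name, set(feats)))
            mapping[name] = name               # first occurrence of this entity
    return mapping

# Invented sample data: two names for one company, plus an unrelated company.
records = [
    ("Marine Ship Company A", {"shipbuilding", "city X", "founded 1990"}),
    ("Ship Co. A", {"shipbuilding", "city X", "founded 1990", "listed"}),
    ("Ocean Power Ltd", {"offshore wind", "city Y"}),
]
mapping = align(records)
```

Here `mapping` sends both names of company A to the single unified name, while the unrelated company keeps its own name. A production system would likely need a more careful matching strategy than this greedy pass.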
Preferably, in step S2, a node in the industrial knowledge graph represents a news entity, an edge connected to a news entity node represents a news entity relation, and a node-edge-node sequence forms a path. If many shortest paths between other nodes pass through a news entity node i, that node is important in the industrial knowledge graph. This importance, or influence, is expressed by the betweenness $B_i$ of the news entity node:
$$B_i = \sum_{j \neq l \neq i} \frac{N_{jl}(i)}{N_{jl}},$$
where i, j, l denote non-adjacent news entity nodes, $B_i$ denotes the betweenness of news entity node i, $N_{jl}$ denotes the number of shortest paths between news entity node j and news entity node l, and $N_{jl}(i)$ denotes the number of shortest paths between news entity node j and news entity node l that pass through news entity node i.
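As a rough sketch of how the betweenness sum can be evaluated, the following uses Brandes' algorithm, which the source does not name; it is swapped in here as one standard way to accumulate $N_{jl}(i)/N_{jl}$ over all pairs. The sketch assumes an unweighted graph without parallel edges, given as a hypothetical adjacency dictionary in which every node appears as a key.

```python
from collections import deque

def betweenness(adj):
    """Betweenness of every node; adj maps node -> list of successor nodes.

    Sums N_jl(i) / N_jl over ordered pairs (j, l), so an undirected graph
    stored with both edge directions counts each unordered pair twice.
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                       # nodes in non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from source
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                     # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of source on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

# Undirected path a - b - c, stored with edges in both directions.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = betweenness(path)
```

On this three-node path only `b` lies between two other nodes, so it is the only node with a non-zero score.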
Preferably, in step S2, the degree of a news entity node comprises the out-degree and the in-degree of the node:
(1) the out-degree of a news entity node is the number of edges leaving news entity node i, and its expression is:
Figure BDA0003625749080000031
where
Figure BDA0003625749080000032
denotes the out-degree of the news entity node, $N_i$ denotes the neighbor set of news entity node i, and $a_{ij}$ denotes the number of edges directly connecting news entity node i to news entity node j;
(2) the in-degree of a news entity node is the number of edges pointing to news entity node i, and its expression is:
Figure BDA0003625749080000041
where
Figure BDA0003625749080000042
denotes the in-degree of the news entity node, and $a_{ji}$ denotes the number of edges directly connecting news entity node j to news entity node i;
(3) the total degree of a news entity node is defined as
Figure BDA0003625749080000043
where $k_i$ denotes the total degree of news entity node i; the larger the degree of a news entity node, the more important the news entity at that node.
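The degree definitions can be illustrated with a small adjacency matrix. This is a sketch under assumptions: the formula images are not available here, so the code takes the out-degree as the row sum of $a_{ij}$, the in-degree as the column sum of $a_{ji}$, and the total degree as their sum, which is the usual reading of the surrounding text. The function name and the sample matrix are hypothetical.

```python
def degrees(a):
    """Return (out_degree, in_degree, total_degree) lists for matrix a,
    where a[i][j] is the number of edges from node i to node j."""
    n = len(a)
    out_deg = [sum(a[i][j] for j in range(n)) for i in range(n)]
    in_deg = [sum(a[j][i] for j in range(n)) for i in range(n)]
    # Assumed: total degree k_i is out-degree plus in-degree.
    total = [o + i for o, i in zip(out_deg, in_deg)]
    return out_deg, in_deg, total

# Edges: 0 -> 1, 0 -> 2, 1 -> 2
a = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
out_deg, in_deg, total = degrees(a)
```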
Preferably, in step S2, the expression of the closeness of the news entity node is:
Figure BDA0003625749080000044
where $C_c(i)$ denotes the closeness of news entity node i, $d_{ij}$ denotes the shortest distance from news entity node i to news entity node j, and N denotes the number of nodes in the network.
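A minimal closeness sketch follows. The formula image is not available here, so the normalisation $(N-1)/\sum_j d_{ij}$ used below is an assumption (a common convention), not necessarily the expression in the patent. Distances are hop counts found by breadth-first search, and the graph is assumed to be connected; the function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(adj, i):
    """Assumed form: (N - 1) divided by the sum of shortest distances from i."""
    n = len(adj)
    dist = bfs_distances(adj, i)
    if len(dist) < n:
        raise ValueError("some nodes are unreachable from " + repr(i))
    total = sum(dist.values())
    return (n - 1) / total if total else 0.0

# Undirected path a - b - c, stored with edges in both directions.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
```

With this convention the middle node `b` has a higher closeness than either end node, since it is one hop from everything else.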
Preferably, the news entity node closeness can also be calculated by adopting a kernel function, and the kernel function formula is represented as:
Figure BDA0003625749080000045
where U(i) denotes the closeness of news entity node i, $d_{ij}$ denotes the shortest distance from news entity node i to news entity node j, p denotes the non-shortest routes from news entity node $v_i$ to the remaining nodes, L(p) denotes the lengths of these non-shortest routes, and h denotes the width of the kernel function.
Preferably, in step S2, the procedure and rules of the PageRank algorithm are as follows:
S21, setting the initial PageRank values $PR_i(0)$, $i = 1, 2, \ldots, N$, of all news entity nodes, which satisfy:
Figure BDA0003625749080000046
S22, performing a k-step random walk on the industrial knowledge graph, in which each step corresponds to one iteration and the PR value held by each news entity node at step k-1 is distributed to the news entity nodes it points to. A scale constant $s \in (0, 1)$ is set, and the PR value of each news entity node is calculated with the PageRank correction rule: the PR value obtained for each node is multiplied by the scale constant s to reduce it, and the value $1 - s$ is then distributed over the reduced PR values so that the total PR value of the knowledge graph remains 1. The PR value of a news entity node is therefore calculated as:
Figure BDA0003625749080000047
where $PR_i(k)$ denotes the PR value of news entity node i at step k, and $a_{ij}$ describes the direct connection between news entity node i and news entity node j: if news entity node i links to news entity node j, then $a_{ij} = 1$; otherwise $a_{ij} = 0$.
Preferably, once the random walk reaches a news entity node whose out-degree is 0, it stays at that node forever and cannot leave; such a node is called a dangling node. Whenever the random walk sets out from a node, regardless of whether that node is a dangling node, it is allowed to select any news entity node in the industrial knowledge graph at random with probability $1 - s$, and the selected node becomes the next target node.
The presence of dangling nodes can make the PageRank algorithm fail. Likewise, if the industrial knowledge graph contains sub-graphs with no outgoing edges, the nodes in those sub-graphs may absorb all PR values in the network, which also makes the algorithm fail. Allowing the walk, from whatever node it starts and whether or not that node is a dangling node, to jump to a randomly selected news entity node with probability $1 - s$ lets it leave dangling nodes.
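The iteration in S21 and S22, together with the dangling-node handling, can be sketched as below. Several details are assumptions rather than statements from the source: the value `s=0.85` is only a commonly used default, the share $1 - s$ is spread evenly over all nodes, and a dangling node is treated as if it pointed to every node. The function name and sample graph are hypothetical.

```python
def pagerank(adj, s=0.85, iterations=100):
    """Power iteration; adj maps node -> list of successor nodes."""
    nodes = list(adj)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}               # S21: initial values sum to 1
    for _ in range(iterations):                    # S22: one step per iteration
        nxt = {v: (1.0 - s) / n for v in nodes}    # evenly shared 1 - s
        for v in nodes:
            # A dangling node has no successors: let it jump to any node.
            targets = adj[v] if adj[v] else nodes
            share = s * pr[v] / len(targets)       # PR value reduced by s
            for w in targets:
                nxt[w] += share
        pr = nxt
    return pr

# Chain a -> b -> c, where c is a dangling node.
chain = {"a": ["b"], "b": ["c"], "c": []}
pr = pagerank(chain)
```

Because every step redistributes exactly the mass it removes, the PR values keep summing to 1, and the dangling node `c` does not trap the walk.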
Preferably, in step S3, the entropy weight calculation comprises:
S31, taking all news entity nodes extracted from the industrial knowledge graph as samples to form a sample library. The library holds n samples and 4 indexes, namely the news entity node betweenness, news entity node degree, news entity node closeness, and PageRank index. The value of the j-th index of the i-th sample is denoted $x_{ij}$, where $i = 1, \ldots, n$ and $j = 1, \ldots, 4$;
S32, normalizing the indexes of step S31. A positive or negative index is an index that is positively or negatively correlated with the news entity relation, and different formulas are used to standardize positive and negative indexes:
positive index:
Figure BDA0003625749080000051
negative index:
Figure BDA0003625749080000052
the normalized data $x'_{ij}$ is still written as $x_{ij}$;
S33, calculating the proportion $p_{ij}$ of the i-th sample value under the j-th index:
Figure BDA0003625749080000053
S34, calculating the entropy $e_j$ of the j-th index:
Figure BDA0003625749080000054
where $k = 1/\ln(n) > 0$, so that $e_j \geq 0$;
S35, calculating the information entropy redundancy $d_j$:
$$d_j = 1 - e_j, \quad j = 1, \ldots, 4;$$
S36, calculating the weight $w_j$ of each index:
Figure BDA0003625749080000061
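Steps S31 to S36 can be sketched as follows. The formula images are not available here, so the sketch assumes the textbook entropy weight method: min-max normalization, $p_{ij} = x_{ij} / \sum_i x_{ij}$, $e_j = -k \sum_i p_{ij} \ln p_{ij}$, and $w_j = d_j / \sum_j d_j$. The function name, the `positive` flag list, and the handling of constant indexes are hypothetical choices.

```python
import math

def entropy_weights(x, positive=None):
    """Return one weight per index (column) of the n x m matrix x (n >= 2)."""
    n, m = len(x), len(x[0])
    positive = positive if positive is not None else [True] * m
    k = 1.0 / math.log(n)                          # S34: k = 1 / ln(n)
    redundancy = []
    for j in range(m):
        col = [row[j] for row in x]
        lo, hi = min(col), max(col)
        span = hi - lo
        # S32: min-max normalization (assumed form), flipped for negative indexes.
        if span == 0:
            norm = [1.0] * n                       # constant index: samples identical
        elif positive[j]:
            norm = [(v - lo) / span for v in col]
        else:
            norm = [(hi - v) / span for v in col]
        total = sum(norm)
        p = [v / total for v in norm]              # S33: proportions p_ij
        e = -k * sum(q * math.log(q) for q in p if q > 0)   # S34: entropy e_j
        redundancy.append(max(0.0, 1.0 - e))       # S35: d_j = 1 - e_j
    d_sum = sum(redundancy)
    if d_sum == 0:
        return [1.0 / m] * m                       # every index is constant
    return [d / d_sum for d in redundancy]         # S36: w_j = d_j / sum of d

# Three samples, two indexes; the second index is identical for all samples.
x = [[1.0, 5.0],
     [2.0, 5.0],
     [3.0, 5.0]]
weights = entropy_weights(x)
```

In this small example the second index carries no information, so essentially all of the weight goes to the first index.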
Preferably, in step S4, the news importance score is obtained as follows:
S41, using the weights of the news entity node betweenness, news entity node degree, news entity node closeness, and PageRank indexes obtained by the entropy weight method, calculating the composite score of every sample; the composite score $s_i$ of sample i is expressed as:
Figure BDA0003625749080000062
S42, averaging the composite scores to obtain the news importance scores;
S43, sorting the news importance scores in descending order and recommending the news with high importance scores to the user.
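Steps S41 to S43 can be sketched as below, under two assumptions that the source does not spell out: the composite score of a sample is taken as the weighted sum $s_i = \sum_j w_j x_{ij}$, and the averaging in S42 is read as averaging the composite scores of the entities that appear in a news item. The function names, news identifiers, and scores are all invented for illustration.

```python
def composite_scores(x, weights):
    """S41 (assumed form): weighted sum of the normalized index values."""
    return [sum(w * v for w, v in zip(weights, row)) for row in x]

def rank_news(news_entities, entity_score):
    """S42 and S43: average entity scores per news item, then sort descending.

    news_entities maps a news id to the list of entity names it mentions.
    """
    importance = {
        news: sum(entity_score[e] for e in ents) / len(ents)
        for news, ents in news_entities.items() if ents
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Invented scores for three entities and three news items.
entity_score = {"A": 0.9, "B": 0.5, "C": 0.1}
news_entities = {"n1": ["A", "B"], "n2": ["C"], "n3": ["B", "C"]}
ranked = rank_news(news_entities, entity_score)
top = [news for news, _ in ranked]
```

Taking the first few entries of `ranked` then gives the items to recommend.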
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
The invention provides a news importance calculation method based on an industrial knowledge graph. Because news covers many industries, the industrial field must first be specified, that is, the industrial field whose news importance is to be calculated. The news entities and news entity relations are then determined, and an industrial knowledge graph of that field is constructed from them. The constructed graph is evaluated with the news entity node betweenness, news entity node degree, news entity node closeness, and PageRank indexes. The news entity node betweenness measures the number of shortest paths in the industrial knowledge graph that pass through a node; the news entity node degree reflects the news entity relations; the news entity node closeness measures the ability of a news node to influence other news nodes through the industrial knowledge graph; and the PageRank algorithm can accurately locate the importance of a news node in the industrial knowledge graph according to the degree of match with the user query. The entropy weight method is applied to all indexes to obtain news importance scores, and industrial news with high importance scores is recommended to users, which meets the need for industrial news pushing and improves the quality and accuracy of news recommendation.
Drawings
Fig. 1 is a flowchart of the steps of the news importance calculation method based on an industrial knowledge graph proposed in Embodiment 1 of the present invention;
Fig. 2 is a diagram showing the ranking of news importance scores proposed in Embodiment 1 of the present invention;
Fig. 3 is a flowchart for building an industrial knowledge graph as set forth in Embodiment 2 of the present invention;
Fig. 4 is a diagram showing the triple format provided in Embodiment 2 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for better illustration of the embodiments, some parts in the drawings may be omitted, enlarged, or reduced, and do not represent actual dimensions; descriptions of the directions of parts, such as "up" and "down", do not limit the patent;
it will be understood by those skilled in the art that certain well-known descriptions may be omitted from the figures;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Example 1
As shown in Fig. 1, a news importance calculation method based on an industrial knowledge graph includes the following steps:
S1, because news covers many industries, the industrial field must be specified, that is, the industrial field whose news importance is to be calculated. In this embodiment the marine industry is selected. The marine industry includes the marine power industry, the marine biomedical industry, the marine transportation industry, the marine ship industry, and so on; China is rich in marine resources, and the marine industry has development potential and resilience. The news entities and news entity relations of the marine industry are then determined. A news entity is an entity with practical meaning in marine industry news content, specifically a person, an event, or an object, and it can also be a concept. A news entity relation is a relation associated with news entities. For example, in the sample sentence "marine ship company A must pay marine ship company B 500 thousand in a single payment", the news entities are marine ship company A and marine ship company B, and the news entity relation is the payment. A knowledge graph of the marine industry is constructed from the news entities and news entity relations, which together form the nodes and edges of the graph.
S2, based on the constructed industrial knowledge graph, the news entity node betweenness, news entity node degree, news entity node closeness, and PageRank indexes are introduced to evaluate the industrial knowledge graph. The news entity node betweenness measures the number of shortest paths in the industrial knowledge graph that pass through a node; the news entity node degree reflects the news entity relations; the news entity node closeness measures the ability of a news node to influence other news nodes through the industrial knowledge graph; and the PageRank algorithm can accurately locate the importance of a news node in the industrial knowledge graph according to the degree of match with the user query. Each of the four can therefore serve as one reference index of importance in the marine industry knowledge graph.
In step S2, a node in the marine industry knowledge graph represents a news entity, an edge connected to a news entity node represents a news entity relation, and a node-edge-node sequence forms a path. If many shortest paths between other nodes pass through a news entity node i, that node is important in the marine industry knowledge graph. This importance, or influence, is expressed by the betweenness $B_i$ of the news entity node:
$$B_i = \sum_{j \neq l \neq i} \frac{N_{jl}(i)}{N_{jl}},$$
where i, j, l denote non-adjacent news entity nodes, $B_i$ denotes the betweenness of news entity node i, $N_{jl}$ denotes the number of shortest paths between news entity node j and news entity node l, and $N_{jl}(i)$ denotes the number of shortest paths between news entity node j and news entity node l that pass through news entity node i.
The degree of a news entity node comprises the out-degree and the in-degree of the node:
(1) the out-degree of a news entity node is the number of edges leaving news entity node i, and its expression is:
Figure BDA0003625749080000081
where
Figure BDA0003625749080000082
denotes the out-degree of the news entity node, $N_i$ denotes the neighbor set of news entity node i, and $a_{ij}$ denotes the number of edges directly connecting news entity node i to news entity node j;
(2) the in-degree of a news entity node is the number of edges pointing to news entity node i, and its expression is:
Figure BDA0003625749080000083
where
Figure BDA0003625749080000084
denotes the in-degree of the news entity node, and $a_{ji}$ denotes the number of edges directly connecting news entity node j to news entity node i;
(3) the total degree of a news entity node is defined as
Figure BDA0003625749080000085
where $k_i$ denotes the total degree of news entity node i.
The expression for solving the node closeness of the news entity is as follows:
Figure BDA0003625749080000086
where $C_c(i)$ denotes the closeness of news entity node i, $d_{ij}$ denotes the shortest distance from news entity node i to news entity node j, and N denotes the number of nodes in the network.
The news entity node compactness can also be calculated by adopting a kernel function, and the kernel function formula is expressed as follows:
Figure BDA0003625749080000091
where U(i) denotes the closeness of news entity node i, $d_{ij}$ denotes the shortest distance from news entity node i to news entity node j, p denotes the non-shortest routes from news entity node $v_i$ to the other nodes, L(p) denotes the lengths of these non-shortest routes, and h denotes the width of the kernel function. The closeness of a news entity node calculated with the kernel function has high accuracy.
The procedure and rules of the PageRank algorithm are as follows:
S21, setting the initial PageRank values $PR_i(0)$, $i = 1, 2, \ldots, N$, of all news entity nodes, which satisfy:
Figure BDA0003625749080000092
S22, performing a k-step random walk on the industrial knowledge graph, in which each step corresponds to one iteration and the PR value held by each news entity node at step k-1 is distributed to the news entity nodes it points to. A scale constant $s \in (0, 1)$ is set, and the PR value of each news entity node is calculated with the PageRank correction rule: the PR value obtained for each node is multiplied by the scale constant s to reduce it, and the value $1 - s$ is then distributed over the reduced PR values so that the total PR value of the knowledge graph remains 1. The PR value of a news entity node is therefore calculated as:
Figure BDA0003625749080000093
where $PR_i(k)$ denotes the PR value of news entity node i at step k, and $a_{ij}$ describes the direct connection between news entity node i and news entity node j: if news entity node i links to news entity node j, then $a_{ij} = 1$; otherwise $a_{ij} = 0$.
Once the random walk reaches a news entity node whose out-degree is 0, it stays at that node forever and cannot leave; such a node is called a dangling node. The presence of dangling nodes can make the PageRank algorithm fail. Likewise, if the industrial knowledge graph contains sub-graphs with no outgoing edges, the nodes in those sub-graphs may absorb all PR values in the network, which also makes the algorithm fail. Therefore, whenever the random walk sets out from a node, regardless of whether that node is a dangling node, it is allowed to select any news entity node in the industrial knowledge graph at random with probability $1 - s$, and the selected node becomes the next target node, which lets the walk leave dangling nodes.
S3, an entropy weight method is applied to the obtained news entity node betweenness, news entity node degree, news entity node closeness, and PageRank index values to obtain the weight of each index. For a given index, the entropy value indicates its degree of dispersion: the smaller the information entropy, the greater the dispersion of the index and the greater its influence on the comprehensive evaluation. If all values of an index are equal, the index plays no role in the comprehensive evaluation. Based on this rule, the entropy weight method is applied to the news entity node betweenness, news entity node degree, news entity node closeness, and PageRank indexes to obtain the weight of each index, which reflects the degree of dispersion of each index.
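The rule that an index with identical values contributes nothing can be checked numerically. The sketch below assumes the usual normalized entropy $e = -\frac{1}{\ln n}\sum_i p_i \ln p_i$ over the proportions of the raw values (no min-max step), and the sample values are invented.

```python
import math

def normalized_entropy(values):
    """Entropy of the proportions values / sum(values), scaled to [0, 1]."""
    total = sum(values)
    props = [v / total for v in values]
    k = 1.0 / math.log(len(values))
    return -k * sum(p * math.log(p) for p in props if p > 0)

flat = normalized_entropy([2, 2, 2, 2])      # every sample identical
spread = normalized_entropy([9, 1, 1, 1])    # one sample dominates
```

The identical values give an entropy of 1, so the redundancy $1 - e$ and hence the weight are 0, while the dispersed values give a lower entropy and therefore a positive weight.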
And S4, calculating the weight of each index to obtain a news importance score, and recommending the industrial news with high importance score to the user.
In step S4, the specific acquisition process of the news importance score is as follows:
s41, based on an entropy weight method, obtaining the number of news entity nodes, the degree of the news entity nodes, the compactness of the news entity nodes and the weight of Page Rank algorithm indexes, and obtaining the comprehensive scores of all samples and the comprehensive scores s of all samples i The expression of (a) is:
Figure BDA0003625749080000101
S42, averaging the comprehensive scores to obtain the news importance scores;
S43, sorting the news importance scores in descending order and, after sorting, recommending the ten news items with the highest importance scores to the user; the overall process is shown in Fig. 2.
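As a non-limiting illustration, steps S41 to S43 can be sketched in Python as follows. The expression for the comprehensive score is given only as a figure, so the weighted sum used in `composite_scores` is an assumed form; likewise, the mapping from a news item to the entity nodes it mentions, and the reading of S42 as an average over those entities, are assumptions of this sketch. All names and sample values are hypothetical.

```python
def composite_scores(norm_rows, weights):
    """Assumed S41 form: score of each sample = sum of weight * normalized index value."""
    return [sum(w * x for w, x in zip(weights, row)) for row in norm_rows]


def rank_news(news_entities, entity_scores, top_k=10):
    """S42/S43 sketch: average entity scores per news item, sort descending, keep top_k.

    news_entities: dict mapping a news id to the entity names it mentions (assumed input).
    entity_scores: dict mapping an entity name to its comprehensive score.
    """
    importance = {}
    for news_id, entities in news_entities.items():
        known = [entity_scores[e] for e in entities if e in entity_scores]
        importance[news_id] = sum(known) / len(known) if known else 0.0
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical data for illustration only.
scores = composite_scores([[1.0, 0.0], [0.5, 0.5]], [0.25, 0.75])
entity_scores = {"port": 0.9, "vessel": 0.5, "fishery": 0.1}
news = {"n1": ["port", "vessel"], "n2": ["fishery"], "n3": ["port"]}
ranked = rank_news(news, entity_scores)
```

With `top_k=10` the function returns at most the ten highest-scoring news items, matching the recommendation step.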
Example 2
Fig. 3 shows the steps for constructing an industrial knowledge graph. This embodiment takes the construction of a marine industry knowledge graph as an example; referring to Fig. 3, the specific steps are as follows:
S11, obtaining news entity data and news entity relation data for the marine industry field from open news linked data, where the open news linked data are public marine industry data, which can come from marine data centers, government department data, national industry development reports and reports from enterprises in the various marine industry fields;
S12, performing knowledge fusion on the news entity data and news entity relation data acquired in step S11. Entity alignment: the same news entity may have different names in different public marine industry data sources; because the same news entity has the same features, a feature matching method is used to align these names, unify them under a single unique name and label it. Entity disambiguation: where one news entity name in the marine industry field points to several news entities, entity disambiguation is performed on that name to eliminate the ambiguity;
S13, storing the news entity data and news entity relation data fused in step S12 in a relational database to form a knowledge base;
S14, converting the news entity data and news entity relation data in the knowledge base of step S13 into linked data in a knowledge graph: news entities are abstracted as news entity nodes and the relations between news entities as edges; the fields of the news entity tables and news entity relation tables in the knowledge base are converted into Resource Description Framework (RDF) format; referring to Fig. 3, the triple data of the form <news entity, news entity relation, news entity> are processed and converted into linked data in the knowledge graph, thereby constructing the marine industry knowledge graph.
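As a non-limiting illustration, the conversion of <news entity, news entity relation, news entity> triples into nodes and edges can be sketched in Python as follows. A hand-written alias table stands in for the feature matching method of step S12, which this embodiment does not spell out; the function name `build_graph` and all entity names are hypothetical.

```python
from collections import defaultdict


def build_graph(triples, aliases=None):
    """Turn (head, relation, tail) triples into a node set and a directed edge list.

    aliases maps alternative entity names to one unique name; it is a simple
    stand-in for entity alignment, not the feature matching method itself.
    """
    aliases = aliases or {}
    graph = defaultdict(list)
    nodes = set()
    for head, relation, tail in triples:
        h = aliases.get(head, head)  # unify names before creating nodes
        t = aliases.get(tail, tail)
        nodes.update((h, t))
        graph[h].append((relation, t))  # edge labeled with the entity relation
    return nodes, dict(graph)


# Hypothetical marine-industry triples; "Ocean Co." and "Ocean Company" are one entity.
triples = [
    ("Ocean Co.", "operates", "Port A"),
    ("Ocean Company", "owns", "Vessel B"),
]
nodes, graph = build_graph(triples, aliases={"Ocean Co.": "Ocean Company"})
```

After alignment both triples attach to the single node "Ocean Company", so degree-based and path-based indexes are not split across duplicate names.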
Example 3
Referring to fig. 1, in step S3, the entropy weight calculation process includes:
S31, taking all news entity nodes obtained from the industrial knowledge graph as samples to form a sample library; for the n samples and 4 indexes in the sample library, the 4 indexes being news entity node betweenness, news entity node degree, news entity node closeness and the PageRank index, the value of the j-th index of the i-th sample is denoted $x_{ij}$, where $i = 1, \dots, n$ and $j = 1, \dots, 4$;
S32, normalizing the indexes of step S31; positive/negative indexes are indexes that are positively/negatively correlated with the news entity relationship, and different formulas are used to standardize positive and negative indexes:
Positive index:
Figure BDA0003625749080000111
Negative index:
Figure BDA0003625749080000112
The normalized data $x'_{ij}$ are still denoted $x_{ij}$;
S33, calculating the proportion $p_{ij}$ of the i-th sample under the j-th index:
Figure BDA0003625749080000113
S34, calculating the entropy $e_j$ of the j-th index:
Figure BDA0003625749080000114
where $k = 1/\ln(n) > 0$, which ensures $e_j \ge 0$;
S35, calculating the information entropy redundancy $d_j$:
$d_j = 1 - e_j, \quad j = 1, \dots, 4$;
S36, calculating the weight $w_j$ of each index:
Figure BDA0003625749080000121
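As a non-limiting illustration, steps S31 to S36 can be sketched in Python as follows. Several expressions above are given only as figures, so the sketch assumes the forms commonly used with the entropy weight method: min-max normalization for positive indexes, $p_{ij} = x_{ij} / \sum_i x_{ij}$, $e_j = -k \sum_i p_{ij} \ln p_{ij}$ and $w_j = d_j / \sum_j d_j$. Negative indexes are not handled, and the function name and sample values are hypothetical.

```python
import math


def entropy_weights(samples):
    """Entropy weight method for rows of index values (all treated as positive indexes)."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed, since k = 1/ln(n)")
    m = len(samples[0])
    # S32 (assumed form): min-max normalization of each index column.
    norm_cols = []
    for col in zip(*samples):
        lo, hi = min(col), max(col)
        span = hi - lo
        norm_cols.append([(x - lo) / span if span else 0.0 for x in col])
    # S33/S34: proportions p_ij and entropy e_j with k = 1/ln(n).
    entropies = []
    for col in norm_cols:
        total = sum(col)
        if total == 0:
            entropies.append(1.0)  # all values equal: the index carries no information
            continue
        e = 0.0
        for x in col:
            p = x / total
            if p > 0:  # 0 * ln(0) is taken as 0
                e -= p * math.log(p)
        entropies.append(e / math.log(n))
    # S35/S36: redundancy d_j = 1 - e_j and weights w_j proportional to d_j.
    d = [1.0 - e for e in entropies]
    total_d = sum(d)
    weights = [dj / total_d if total_d else 1.0 / m for dj in d]
    norm_rows = [list(row) for row in zip(*norm_cols)]
    return weights, norm_rows


# Hypothetical values of the 4 indexes for 3 news entity nodes.
weights, norm_rows = entropy_weights([
    [1, 5, 0.2, 3],
    [2, 5, 0.4, 1],
    [3, 5, 0.9, 2],
])
```

In this example the second index is identical for every sample, so its weight is 0, in line with the rule that an index whose values are all equal plays no role in the comprehensive evaluation.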
It should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the present invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A news importance calculation method based on an industrial knowledge graph is characterized by comprising the following steps:
S1, defining the industrial field whose news importance is to be calculated, determining the news entities and news entity relations, and constructing an industrial knowledge graph of the industrial field based on the news entities and news entity relations;
S2, based on the constructed industrial knowledge graph, introducing the news entity node betweenness, news entity node degree, news entity node closeness and PageRank indexes to evaluate the industrial knowledge graph;
S3, processing the obtained news entity node betweenness, news entity node degree, news entity node closeness and PageRank indexes with the entropy weight method to obtain the weight of each index;
S4, computing a news importance score from the weighted indexes, and recommending industrial news with high importance scores to the user.
2. The method for calculating news importance based on an industrial knowledge graph of claim 1, wherein in step S1, a news entity is an entity with practical meaning in the news content of the industrial field, a news entity relation is a relation associated with news entities, and the industrial knowledge graph is constructed by the following specific steps:
S11, acquiring, from open news linked data, news entity data and news entity relation data for the industrial field whose news importance is to be calculated;
S12, performing knowledge fusion on the news entity data and news entity relation data acquired in step S11 to achieve entity alignment and entity disambiguation;
S13, storing the news entity data and news entity relation data fused in step S12 in a relational database to form a knowledge base;
S14, converting the news entity data and news entity relation data in the knowledge base of step S13 into linked data in the knowledge graph, thereby constructing the industrial knowledge graph.
3. The method for calculating news importance based on an industrial knowledge graph of claim 1, wherein in step S2, nodes in the industrial knowledge graph represent news entities, the edges connecting news entity nodes represent news entity relations, and a node-edge-node sequence forms a path; the more shortest paths between other nodes pass through a given news entity node, the more important that node is in the industrial knowledge graph, and this importance or influence is represented by the betweenness B of the news entity node, expressed as:
$B_i = \sum_{j \neq l \neq i} \left[ N_{jl}(i) / N_{jl} \right]$, where i, j and l respectively denote non-adjacent news entity nodes, $B_i$ is the betweenness of news entity node i, $N_{jl}$ is the number of shortest paths between news entity node j and news entity node l, and $N_{jl}(i)$ is the number of shortest paths between news entity node j and news entity node l that pass through news entity node i.
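As a non-limiting illustration, the betweenness expression above can be evaluated in Python with one breadth-first search per node. The sketch assumes an unweighted directed graph given as an adjacency dictionary and sums over ordered pairs (j, l), so values for an undirected graph come out doubled; the function names and the sample graph are hypothetical.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source; returns (dist, sigma), sigma[v] = number of shortest paths to v."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """B_i = sum over ordered pairs (j, l), both different from i, of N_jl(i) / N_jl."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    info = {v: shortest_path_counts(adj, v) for v in nodes}
    b = {v: 0.0 for v in nodes}
    for j in nodes:
        dist_j, sigma_j = info[j]
        for l in nodes:
            if l == j or l not in dist_j:
                continue  # l is not reachable from j
            for i in nodes:
                if i in (j, l) or i not in dist_j:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest j -> l path iff the distances add up.
                if l in dist_i and dist_j[i] + dist_i[l] == dist_j[l]:
                    b[i] += sigma_j[i] * sigma_i[l] / sigma_j[l]
    return b


# Hypothetical diamond graph: two equally short routes from A to D.
b = betweenness({"A": ["B", "C"], "B": ["D"], "C": ["D"]})
```

In the diamond graph each of B and C carries one of the two shortest paths from A to D, so each receives a betweenness of 0.5.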
4. The method for calculating news importance based on an industrial knowledge graph of claim 1, wherein in step S2, the degree of a news entity node comprises the out-degree and the in-degree of the news entity node:
(1) the out degree of the news entity node represents the number of edges connected from the news entity node i, and the expression is as follows:
Figure FDA0003625749070000021
wherein
Figure FDA0003625749070000022
represents the out-degree of news entity node i, $N_i$ represents the neighbor set of news entity node i, and $a_{ij}$ represents the number of edges directly connected between news entity node i and news entity node j;
(2) the in-degree of the news entity node indicates the number of connecting edges pointing to the news entity node i, and the expression is as follows:
Figure FDA0003625749070000023
wherein
Figure FDA0003625749070000024
represents the in-degree of news entity node i, and $a_{ji}$ represents the number of edges directly connected between news entity node j and news entity node i;
(3) the total degree of a news entity node is defined as:
Figure FDA0003625749070000025
wherein $k_i$ represents the total degree of news entity node i.
5. The method for calculating news importance based on industry knowledge-graph as claimed in claim 1, wherein in step S2, the expression of the closeness of news entity node is:
Figure FDA0003625749070000026
wherein $C_c(i)$ represents the closeness of news entity node i, $d_{ij}$ represents the shortest distance from news entity node i to news entity node j, and N represents the number of nodes in the network.
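As a non-limiting illustration, the closeness of a node can be computed in Python with a breadth-first search. The closeness expression is given only as a figure, so the sketch assumes the commonly used normalized form $C_c(i) = (N - 1) / \sum_{j \neq i} d_{ij}$; nodes that cannot be reached from i are simply left out of the sum, which is a simplification of this sketch. The function name and the sample graph are hypothetical.

```python
from collections import deque


def closeness(adj, i):
    """Assumed form: (N - 1) divided by the sum of shortest distances from i."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    dist = {i: 0}
    queue = deque([i])
    while queue:  # BFS gives shortest distances in an unweighted graph
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(d for v, d in dist.items() if v != i)
    return (len(nodes) - 1) / total if total else 0.0


# Hypothetical undirected path A - B - C, stored with edges in both directions.
path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
c_b = closeness(path, "B")
c_a = closeness(path, "A")
```

The middle node B is one step from both other nodes and therefore has the highest closeness in this example.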
6. The method for calculating news importance based on the industrial knowledge graph according to claim 1, wherein the news entity node closeness can be calculated by adopting a kernel function, and the kernel function formula is represented as:
Figure FDA0003625749070000027
wherein U(i) represents the closeness of news entity node i, $d_{ij}$ represents the shortest distance from news entity node i to news entity node j, p represents a non-shortest route from news entity node $v_i$ to the remaining nodes, L(p) represents the length of such a non-shortest route, and h represents the width of the kernel function.
7. The method for calculating news importance based on an industrial knowledge graph of claim 1, wherein in step S2, the process and rules of the PageRank algorithm are as follows:
S21, setting the initial PageRank value $PR_i(0)$ of every news entity node, $i = 1, 2, \dots, N$, satisfying:
Figure FDA0003625749070000031
S22, performing a k-step random walk on the industrial knowledge graph, each step being equivalent to one iteration, in which the PR value of each news entity node at step k-1 is distributed among the news entity nodes it points to; setting a scale constant s ∈ (0,1) and calculating the PR value of each news entity node with the PageRank correction rule: the PR value obtained for each news entity node is scaled down by multiplying it by s, and the value 1-s is then distributed over the scaled-down PR values so that the total PR value of the knowledge graph remains 1; that is, the PR value of a news entity node is calculated as:
Figure FDA0003625749070000032
wherein $PR_i(k)$ represents the PR value of each news entity node i at step k, and $a_{ij}$ represents the number of edges directly connected between news entity node i and news entity node j, determined by whether news entity node i points to news entity node j: if it does, $a_{ij} = 1$; otherwise, $a_{ij} = 0$.
8. The method of claim 7, wherein once the random walk reaches a news entity node whose out-degree is 0, it stays at that node indefinitely and cannot leave, such a node being defined as a hanging node; and wherein, whenever the random walk sets out from a node, whether or not that news entity node is a hanging node, the random walk is allowed to select any news entity node in the industrial knowledge graph at random with probability 1-s, the selected news entity node being used as the target node of the next step.
9. The method for calculating news importance based on industry knowledge graph according to any one of claims 1 to 8, wherein in step S3, the entropy weight calculation process includes:
S31, taking all news entity nodes extracted from the industrial knowledge graph as samples to form a sample library; for the n samples and 4 indexes in the sample library, the 4 indexes being news entity node betweenness, news entity node degree, news entity node closeness and the PageRank index, the value of the j-th index of the i-th sample is denoted $x_{ij}$, where $i = 1, \dots, n$ and $j = 1, \dots, 4$;
S32, normalizing the indexes of step S31; positive/negative indexes are indexes that are positively/negatively correlated with the news entity relationship, and different formulas are used to standardize positive and negative indexes:
Positive index:
Figure FDA0003625749070000033
Negative index:
Figure FDA0003625749070000041
The normalized data $x'_{ij}$ are still denoted $x_{ij}$;
S33, calculating the proportion $p_{ij}$ of the i-th sample under the j-th index:
Figure FDA0003625749070000042
S34, calculating the entropy $e_j$ of the j-th index:
Figure FDA0003625749070000043
where $k = 1/\ln(n) > 0$, which ensures $e_j \ge 0$;
S35, calculating the information entropy redundancy $d_j$:
$d_j = 1 - e_j, \quad j = 1, \dots, 4$;
S36, calculating the weight $w_j$ of each index:
Figure FDA0003625749070000044
10. The method for calculating news importance based on industry knowledge graph of claim 9, wherein in step S4, the news importance score is obtained as follows:
S41, using the weights of the news entity node betweenness, news entity node degree, news entity node closeness and PageRank indexes obtained by the entropy weight method, calculating the comprehensive score of every sample; the comprehensive score $s_i$ of sample i is expressed as:
Figure FDA0003625749070000045
S42, averaging the comprehensive scores to obtain the news importance scores;
S43, sorting the news importance scores in descending order and, after sorting, recommending the news with high importance scores to the user.
CN202210468961.XA 2022-04-29 2022-04-29 News importance calculation method based on industrial knowledge graph Pending CN114861052A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210468961.XA CN114861052A (en) 2022-04-29 2022-04-29 News importance calculation method based on industrial knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210468961.XA CN114861052A (en) 2022-04-29 2022-04-29 News importance calculation method based on industrial knowledge graph

Publications (1)

Publication Number Publication Date
CN114861052A true CN114861052A (en) 2022-08-05

Family

ID=82636089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210468961.XA Pending CN114861052A (en) 2022-04-29 2022-04-29 News importance calculation method based on industrial knowledge graph

Country Status (1)

Country Link
CN (1) CN114861052A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117332856A (en) * 2023-11-03 2024-01-02 安徽国麒科技有限公司 Battery knowledge map abstract generation method based on sampling sub-graph strategy
CN117332856B (en) * 2023-11-03 2024-02-23 安徽国麒科技有限公司 Battery knowledge map abstract generation method based on sampling sub-graph strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination