AU2020327352B2 - Key node identification method based on technology graph - Google Patents

Info

Publication number
AU2020327352B2
Authority
AU
Australia
Prior art keywords
key node
technology graph
technology
method based
identification method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
AU2020327352A
Other versions
AU2020327352A1 (en)
Inventor
Bin HUA
Qiyu Lu
Ping Song
Qiqi ZHANG
Sanshan ZHAO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Shanghai Electric Power Co Ltd
Original Assignee
State Grid Shanghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Shanghai Electric Power Co Ltd
Publication of AU2020327352A1
Application granted
Publication of AU2020327352B2

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The present disclosure relates to a key node identification method based on a technology graph, including: building a technology graph; performing centrality calculation on node data in the technology graph, to obtain a key node; reducing dimensions of the technical indicators of the node data by using a principal component analysis method; and analyzing a relationship between the key node and the simplified technical indicators, to obtain key nodes of different categories. Compared with the prior art, the present disclosure integrates network centrality indicators and literature measurement of scientific and technological resources, and resolves disadvantages such as undiversified and impractical indicators for identifying a key node in the technology graph. Based on relevant theories of complex network technologies, quantitative calculation is performed on relevant indicators of the technology graph, thereby helping more accurately to identify a key node, to discover a trend of technology research or a technology trend clue, and to provide decision support for technological innovation.

Description

KEY NODE IDENTIFICATION METHOD BASED ON TECHNOLOGY GRAPH
TECHNICAL FIELD
[0001] The present disclosure relates to a data processing method, and in particular, to a key node identification method based on a technology graph.
BACKGROUND
[0002] Identifying key nodes, that is, key technologies and hot technologies, in a technology graph network greatly helps scientific development and technological innovation. Traditional discussion about key nodes in a network often focuses on centralization of complex networks and evaluation of node importance, and statistical properties of the network are measured by using empirical methods. Use of a single measurement indicator or method to identify a key node is not comprehensive. Each measurement indicator or method can reflect the importance of a node in the network only from a particular side. This does not conform to the actual situation. In the era of rapid development of the Internet, a simple combination of measurement indicators cannot satisfy actual requirements, and higher requirements for accuracy of identifying key nodes are put forward.
[0003] Networks are now applied more widely and with greater practical significance. Measurement indicators derived from a purely theoretical perspective do not reflect the actual situation and reduce the accuracy of identifying key nodes.
SUMMARY
[0004] An objective of the present disclosure is to provide a key node identification method based on a technology graph, to overcome the disadvantages in the prior art and resolve a problem of undiversified and impractical indicators for identifying a key node in a technology graph.
[0005] The objective of the present disclosure may be achieved by the following technical process:
[0006] A key node identification method based on a technology graph includes:
[0007] building a technology graph;
[0008] performing centrality calculation on node data in the technology graph, to obtain key nodes;
[0009] reducing technical indicators of the node data by a principal component analysis method, wherein the technical indicators are in multiple dimensions; and
[0010] analyzing a relationship between each of the key nodes and each of reduced technical indicators by a linear regression method, to obtain a key node representing the multiple dimensions.
[0011] The technology graph is built by extracting scientific and technological achievements from multiple websites and databases by using an entity, relationship, and attribute extraction method, and performing knowledge fusion on the extracted scientific and technological achievements.
[0012] The websites and the databases include at least one of https://www.cnki.net/, http://www.drcnet.com.cn/, a self-built resource library, research and development institution data, policy and regulation data, industry dynamics data, a patent database, and an industry standard database.
[0013] The centrality includes degree centrality, closeness centrality, and betweenness centrality.
[0014] The dimensions of the technical indicators are classified as a project level, a talent level, and a scientific research achievements level.
[0015] Technical indicators of the project level dimension include a total quantity of projects, a fund project category, and an investment in scientific research.
[0016] Technical indicators of the talent level dimension include an average age of talents, average education of talents, and a quantity of talents.
[0017] In the scientific research achievements level dimension, scientific research achievements include papers, patents, and other achievements.
[0018] Paper-related technical indicators include a total quantity of papers, a total citation frequency, a quantity of core journal papers, a total citation frequency of core journal papers, a quantity of funded papers, a total citation frequency of funded papers, a percentage of core journal papers, a total citation frequency per paper, a citation frequency per core journal paper, a citation frequency per funded paper, and an h-indicator. Patent-related technical indicators include a total quantity of patents and a quantity of invention patents. Technical indicators related to other achievements include an achievement award, an achievement appraisal result, a quantity of standards, and editor-in-chief or associate editor publications.
[0019] Compared with the prior art, the present disclosure integrates network centrality indexes and literature measurement of scientific and technological resources, and resolves disadvantages such as undiversified and impractical indexes for identifying a key node in the technology graph. Based on relevant theories of complex network technologies, quantitative calculation is performed on relevant indexes of the technology graph, thereby helping more accurately to identify a key node, to discover a trend of technology research or a technology trend clue, and to provide decision support for technological innovation.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] FIG. 1 is a flowchart of a key node identification method based on a technology graph according to this embodiment.
[0021] FIG. 2 is a technology graph built according to this embodiment.
[0022] FIG. 3 is a curve diagram of a cumulative contribution rate of each evaluation index according to this embodiment.
DETAILED DESCRIPTION
[0023] The present disclosure is described in detail below with reference to the accompanying drawings and specific embodiments. The embodiments are implemented on the premise of the technical solution of the present disclosure and provide the detailed implementations and specific operation processes, but the protection scope of the present disclosure is not limited to the following embodiment.
[0024] Embodiment
[0025] As shown in FIG. 1, a key node identification method based on a technology graph includes the following steps.
[0026] (1) Build a technology graph
[0027] Obtain metadata from https://www.cnki.net/, http://www.drcnet.com.cn/, a self-built resource library, external expert and research and development institution data, and internal project and scientific and technological achievement data, together with policy and regulation data, industry dynamics data, patent data, and industry standard data. Perform entity, relationship, and attribute extraction on the metadata, and then perform entity disambiguation and co-reference resolution on the extracted information, to build a technology graph, as shown in FIG. 2.
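The fusion step can be pictured with a minimal sketch. It assumes that extraction has already produced (head, relation, tail) triples and that entity disambiguation and co-reference resolution can be approximated by a lookup table of aliases. The names `ALIASES`, `canonical`, and `build_graph`, and all sample triples, are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict

# Hypothetical alias table standing in for entity disambiguation and
# co-reference resolution: every surface form maps to one canonical name.
ALIASES = {
    "UHV transmission": "ultra-high-voltage transmission",
    "ultra high voltage transmission": "ultra-high-voltage transmission",
}


def canonical(name):
    """Return the canonical entity name for a raw extracted mention."""
    return ALIASES.get(name, name)


def build_graph(triples):
    """Fuse (head, relation, tail) triples into a directed adjacency map."""
    graph = defaultdict(dict)
    for head, relation, tail in triples:
        h, t = canonical(head), canonical(tail)
        if h == t:
            continue  # drop self-loops created by merging aliases
        graph[h][t] = relation
        graph.setdefault(t, {})  # make sure nodes without out-edges appear
    return dict(graph)


# Made-up triples, for illustration only.
triples = [
    ("UHV transmission", "uses", "insulation monitoring"),
    ("ultra high voltage transmission", "studied_by", "research institute A"),
    ("insulation monitoring", "uses", "sensor network"),
]
graph = build_graph(triples)
```

Both spellings of the first technology collapse into a single node, so the resulting graph has four nodes rather than five.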
[0028] (2) In consideration of statistical indicators of a complex network, locate a key node based on centrality indicators such as degree centrality, closeness centrality, and betweenness centrality. A node with high betweenness centrality and a high frequency is a key technology in the field and represents a hot research topic within the period.
[0029] The degree centrality is a sum of direct connections between one node and other nodes. As a connection in the technology graph is directional, the degree centrality may be classified into in-degree centrality and out-degree centrality. The formula of degree centrality is: $D(u) = \sum_{v=1}^{n} x_{vu} \ (v \neq u)$, where u is any node in the technology graph, n is a quantity of nodes in the graph, and $x_{vu}$ represents whether a node v is directly connected to the node u. The degree centrality is the most direct measurement indicator of node centrality in network analysis and reflects cohesion of a node. Higher degree centrality of a node indicates higher importance of the node in a network.
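As an illustration of the degree centrality described above, the following sketch counts in-degree and out-degree on a small directed graph. The four-node graph and the names `NODES`, `EDGES`, and `degree_centrality` are assumptions made for the example, not data from the disclosure.

```python
def degree_centrality(nodes, edges):
    """Return {node: (in_degree, out_degree, total_degree)} for a directed graph."""
    in_deg = {u: 0 for u in nodes}
    out_deg = {u: 0 for u in nodes}
    for src, dst in set(edges):  # set() ignores duplicate edges
        if src == dst:
            continue  # the formula excludes v == u
        out_deg[src] += 1
        in_deg[dst] += 1
    return {u: (in_deg[u], out_deg[u], in_deg[u] + out_deg[u]) for u in nodes}


# Toy directed graph (hypothetical).
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "A")]
degrees = degree_centrality(NODES, EDGES)
```

Here node C receives two connections and node A sends two, so both reach a total degree of 3, while B and D each have a total degree of 2.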
[0030] The closeness centrality is a reciprocal of a sum of shortest path distances from one node to all other nodes, and reflects closeness between a node and another node in a network. The formula of closeness centrality is: $C_{norm}(u) = \frac{n-1}{\sum_{v=1}^{n} d(u, v)}$, where u is any node in the technology graph, n is a quantity of nodes in the graph, and d(u, v) is a shortest path distance between another node v and the node u. Because a connection in the technology graph is directional, the closeness centrality may be classified into in-closeness centrality and out-closeness centrality. The in-closeness centrality reflects integration power of a node, and the out-closeness centrality reflects radiation power of a node.
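The normalized closeness centrality can be sketched with a breadth-first search, assuming unweighted edges. Out-closeness uses distances from u, and in-closeness uses distances to u, obtained by reversing the edges. Returning 0.0 when some node is unreachable is a convention chosen for this example, since the disclosure does not cover disconnected graphs. The toy graph and all function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(nodes, edges, direction="out"):
    """Normalized closeness (n - 1) / sum of distances, per node."""
    adj = {u: [] for u in nodes}
    for src, dst in edges:
        if direction == "out":
            adj[src].append(dst)
        else:
            adj[dst].append(src)  # reversed edges give distances *to* u
    n = len(nodes)
    scores = {}
    for u in nodes:
        dist = bfs_distances(adj, u)
        if n < 2 or len(dist) < n:
            scores[u] = 0.0  # assumption: unreachable nodes give score 0
        else:
            scores[u] = (n - 1) / sum(dist.values())
    return scores


# Toy directed graph (hypothetical).
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "A")]
out_closeness = closeness(NODES, EDGES, "out")
in_closeness = closeness(NODES, EDGES, "in")
```

In this graph A has the highest out-closeness (it reaches the others in 1, 1, and 2 hops), while C has the highest in-closeness.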
[0031] The betweenness centrality is a quantity of shortest paths passing through a node, that is, a quantity of times that a node serves as a bridge for a shortest path between any two nodes. The formula of betweenness centrality is: $B(u) = \sum_{s \neq u \neq t} \frac{p(u)}{p}$, where u is any node in the technology graph, p is a total quantity of shortest paths between a node s and a node t, and p(u) is a quantity of shortest paths between the node s and the node t through the node u. A larger quantity of times that a node serves as an "intermediary" indicates higher betweenness centrality of the node, and the node serves as a "transportation hub" in the network.
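One way to compute the betweenness centrality is Brandes' algorithm, sketched below for an unweighted directed graph. The disclosure does not name an algorithm, so this choice is an assumption, as are the diamond-shaped toy graph and the function name `betweenness`.

```python
from collections import deque


def betweenness(nodes, edges):
    """Unnormalized betweenness centrality of a directed, unweighted graph."""
    adj = {u: [] for u in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    score = {u: 0.0 for u in nodes}
    for s in nodes:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {u: 0 for u in nodes}
        dist = {u: -1 for u in nodes}
        preds = {u: [] for u in nodes}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, accumulating p(u) / p shares.
        delta = {u: 0.0 for u in nodes}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Toy diamond graph (hypothetical): two equally short routes from S to T.
NODES = ["S", "A", "B", "T"]
EDGES = [("S", "A"), ("S", "B"), ("A", "T"), ("B", "T")]
scores = betweenness(NODES, EDGES)
```

There are two shortest paths from S to T, one through A and one through B, so each intermediary receives a share of 1/2, matching the ratio p(u)/p in the formula.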
[0032] (3) Perform literature measurement based on scientific and technological resources, from two aspects: scientific research investment and scientific research achievements.
[0033] Scientific research investment covers scientific research projects and talent teams. Indicators for scientific research projects include a total quantity of projects, fund projects, and investment in scientific research. Indicators for talent teams include an average age of talents, average education of talents, and a quantity of talents.
[0034] The scientific research achievements include papers, patents, standards, monographs, and achievements. Paper-related factors include a total quantity of papers, a total citation frequency, a quantity of core journal papers, a total citation frequency of core journal papers, a quantity of funded papers, a total citation frequency of funded papers, a percentage of core journal papers, a total citation frequency per paper, a citation frequency per core journal paper, a citation frequency per funded paper, and an h-indicator. Patent-related factors include a total quantity of patents and a quantity of invention patents. Achievements include an achievement award, an achievement appraisal result, a quantity of standards, and editor-in-chief or associate editor publications.
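Several of the paper-related indicators can be derived from per-paper records, as in the sketch below. It assumes that the "h-indicator" is the usual h-index (the largest h such that h papers have at least h citations each) and that each paper carries a citation count plus flags for core-journal and funded status. The record layout and all sample values are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def per_item(total, count):
    """Average that returns 0.0 instead of dividing by zero."""
    return total / count if count else 0.0


def paper_indicators(papers):
    """Aggregate paper-related indicators for one technology node."""
    core = [p for p in papers if p["core"]]
    funded = [p for p in papers if p["funded"]]
    cites = [p["citations"] for p in papers]
    core_cites = sum(p["citations"] for p in core)
    funded_cites = sum(p["citations"] for p in funded)
    return {
        "total_papers": len(papers),
        "total_citations": sum(cites),
        "core_papers": len(core),
        "core_citations": core_cites,
        "funded_papers": len(funded),
        "funded_citations": funded_cites,
        "core_share": per_item(len(core), len(papers)),
        "citations_per_paper": per_item(sum(cites), len(papers)),
        "citations_per_core_paper": per_item(core_cites, len(core)),
        "citations_per_funded_paper": per_item(funded_cites, len(funded)),
        "h_index": h_index(cites),
    }


# Made-up records, for illustration only.
papers = [
    {"citations": 10, "core": True, "funded": True},
    {"citations": 8, "core": True, "funded": False},
    {"citations": 5, "core": False, "funded": True},
    {"citations": 4, "core": False, "funded": False},
    {"citations": 3, "core": True, "funded": False},
    {"citations": 0, "core": False, "funded": False},
]
indicators = paper_indicators(papers)
```

Each technology node would contribute one such row of values to the evaluation matrix, alongside its centrality, project, talent, and patent indicators.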
[0035] (4) Transform multiple dimensions of evaluation indicators defined in (2) and (3) into mutually independent comprehensive evaluation indicators by using principal component analysis, eliminate correlations between the evaluation indicators, and reduce a quantity of indicators for node importance evaluation.
[0036] In the present disclosure, a technology graph is built for a co-occurrence relationship of 200 technologies in scientific and technological data, and node importance is evaluated in terms of dimensions of a network topology, a project level, a talent level, and scientific research achievements. 27 evaluation indicators corresponding to each technology are separately calculated, to form a 200*27 matrix, and principal component analysis is performed on the matrix, to obtain an eigenvalue, a contribution rate, and a cumulative contribution rate. The cumulative contribution rate is shown in FIG. 3.
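For reference, the eigenvalue, contribution rate, and cumulative contribution rate follow the standard definitions of principal component analysis. The block below is a sketch that assumes the 27 indicator columns are standardized first, which the disclosure does not state. The symbols $R$ (correlation matrix of the indicators), $w_i$, $\lambda_i$, $c_i$, and $C_k$ are notation introduced here.

```latex
\begin{align}
  R\,w_i &= \lambda_i\,w_i, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{27} \ge 0 \\
  c_i &= \frac{\lambda_i}{\sum_{j=1}^{27} \lambda_j}
      && \text{(contribution rate of component } i\text{)} \\
  C_k &= \sum_{i=1}^{k} c_i
      && \text{(cumulative contribution rate of the first } k \text{ components)}
\end{align}
```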
[0037] It can be learned from the figure that a cumulative contribution rate of the first five principal components reaches 90.79%. Therefore, selecting only the first five principal components can fully represent information contained in the 27 evaluation indicators. The evaluation matrix can be reduced to a 200*5 matrix by calculating a product of an original indicator weight value matrix and an evaluation indicator matrix that correspond to the first five principal components.
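The selection and reduction steps can be sketched as follows, assuming that the eigenvalues and the loading (weight) matrix have already been obtained from a principal component analysis routine. The 0.90 threshold, the eigenvalues, and the small matrices are made-up values for illustration, and the function names are hypothetical.

```python
def contribution_rates(eigenvalues):
    """Share of total variance explained by each component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def components_needed(eigenvalues, threshold=0.90):
    """Smallest k whose cumulative contribution rate reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for k, rate in enumerate(contribution_rates(ordered), start=1):
        cumulative += rate
        if cumulative >= threshold:
            return k, cumulative
    return len(ordered), cumulative


def project(matrix, loadings):
    """Multiply an (m x p) indicator matrix by a (p x k) loading matrix."""
    columns = list(zip(*loadings))
    return [[sum(x * w for x, w in zip(row, col)) for col in columns]
            for row in matrix]


# Hypothetical eigenvalues: rates 0.50, 0.20, 0.15, 0.10, 0.05.
k, cumulative = components_needed([5.0, 2.0, 1.5, 1.0, 0.5])

# Hypothetical 2 x 3 indicator matrix reduced to 2 x 2.
reduced = project([[1, 2, 3], [4, 5, 6]], [[1, 0], [0, 1], [1, 1]])
```

In the embodiment the same product turns the 200*27 evaluation matrix into a 200*5 matrix, with the first five components covering 90.79% of the information.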
[0038] (5) The comprehensive index for evaluating node criticality can be expressed by a linear regression, with the following formula:
[0039] $Z = 0.3284 y_1 + 0.1531 y_2 + 0.2157 y_3 + 0.1196 y_4 + 0.0911 y_5$
[0040] where $y_1$ to $y_5$ represent the first five principal components obtained through principal component analysis in (4), and the coefficients are the contribution rates of the first five principal components.
[0041] The values calculated from the above formula are sorted to obtain key nodes, and the key nodes are highlighted in the network for easy identification. In addition, this method can also be used to identify key nodes in other networks, such as research field networks, author networks, and research institute networks.
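The scoring and sorting steps can be sketched as below. The weights are the five coefficients of the formula in step (5); the node names and component values are made up, and `comprehensive_score` and `rank_nodes` are hypothetical helper names.

```python
# Contribution rates of the first five principal components, from step (5).
WEIGHTS = [0.3284, 0.1531, 0.2157, 0.1196, 0.0911]


def comprehensive_score(components):
    """Weighted sum Z of the five principal component values of one node."""
    if len(components) != len(WEIGHTS):
        raise ValueError("expected one value per principal component")
    return sum(w * y for w, y in zip(WEIGHTS, components))


def rank_nodes(components_by_node, top_k):
    """Return the top_k (node, Z) pairs, highest score first."""
    scored = {node: comprehensive_score(ys)
              for node, ys in components_by_node.items()}
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Made-up component values for three hypothetical technology nodes.
components_by_node = {
    "tech_a": [1.0, 0.0, 0.0, 0.0, 0.0],
    "tech_b": [0.0, 1.0, 0.0, 0.0, 0.0],
    "tech_c": [0.0, 0.0, 1.0, 0.0, 0.0],
}
top_two = rank_nodes(components_by_node, top_k=2)
```

The five weights add up to 0.9079, which agrees with the 90.79% cumulative contribution rate reported for the first five principal components.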
[0042] The reference to any prior art in this specification is not, and should not be taken as, an acknowledgement or any form of suggestion that such prior art forms part of the common general knowledge.
[0043] It will be understood that the terms "comprise" and "include" and any of their derivatives (e.g. comprises, comprising, includes, including) as used in this specification, and the claims that follow, is to be taken to be inclusive of features to which the term refers, and is not meant to exclude the presence of any additional features unless otherwise stated or implied.

Claims (9)

  1. A key node identification method based on a technology graph, comprising:
    building a technology graph;
    performing centrality calculation on node data in the technology graph, to obtain key nodes;
    reducing technical indicators of the node data by a principal component analysis method, wherein the technical indicators are in multiple dimensions; and
    analyzing a relationship between each of the key nodes and each of reduced technical indicators by a linear regression method, to obtain a key node representing the multiple dimensions.
  2. The key node identification method based on a technology graph according to claim 1, wherein the technology graph is built by extracting scientific and technological achievements from multiple websites and databases by using an entity, relationship, and attribute extraction method, and performing knowledge fusion on the extracted scientific and technological achievements.
  3. The key node identification method based on a technology graph according to claim 2, wherein the websites and the databases comprise at least one of https://www.cnki.net/, http://www.drcnet.com.cn/, a self-built resource library, research and development institution data, policy and regulation data, industry dynamics data, a patent database, and an industry standard database.
  4. The key node identification method based on a technology graph according to claim 1, wherein the centrality comprises degree centrality, closeness centrality, and betweenness centrality.
  5. The key node identification method based on a technology graph according to claim 1, wherein the multiple dimensions of the technical indicators are classified as project level, talent level and scientific research achievements level.
  6. The key node identification method based on a technology graph according to claim 5, wherein technical indicators of the project level dimension comprise a total quantity of projects, a fund project category, and an investment in scientific research.
  7. The key node identification method based on a technology graph according to claim 5, wherein technical indicators of the talent level dimension comprise an average age of talents, average education of talents, and a quantity of talents.
  8. The key node identification method based on a technology graph according to claim 5, wherein in the scientific research achievements level dimension, scientific research achievements comprise papers, patents, and other achievements.
  9. The key node identification method based on a technology graph according to claim 8, wherein paper-related technical indicators comprise a total quantity of papers, a total citation frequency, a quantity of core journal papers, a total citation frequency of core journal papers, a quantity of funded papers, a total citation frequency of funded papers, a percentage of core journal papers, a total citation frequency per paper, a citation frequency per core journal paper, a citation frequency per funded paper, and an h-indicator; patent-related technical indicators comprise a total quantity of patents and a quantity of invention patents; and technical indicators related to other achievements comprise an achievement award, an achievement appraisal result, a quantity of standards, and editor-in-chief or associate editor publications.
AU2020327352A 2020-06-18 2020-12-14 Key node identification method based on technology graph Active AU2020327352B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN202010559077.8A CN111813951A (en) 2020-06-18 2020-06-18 Key point identification method based on technical map
CN202010559077.8 2020-06-18
PCT/CN2020/136036 WO2021253758A1 (en) 2020-06-18 2020-12-14 Key node identification method based on technology graph

Publications (2)

Publication Number Publication Date
AU2020327352A1 AU2020327352A1 (en) 2022-01-20
AU2020327352B2 true AU2020327352B2 (en) 2023-01-05

Family

ID=72845160

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020327352A Active AU2020327352B2 (en) 2020-06-18 2020-12-14 Key node identification method based on technology graph

Country Status (3)

Country Link
CN (1) CN111813951A (en)
AU (1) AU2020327352B2 (en)
WO (1) WO2021253758A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813951A (en) * 2020-06-18 2020-10-23 国网上海市电力公司 Key point identification method based on technical map
CN114417837B (en) * 2022-01-19 2024-02-13 合肥工业大学 Scientific and technological big data popularity and frontier measurement method based on subject evolution trend
CN114567562B (en) * 2022-03-01 2024-02-06 重庆邮电大学 Method for identifying key nodes of coupling network of power grid and communication network
CN114880482A (en) * 2022-04-26 2022-08-09 广州广电运通金融电子股份有限公司 Graph embedding-based relation graph key personnel analysis method and system
CN116595192B (en) * 2023-05-18 2023-11-21 中国科学技术信息研究所 Technological front information acquisition method and device, electronic equipment and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100262576A1 (en) * 2007-12-17 2010-10-14 Leximancer Pty Ltd. Methods for determining a path through concept nodes

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295692B (en) * 2016-08-05 2019-07-12 北京航空航天大学 Product initial failure root primordium recognition methods based on dimensionality reduction and support vector machines
CN109299090B (en) * 2018-09-03 2023-05-30 平安科技(深圳)有限公司 Foundation centrality calculating method, system, computer equipment and storage medium
CN109446342A (en) * 2018-10-30 2019-03-08 沈阳师范大学 A kind of education of middle and primary schools knowledge mapping analysis method and system based on He Ximan index
CN110032665B (en) * 2019-03-25 2023-11-17 创新先进技术有限公司 Method and device for determining graph node vector in relational network graph
CN110490331A (en) * 2019-08-23 2019-11-22 北京明略软件系统有限公司 The processing method and processing device of knowledge mapping interior joint
CN111813951A (en) * 2020-06-18 2020-10-23 国网上海市电力公司 Key point identification method based on technical map

Also Published As

Publication number Publication date
AU2020327352A1 (en) 2022-01-20
WO2021253758A1 (en) 2021-12-23
CN111813951A (en) 2020-10-23

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)