WO2021253758A1 - Key node identification method based on technology graph - Google Patents

Key node identification method based on technology graph

Info

Publication number
WO2021253758A1
Authority
WO
WIPO (PCT)
Prior art keywords
technical
indicators
centrality
points based
technology
Prior art date
Application number
PCT/CN2020/136036
Other languages
French (fr)
Chinese (zh)
Inventor
华斌
宋平
陆启宇
张琪祁
赵三珊
Original Assignee
国网上海市电力公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 国网上海市电力公司
Priority to AU2020327352A (patent AU2020327352B2)
Publication of WO2021253758A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Administration (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A key node identification method based on a technology graph, comprising: constructing a technology graph; performing centrality calculation on node data in the technology graph to obtain key nodes; using principal component analysis to simplify the technical indicators of the node data across multiple dimensions; and analysing the relationship between the key nodes and the simplified technical indicators to obtain the key nodes corresponding to the multiple dimensions. The method jointly considers network centrality indicators and bibliometrics of science and technology resources, overcoming the defects that the indicators used to identify key nodes in a technology graph are one-dimensional and detached from practice. It quantitatively calculates the relevant indicators of the technology graph on the basis of complex network theory, which facilitates more accurate identification of key nodes and reveals the direction of technological research or clues to technology trends, providing decision-making support for scientific and technological innovation.

Description

A key node identification method based on a technology graph
Technical field
The invention relates to a data processing method, and in particular to a key node identification method based on a technology graph.
Background
In a technology graph network, identifying the key nodes, that is, the key technologies and hotspot technologies, greatly assists the planning of science and technology innovation work. Traditional discussion of key nodes in a network centers on the centralization problem of complex networks and on node importance evaluation, measuring the statistical properties of the network by empirical methods. Using any single one of these measures or methods to identify key nodes is one-sided: each measure or method reflects a node's position in the network from only one aspect, which does not match the actual situation. In an era of rapid Internet development, a simple combination of measures cannot meet practical needs, and higher accuracy is required in identifying key nodes.
In particular, networks are now applied more widely and their applications carry more practical significance. Measures derived from a purely theoretical standpoint do not fit practice, which reduces the accuracy of key node identification.
Summary of the invention
The purpose of the present invention is to overcome the above defects of the prior art by providing a key node identification method based on a technology graph, which solves the problems that the indicators used to identify key nodes in a technology graph are one-dimensional and detached from practice.
The purpose of the present invention can be achieved through the following technical solution:
A key node identification method based on a technology graph, comprising:
constructing a technology graph;
performing centrality calculation on the node data in the technology graph to obtain key nodes;
using principal component analysis to simplify the technical indicators of the node data across multiple dimensions;
analyzing the relationship between the key nodes and the technical indicators to obtain the key nodes in different dimensions.
The technology graph is constructed by extracting scientific and technological achievements from multiple websites and databases using entity, relationship and attribute extraction methods, and performing knowledge fusion on the extracted achievements.
The websites and databases include at least one of Tongfang Knowledge Network (同方知网), National Research Network (国研网), a self-built resource database, R&D institution data, policy and regulation data, industry news data, a patent database and an industry standard database.
The centrality includes degree centrality, closeness centrality and betweenness centrality.
The dimensions of the technical indicators include a project level dimension, a talent level dimension and a scientific research achievement level dimension.
The technical indicators of the project level dimension include the total number of projects, the funded-project categories and the research funding invested.
The technical indicators of the talent level dimension include the average age of the talents, the average education level of the talents and the number of talents.
In the scientific research achievement level dimension, the achievements include papers, patents and other achievements.
The paper-related technical indicators include the total number of papers, the total citation frequency, the number of core journal papers, the total citation frequency of core journal papers, the number of funded papers, the total citation frequency of funded papers, the proportion of core journal papers, the proportion of core journal papers, the average citations per paper overall, the average citations per core journal paper, the average citations per funded paper and the h-index; the patent-related technical indicators include the total number of patents and the number of invention patents; the technical indicators related to other achievements include achievement awards, achievement appraisal results, the number of standards, and works as editor-in-chief or associate editor-in-chief.
Linear regression is used to analyze the relationship between the key nodes and the technical indicators.
Compared with the prior art, the present invention jointly considers network centrality indicators and bibliometrics of scientific and technological resources, overcoming the one-dimensionality and detachment from practice of the indicators used to identify key nodes in a technology graph. Based on the theory of complex networks, it quantitatively calculates the relevant indicators of the technology graph, which helps identify key nodes more accurately, reveal the direction of technical research or clues to technology trends, and provide decision support for scientific and technological innovation.
Description of the drawings
Figure 1 is a flowchart of the key node identification method based on a technology graph in this embodiment;
Figure 2 is the technology graph constructed in this embodiment;
Figure 3 is a graph of the cumulative contribution rate of the evaluation indicators in this embodiment.
Detailed description
The present invention is described in detail below with reference to the drawings and a specific embodiment. The embodiment is implemented on the premise of the technical solution of the present invention and gives a detailed implementation and a specific operating procedure, but the scope of protection of the present invention is not limited to the following embodiment.
Embodiment
As shown in Figure 1, a key node identification method based on a technology graph includes the following steps:
1) Construct the technology graph
Metadata is obtained from Tongfang Knowledge Network (同方知网), National Research Network (国研网), a self-built resource database, external expert and R&D institution data, and internal project and scientific and technological achievement data, supplemented with policy and regulation data, industry news data, patent data and industry standard data. Entities, relationships and attributes are extracted, entity disambiguation and coreference resolution are performed on the extracted information, and the technology graph is constructed, as shown in Figure 2.
2) From the perspective of the statistical indicators of complex networks, key nodes are located according to the magnitude of indicators such as degree centrality, closeness centrality and betweenness centrality. Nodes with high betweenness centrality and high-frequency characteristics are the key technologies of the field and represent the hot research topics of the period;
Degree centrality is the total number of direct connections between a node and other nodes. Because the connections in the technology graph are directed, it can be divided into in-degree centrality and out-degree centrality. Considering in-degree and out-degree centrality together, the degree centrality of a node is calculated as:
Figure PCTCN2020136036-appb-000001
where u is any node in the technology graph, n is the number of nodes in the graph, and X_vu indicates whether nodes v and u are directly connected. Degree centrality is the most direct measure of node centrality in network analysis and reflects the cohesion of a node. The higher a node's degree centrality, the more important the node is in the network;
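As a minimal sketch of the degree centrality described above, the following Python computes in-degree, out-degree and a combined value for a small directed graph. The source formula survives only as an image reference, so the normalization by n - 1 and the averaging of the in and out values are assumptions; the function name `degree_centrality`, the edge-list representation and the toy graph are likewise hypothetical.

```python
from collections import defaultdict

def degree_centrality(nodes, edges):
    """Return {node: (in_c, out_c, combined)} for a directed graph.

    Assumption: each value is divided by n - 1, the number of possible
    neighbors, and the combined value is the mean of in_c and out_c.
    """
    n = len(nodes)
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for src, dst in set(edges):        # ignore duplicate edges
        if src == dst:
            continue                   # ignore self-loops
        outdeg[src] += 1
        indeg[dst] += 1
    denom = max(n - 1, 1)
    result = {}
    for u in nodes:
        in_c = indeg[u] / denom
        out_c = outdeg[u] / denom
        result[u] = (in_c, out_c, (in_c + out_c) / 2)
    return result

# Hypothetical toy graph of four technologies.
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "A")]
DEGREE = degree_centrality(NODES, EDGES)
```

In this toy graph, node C has two incoming links and one outgoing link, so it scores higher on in-degree than on out-degree.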
Closeness centrality is the reciprocal of the sum of the shortest-path distances from a node to all other nodes. It reflects how close a node is to the other nodes in the network. The normalized closeness centrality of a node is calculated as:
Figure PCTCN2020136036-appb-000002
where u is any node in the technology graph, n is the number of nodes in the graph, and d(u,v) is the shortest-path distance between u and another node v. Because the connections in the technology graph are directed, closeness centrality can be divided into in-closeness centrality and out-closeness centrality. In-closeness centrality reflects a node's integrating power, and out-closeness centrality reflects its radiating power;
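The in- and out-closeness described above can be sketched with a breadth-first search. The normalization (n - 1) divided by the sum of distances is the common textbook form and is assumed here, since the source formula is available only as an image reference. Scoring a node 0.0 when it cannot reach every other node is also an assumption, and the names `bfs_distances` and `closeness` are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(nodes, edges, direction="out"):
    """Normalized closeness (n - 1) / sum of d(u, v) for each node u.

    direction="out" follows edges forward (out-closeness);
    direction="in" follows them backward (in-closeness).
    Assumption: a node that cannot reach every other node scores 0.0.
    """
    adj = {u: [] for u in nodes}
    for src, dst in edges:
        if direction == "out":
            adj[src].append(dst)
        else:
            adj[dst].append(src)
    n = len(nodes)
    scores = {}
    for u in nodes:
        dist = bfs_distances(adj, u)
        total = sum(dist.values())
        reaches_all = len(dist) == n
        scores[u] = (n - 1) / total if reaches_all and total > 0 else 0.0
    return scores

# Hypothetical toy graph of four technologies.
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "A")]
OUT_CLOSENESS = closeness(NODES, EDGES, "out")
IN_CLOSENESS = closeness(NODES, EDGES, "in")
```

Here node A has the highest out-closeness (it reaches B and C in one hop), while node C has the highest in-closeness (A and B reach it in one hop).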
Betweenness centrality is the number of shortest paths that pass through a node, that is, the number of times the node serves as a bridge on the shortest path between any two other nodes. The betweenness centrality of a node is calculated as:
Figure PCTCN2020136036-appb-000003
Figure PCTCN2020136036-appb-000004
where u is any node in the technology graph, p is the total number of shortest paths between node s and node t, and p(u) is the number of those shortest paths that pass through node u. The more often a node acts as an "intermediary", the greater its betweenness centrality; it plays the role of a "transportation hub" in the network.
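The ratio p(u)/p described above can be accumulated over all ordered pairs (s, t) with Brandes' algorithm, a standard technique that the source does not name. The sketch below is an illustration for an unweighted directed graph; leaving the result unnormalized is an assumption, and the function name `betweenness` and the toy graph are hypothetical.

```python
from collections import deque

def betweenness(nodes, edges):
    """Betweenness of each node in an unweighted directed graph.

    For every ordered pair (s, t) with s != u != t, adds p(u) / p, where p
    is the number of shortest s-t paths and p(u) the number through u.
    Uses Brandes' accumulation; the result is not normalized (assumption).
    """
    adj = {u: set() for u in nodes}    # sets drop duplicate edges
    for src, dst in edges:
        adj[src].add(dst)
    score = {u: 0.0 for u in nodes}
    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = {u: 0 for u in nodes}
        sigma[s] = 1
        preds = {u: [] for u in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = {u: 0.0 for u in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical toy graph of four technologies.
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "A")]
BETWEENNESS = betweenness(NODES, EDGES)
```

In this toy graph, node B lies on no shortest path between other nodes, while A, C and D each sit on three.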
3) Bibliometrics based on scientific and technological resources, approached from two aspects: scientific research input and scientific research output;
Scientific research input is divided into research projects and the talent echelon. Research projects cover the total number of projects, funded projects and the research funding invested; the talent echelon covers the average age of the talents, the average education level of the talents and the number of talents;
Scientific research output includes papers, patents, standards, monographs and achievements. For papers, the factors considered are the total number of papers, the total citation frequency, the number of core journal papers, the total citation frequency of core journal papers, the number of funded papers, the total citation frequency of funded papers, the proportion of core journal papers, the proportion of core journal papers, the average citations per paper overall, the average citations per core journal paper, the average citations per funded paper and the h-index. Patents cover the total number of patents and the number of invention patents. Achievements cover achievement awards and achievement appraisals, along with the number of standards and works as editor-in-chief or associate editor-in-chief;
4) Principal component analysis transforms the multi-dimensional evaluation indicators defined in 2) and 3) into mutually independent composite indicators, removing the correlation between indicators and reducing the number of indicators needed to evaluate node criticality.
The present invention constructs a technology graph from the co-occurrence relations of 200 technologies in scientific and technological literature and evaluates node criticality along the dimensions of network topology, project level, talent level and scientific research achievements. The 27 evaluation indicators of each technology are calculated to form a 200*27 matrix; principal component analysis of this matrix yields the eigenvalues, contribution rates and cumulative contribution rates. The cumulative contribution rate is shown in Figure 3:
As the figure shows, the cumulative contribution rate of the first 5 principal components reaches 90.79%. The first 5 principal components alone can therefore adequately represent the information contained in the 27 evaluation indicators. Multiplying the evaluation indicator matrix by the matrix of original-indicator weights corresponding to the first 5 principal components reduces the evaluation matrix to 200*5.
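The reduction in step 4) can be written out in the textbook form of principal component analysis. This is a sketch, not the source's own derivation: the symbols X, R, w_i, lambda_i, c_i, W and Y are notation introduced here, and standardizing the indicators before the decomposition is an assumption the source does not state.

```latex
\begin{aligned}
X &\in \mathbb{R}^{200 \times 27}
  && \text{one row per technology, one column per standardized indicator} \\
R &= \tfrac{1}{200 - 1}\, X^{\top} X, \qquad R\, w_i = \lambda_i w_i
  && \text{eigenvalues } \lambda_i \text{ and eigenvectors } w_i \\
c_i &= \frac{\lambda_i}{\sum_{j=1}^{27} \lambda_j}
  && \text{contribution rate of component } i \\
\sum_{i=1}^{5} c_i &= 0.9079
  && \text{cumulative contribution rate of the 5 retained components} \\
W &= [\, w_1 \ \cdots \ w_5 \,] \in \mathbb{R}^{27 \times 5}, \qquad
Y = X W \in \mathbb{R}^{200 \times 5}
  && \text{reduced evaluation matrix}
\end{aligned}
```

Each column of Y holds one principal component value per technology, which is the 200*5 matrix referred to above.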
5) Using a linear regression expression, with the contribution rates of the first 5 principal components as their weights, a composite value of node criticality is obtained. Based on the result of 4), the composite function for evaluating node criticality is:
$$Z = 0.3284\,y_1 + 0.1531\,y_2 + 0.2157\,y_3 + 0.1196\,y_4 + 0.0911\,y_5$$
where y_1 to y_5 are the first 5 principal components obtained by the principal component analysis in 4).
Evaluating the function and sorting the resulting values yields the key nodes, which are marked in a conspicuous color in the network for easy identification. The same method can also be used to identify key nodes in networks formed by other entities such as research fields, authors and research institutions.
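The scoring and sorting step can be sketched in a few lines of Python. The five weights are the coefficients of Z given in the text; they sum to 0.9079, which matches the stated cumulative contribution rate of 90.79%. The function names, the node names and the principal component values below are hypothetical and serve only to illustrate the ranking.

```python
# Weights of y_1 to y_5 in the composite function Z given in the text.
WEIGHTS = (0.3284, 0.1531, 0.2157, 0.1196, 0.0911)

def composite_score(components):
    """Z = sum of weight_i * y_i over the five principal components."""
    if len(components) != len(WEIGHTS):
        raise ValueError("expected five principal component values")
    return sum(w * y for w, y in zip(WEIGHTS, components))

def rank_key_nodes(component_rows, top_k):
    """Return the top_k node names ordered by descending Z."""
    scored = {name: composite_score(ys) for name, ys in component_rows.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

# Hypothetical principal component values for three technologies.
ROWS = {
    "tech_a": (1.0, 0.0, 0.0, 0.0, 0.0),
    "tech_b": (0.0, 1.0, 1.0, 0.0, 0.0),
    "tech_c": (0.0, 0.0, 0.0, 1.0, 1.0),
}
TOP = rank_key_nodes(ROWS, top_k=2)
```

With these illustrative values, tech_b scores 0.3688, tech_a scores 0.3284 and tech_c scores 0.2107, so tech_b and tech_a would be marked as the key nodes.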

Claims (10)

  1. 一种基于技术图谱的关键点识别方法,其特征在于,包括:A method for identifying key points based on a technical map, which is characterized in that it includes:
    构建技术图谱;Build a technical map;
    对所述技术图谱中的节点数据,进行中心度计算,得到关键节点;Perform centrality calculation on the node data in the technical map to obtain key nodes;
    采用主成分分析法,对所述的节点数据的多个维度的技术指标进行简化;Principal component analysis method is used to simplify the multiple dimensions of the technical indicators of the node data;
    分析所述的关键节点与简化后的技术指标之间的关系,得到所述多个维度对应的关键节点。The relationship between the key nodes and the simplified technical indicators is analyzed to obtain key nodes corresponding to the multiple dimensions.
  2. The key node identification method based on a technology graph according to claim 1, wherein the technology graph is constructed by extracting scientific and technological achievements from multiple websites and databases using entity, relationship, and attribute extraction methods, and by performing knowledge fusion on the extracted achievements.
  3. The key node identification method based on a technology graph according to claim 2, wherein the websites and databases include at least one of Tongfang CNKI (同方知网), DRCNet (国研网), a self-built resource library, R&D institution data, policy and regulation data, industry news data, a patent database, and an industry standards database.
  4. The key node identification method based on a technology graph according to claim 1, wherein the centrality includes degree centrality, closeness centrality, and betweenness centrality.
  5. The key node identification method based on a technology graph according to claim 1, wherein the multiple dimensions of the technical indicators include a project level dimension, a talent level dimension, and a research output level dimension.
  6. The key node identification method based on a technology graph according to claim 5, wherein the technical indicators of the project level dimension include the total number of projects, the funded project category, and the research funding invested.
  7. The key node identification method based on a technology graph according to claim 5, wherein the technical indicators of the talent level dimension include the average age of personnel, the average education level of personnel, and the number of personnel.
  8. The key node identification method based on a technology graph according to claim 5, wherein, in the research output level dimension, the research outputs include papers, patents, and other outputs.
  9. The key node identification method based on a technology graph according to claim 8, wherein the technical indicators related to papers include the total number of papers, the total number of citations, the number of core-journal papers, the total number of citations of core-journal papers, the number of funded papers, the total number of citations of funded papers, the proportion of core-journal papers, the average citations per paper, the average citations per core-journal paper, the average citations per funded paper, and the h-index; the technical indicators related to patents include the total number of patents and the number of invention patents; and the technical indicators related to other outputs include awards received, achievement appraisal results, the number of standards, and books written as chief editor or associate chief editor.
  10. The key node identification method based on a technology graph according to claim 1, wherein linear regression is used to analyze the relationship between the key nodes and the technical indicators.
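The steps recited in the claims above (centrality calculation, principal component analysis, and linear regression) can be illustrated with a minimal Python sketch. This is not the applicant's implementation: the claims do not specify any algorithm, so the sketch assumes an unweighted, undirected, connected graph, uses Brandes' algorithm for betweenness, reduces the indicators to the first principal component only (by power iteration), and fits a single-variable least-squares line. The graph, the node names, the indicator values, and all function names are invented for illustration.

```python
from collections import deque
from math import sqrt

def _bfs(adj, s):
    """BFS from s: hop distances, shortest-path counts, predecessors, visit order."""
    dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:                    # first time w is reached
                dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[v] + 1:           # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order

def degree_centrality(adj):
    """Share of the other nodes each node is linked to (needs at least two nodes)."""
    return {v: len(adj[v]) / (len(adj) - 1) for v in adj}

def closeness_centrality(adj):
    """Reachable nodes divided by the sum of hop distances to them."""
    out = {}
    for v in adj:
        dist = _bfs(adj, v)[0]
        total = sum(dist.values())
        out[v] = (len(dist) - 1) / total if total else 0.0
    return out

def betweenness_centrality(adj):
    """Brandes' algorithm, normalised to [0, 1] for an undirected graph."""
    n = len(adj)
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        _, sigma, preds, order = _bfs(adj, s)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):                # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    scale = (n - 1) * (n - 2)                    # every unordered pair was counted twice
    return {v: (b / scale if scale > 0 else 0.0) for v, b in bc.items()}

def first_principal_component(rows, iters=200):
    """Score of each row on the leading principal component of the standardised data."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(m)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]
    cov = [[sum(zr[a] * zr[b] for zr in z) / n for b in range(m)] for a in range(m)]
    vec = [1.0] * m                              # power iteration start vector
    for _ in range(iters):
        vec = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = sqrt(sum(x * x for x in vec)) or 1.0
        vec = [x / norm for x in vec]
    return [sum(zr[j] * vec[j] for j in range(m)) for zr in z]

def linear_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

# Hypothetical technology graph (undirected adjacency lists); node names are invented.
graph = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A", "E"], "E": ["D"]}
# Invented indicator rows per node: [total projects, number of personnel, total papers].
indicators = {"A": [12, 40, 95], "B": [3, 8, 20], "C": [4, 10, 18],
              "D": [7, 22, 51], "E": [2, 6, 9]}

deg = degree_centrality(graph)
clo = closeness_centrality(graph)
btw = betweenness_centrality(graph)
key_nodes = sorted(btw, key=btw.get, reverse=True)[:2]   # two most central nodes

nodes = list(graph)
scores = first_principal_component([indicators[v] for v in nodes])
slope, intercept = linear_fit(scores, [btw[v] for v in nodes])
print(key_nodes, slope, intercept)
```

In this toy graph, node A bridges five of the six node pairs that exclude it, so it ranks first on all three centrality measures. A fuller treatment would keep several principal components and regress each centrality measure on them separately for the project, talent, and research output dimensions.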
PCT/CN2020/136036 2020-06-18 2020-12-14 Key node identification method based on technology graph WO2021253758A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020327352A AU2020327352B2 (en) 2020-06-18 2020-12-14 Key node identification method based on technology graph

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010559077.8 2020-06-18
CN202010559077.8A CN111813951A (en) 2020-06-18 2020-06-18 Key point identification method based on technical map

Publications (1)

Publication Number Publication Date
WO2021253758A1 (en) 2021-12-23

Family

ID=72845160

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136036 WO2021253758A1 (en) 2020-06-18 2020-12-14 Key node identification method based on technology graph

Country Status (3)

Country Link
CN (1) CN111813951A (en)
AU (1) AU2020327352B2 (en)
WO (1) WO2021253758A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417837A (en) * 2022-01-19 2022-04-29 合肥工业大学 Method for measuring popularity and frontier of science and technology big data based on subject evolution trend
CN114567562A (en) * 2022-03-01 2022-05-31 重庆邮电大学 Method for identifying key nodes of coupling network of power grid and communication network
CN116595192A (en) * 2023-05-18 2023-08-15 中国科学技术信息研究所 Technological front information acquisition method and device, electronic equipment and readable storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813951A (en) * 2020-06-18 2020-10-23 国网上海市电力公司 Key point identification method based on technical map
CN114880482A (en) * 2022-04-26 2022-08-09 广州广电运通金融电子股份有限公司 Graph embedding-based relation graph key personnel analysis method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295692A (en) * 2016-08-05 2017-01-04 北京航空航天大学 Method for identifying root causes of early product failures based on dimensionality reduction and support vector machine
CN110032665A (en) * 2019-03-25 2019-07-19 阿里巴巴集团控股有限公司 Method and device for determining graph node vectors in a relational network graph
CN110490331A (en) * 2019-08-23 2019-11-22 北京明略软件系统有限公司 Method and device for processing nodes in a knowledge graph
CN111813951A (en) * 2020-06-18 2020-10-23 国网上海市电力公司 Key point identification method based on technical map

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100262576A1 (en) * 2007-12-17 2010-10-14 Leximancer Pty Ltd. Methods for determining a path through concept nodes
CN109299090B (en) * 2018-09-03 2023-05-30 平安科技(深圳)有限公司 Foundation centrality calculating method, system, computer equipment and storage medium
CN109446342A (en) * 2018-10-30 2019-03-08 沈阳师范大学 A kind of education of middle and primary schools knowledge mapping analysis method and system based on He Ximan index

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417837A (en) * 2022-01-19 2022-04-29 合肥工业大学 Method for measuring popularity and frontier of science and technology big data based on subject evolution trend
CN114417837B (en) * 2022-01-19 2024-02-13 合肥工业大学 Scientific and technological big data popularity and frontier measurement method based on subject evolution trend
CN114567562A (en) * 2022-03-01 2022-05-31 重庆邮电大学 Method for identifying key nodes of coupling network of power grid and communication network
CN114567562B (en) * 2022-03-01 2024-02-06 重庆邮电大学 Method for identifying key nodes of coupling network of power grid and communication network
CN116595192A (en) * 2023-05-18 2023-08-15 中国科学技术信息研究所 Technological front information acquisition method and device, electronic equipment and readable storage medium
CN116595192B (en) * 2023-05-18 2023-11-21 中国科学技术信息研究所 Technological front information acquisition method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
AU2020327352B2 (en) 2023-01-05
AU2020327352A1 (en) 2022-01-20
CN111813951A (en) 2020-10-23

Legal Events

Date Code Title Description
ENP Entry into the national phase. Ref document number: 2020327352; Country of ref document: AU; Date of ref document: 20201214; Kind code of ref document: A
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20941287; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 20941287; Country of ref document: EP; Kind code of ref document: A1