CN116739085A - Urban science and technology innovation assessment method based on knowledge graph - Google Patents

Urban science and technology innovation assessment method based on knowledge graph

Info

Publication number
CN116739085A
Authority
CN
China
Prior art keywords
entity
index
calculating
city
author
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311025040.7A
Other languages
Chinese (zh)
Inventor
盛振婷
袁会会
韩宏博
邢运占
张鹏
郑少杰
薛旋
黄美玲
李欣谚
宋健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhipu Huazhang Technology Co ltd
Original Assignee
Beijing Zhipu Huazhang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhipu Huazhang Technology Co ltd
Priority to CN202311025040.7A
Publication of CN116739085A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00Adapting or protecting infrastructure or their operation
    • Y02A30/60Planning or developing urban green infrastructure

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a knowledge-graph-based urban technological innovation assessment method, belongs to the technical field of urban planning management, and addresses the low accuracy and reliability of urban technological innovation assessment in the prior art. The technical scheme of the invention mainly comprises the following steps: obtaining basic corpus data; constructing a knowledge graph by extracting entities and relationships from the basic corpus data and building a scientific knowledge graph with the entities as nodes and the relationships as edges; calculating a paper index according to the number of papers and the first average citation count, calculating a scholar index according to the number of authors and the second average citation count, and calculating an institution index according to the number of institutions and the third average citation count; acquiring the number of cooperations between the city and cities of other countries, and calculating an international index according to the number of cooperations; and calculating the urban technological innovation index according to the paper index, the scholar index, the institution index and the international index.

Description

Urban science and technology innovation assessment method based on knowledge graph
Technical Field
The invention belongs to the technical field of urban planning management, and particularly relates to a knowledge-graph-based urban science and technology innovation assessment method.
Background
The evaluation of urban technological innovation is an important means of measuring a city's technological innovation capability, and it matters for the sustainable development and economic growth of cities. As technological innovation grows in importance, more and more cities are paying attention to and strengthening their own innovation capabilities. However, how to accurately evaluate the technological innovation of a city, identify its weaknesses and strengths, and propose improvement strategies has long been a difficult problem in the field of city planning and management.
At present, urban technological innovation assessment is mainly based on statistical data and expert experience; for example, data in each dimension are collected from several third-party data sources for assessment. This approach suffers from insufficient data sources, poor data quality and data lag, and it makes it difficult to mine in depth the relationships and interactions among the key elements of urban technological innovation, which limits the accuracy and reliability of the assessment. Developing an effective urban technological innovation assessment method is therefore of considerable significance.
Disclosure of Invention
In view of the above analysis, the embodiments of the invention aim to provide a knowledge-graph-based urban technological innovation assessment method that addresses the low accuracy and reliability of urban technological innovation assessment in the prior art. The method comprises the following steps:
acquiring basic corpus data: obtaining a paper collection from a database as the basic corpus data;
constructing a knowledge graph: extracting entities and relationships from the basic corpus data, and constructing a scientific knowledge graph with the entities as nodes and the relationships as edges;
acquiring, according to the scientific knowledge graph, the number of papers owned by a city, the first average citation count of the papers owned by the city, the number of authors, the second average citation count of the authors of the city in the paper collection, the number of institutions, and the third average citation count of the institutions of the city in the paper collection;
calculating a paper index according to the number of papers and the first average citation count, calculating a scholar index according to the number of authors and the second average citation count, and calculating an institution index according to the number of institutions and the third average citation count;
acquiring the number of cooperations between the city and cities of other countries, and calculating an international index according to the number of cooperations;
calculating an urban technological innovation index according to the paper index, the scholar index, the institution index and the international index, wherein the calculation formula is expressed as follows:
$$\text{InnovationIndex} = a \cdot PI + b \cdot SI + c \cdot IsI + d \cdot ItI$$
wherein a+b+c+d=1, InnovationIndex represents the urban technological innovation index, PI represents the paper index, SI represents the scholar index, IsI represents the institution index, and ItI represents the international index.
In some embodiments, the collection of papers includes at least the title of the paper, the author, and the organization to which the author belongs.
In some embodiments, extracting entities and relationships from the basic corpus data and constructing a scientific knowledge graph with the entities as nodes and the relationships as edges includes:
entity extraction, namely defining entity types including authors, institutions, papers and cities, and extracting the entity attributes of each entity from the basic corpus data;
relationship extraction, namely defining relationship types including 'author-paper', 'author-institution' and 'institution-city', and extracting the relationship attributes of each relationship from the basic corpus data; and
storing the entity attributes and the relationship attributes in a Neo4j database.
In some embodiments, extracting the entity attributes of each entity from the basic corpus data includes:
performing word vectorization on each author entity and each institution entity respectively;
calculating the similarity between different author entities and the similarity between different institution entities, wherein the similarity calculation formula is expressed as follows:
$$\text{sim}(A,B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
wherein $\text{sim}(A,B)$ represents the similarity, A represents the word vector of a first entity, B represents the word vector of a second entity of the same type as the first entity, $\|A\|$ represents the modulus of A, $\|B\|$ represents the modulus of B, $A_i$ represents an element of A, and $B_i$ represents an element of B;
if the similarity is greater than or equal to a preset threshold, deleting the first entity or the second entity, wherein the preset threshold is between 0.9 and 0.95.
In some embodiments, performing word vectorization on each author entity and each institution entity includes:
obtaining the author information and affiliated institution information of a paper, wherein the author information comprises the name, affiliated institution, research field and profile information, and the affiliated institution information comprises the institution name, institution profile and main research fields of the institution;
performing word vectorization on the author entity according to the author information;
and performing word vectorization on the institution entity according to the affiliated institution information.
In some embodiments, the paper index is calculated from the number of papers and the first average citation count, and the calculation formula is expressed as:
wherein PI represents the paper index, the remaining symbols of the formula denote the number of papers of the city and the first average citation count, log denotes the logarithm operation, and std denotes the standardization operation.
In some embodiments, the scholar index is calculated from the number of authors and the second average citation count, and the calculation formula is expressed as:
wherein SI represents the scholar index, the remaining symbols of the formula denote the number of authors in the city and the second average citation count, log denotes the logarithm operation, and std denotes the standardization operation.
In some embodiments, the institution index is calculated from the number of institutions and the third average citation count, and the calculation formula is expressed as:
wherein IsI represents the institution index, the remaining symbols of the formula denote the number of institutions in the city and the third average citation count, log denotes the logarithm operation, and std denotes the standardization operation.
In some embodiments, the international index is calculated from the number of cooperations, and the calculation formula is expressed as:
wherein ItI represents the international index, and the remaining symbol of the formula denotes the number of cooperations.
In some embodiments, a=0.3, b=0.3, c=0.3, d=0.1.
The above embodiments of the present invention have at least the following beneficial effects:
1. Academic papers are adopted as the data source, so that innovation results are reflected directly and the evaluation result is more objective and accurate.
2. The number of urban innovation entities is combined with an evaluation of their influence, so that urban innovation capability is reflected comprehensively and more scientifically than with traditional methods.
3. The international cooperation capability of the city is taken into account: the international innovation level of the city is reflected by counting its cooperations with cities in other countries, which improves the comprehensiveness and accuracy of the evaluation result.
4. An urban technological innovation assessment system is designed that calculates the technological innovation index of a city automatically, which improves assessment efficiency and accuracy.
5. Logarithmic transformation and standardization are applied in the calculation, so that the evaluation result is more scientific and accurate.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the embodiments of the present description, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.
Fig. 1 is a schematic flow chart of an urban scientific innovation assessment method based on a knowledge graph according to an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. It should be noted that embodiments and features of embodiments in the present disclosure may be combined, separated, interchanged, and/or rearranged with one another without conflict. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, when the terms "comprises" and/or "comprising," and variations thereof, are used in the present specification, the presence of stated features, integers, steps, operations, elements, components, and/or groups thereof is described, but the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof is not precluded. It is also noted that, as used herein, the terms "substantially," "about," and other similar terms are used as approximation terms and not as degree terms, and as such, are used to explain the inherent deviations of measured, calculated, and/or provided values that would be recognized by one of ordinary skill in the art.
The knowledge-graph-based urban science and technology innovation assessment method provided by the embodiments of the invention is described below through specific embodiments. Referring to Fig. 1, the method includes:
Step one, basic corpus data acquisition: acquiring a paper collection from a database as the basic corpus data.
Preferably, in some embodiments, the collection of papers includes at least the title of the paper, the author, and the institution to which the author belongs.
Specifically, the paper collection is selected as input according to need. The collection can be restricted to a particular technical field, in which case it evaluates the city's technological innovation in that field, or it can span several technical fields. The input paper collection needs to be organized in a specific format that includes the paper title, author names, the authors' institutions, the abstract, keywords and similar information. The goal of this step is to collect the basic corpus data used for assessing urban technological innovation capability.
Step two, constructing a knowledge graph: extracting entities and relationships from the basic corpus data, and constructing a scientific knowledge graph with the entities as nodes and the relationships as edges.
Preferably, in some embodiments, extracting entities and relationships from the basic corpus data and constructing the scientific knowledge graph includes:
entity extraction, namely defining entity types including authors, institutions, papers and cities, and extracting the entity attributes of each entity from the basic corpus data;
relationship extraction, namely defining relationship types including 'author-paper', 'author-institution' and 'institution-city', and extracting the relationship attributes of each relationship from the basic corpus data; and
storing the entity attributes and the relationship attributes in a Neo4j database.
Specifically, the graph database Neo4j is used to store the knowledge graph. Before entities and relationships are stored, their types and attributes must be defined. The entity types "author", "institution", "paper" and "city" are defined, together with their attribute information, such as author name, institution name, paper title and abstract. In addition, the relationship types between entities, such as "author-paper", "author-institution" and "institution-city", need to be defined. Preferably, in some embodiments, the "author-paper" and "author-institution" relationships are extracted from the author information fields of the paper (the author information set contains the name of each author and the name of the institution where the author works). The "institution-city" relationship is obtained by geolocation.
Entities are then instantiated from the collected basic corpus data. Each paper is a paper entity; the authors of the paper and the institutions where they work yield author entities and institution entities, and each institution entity is then mapped to its city by geographic position to obtain a city entity.
After the entity and relationship types are defined, the entity and relationship information extracted from the basic corpus data is stored in the Neo4j database. Specifically, each entity is represented as a node that contains the type and attribute information of the entity, and each relationship between entities is represented as an edge that contains the relationship type and attribute information.
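The mapping from paper records to nodes and edges can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the `Paper` and `Authorship` record shapes, the relationship labels such as `AUTHOR_PAPER`, and the helper names are all assumptions, and the city is assumed to have been resolved already. The sketch only builds parameterised Cypher `MERGE` statements as strings; sending them to a Neo4j instance is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Authorship:
    """One author of a paper with the institution and (already resolved) city."""
    author: str
    institution: str
    city: str


@dataclass
class Paper:
    title: str
    citations: int
    authorships: list = field(default_factory=list)


def build_graph(papers):
    """Return (nodes, edges): nodes are (label, name), edges are (src, rel, dst)."""
    nodes, edges = set(), set()
    for paper in papers:
        paper_node = ("Paper", paper.title)
        nodes.add(paper_node)
        for a in paper.authorships:
            author = ("Author", a.author)
            inst = ("Institution", a.institution)
            city = ("City", a.city)
            nodes.update([author, inst, city])
            # The three relationship types named in the text (labels are assumed).
            edges.add((author, "AUTHOR_PAPER", paper_node))
            edges.add((author, "AUTHOR_INSTITUTION", inst))
            edges.add((inst, "INSTITUTION_CITY", city))
    return nodes, edges


def edge_to_cypher(edge):
    """Render one edge as a parameterised Cypher MERGE statement plus its parameters."""
    (src_label, src_name), rel, (dst_label, dst_name) = edge
    query = (
        f"MERGE (s:{src_label} {{name: $src}}) "
        f"MERGE (d:{dst_label} {{name: $dst}}) "
        f"MERGE (s)-[:{rel}]->(d)"
    )
    return query, {"src": src_name, "dst": dst_name}
```

Using `MERGE` rather than `CREATE` keeps repeated authors, institutions and cities from being stored twice when they appear in several papers.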
Preferably, in some embodiments, extracting the entity attributes of each entity from the basic corpus data includes:
performing word vectorization on each author entity and each institution entity respectively;
calculating the similarity between different author entities and the similarity between different institution entities, wherein the similarity calculation formula is expressed as follows:
$$\text{sim}(A,B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
wherein $\text{sim}(A,B)$ represents the similarity, A represents the word vector of a first entity, B represents the word vector of a second entity of the same type as the first entity, $\|A\|$ represents the modulus of A, $\|B\|$ represents the modulus of B, $A_i$ represents an element of A, and $B_i$ represents an element of B;
if the similarity is greater than or equal to a preset threshold, deleting the first entity or the second entity, wherein the preset threshold is between 0.9 and 0.95.
Preferably, in some embodiments, performing word vectorization on each author entity and each institution entity includes:
obtaining the author information and affiliated institution information of a paper, wherein the author information comprises the name, affiliated institution, research field and profile information, and the affiliated institution information comprises the institution name, institution profile and main research fields of the institution;
performing word vectorization on the author entity according to the author information;
and performing word vectorization on the institution entity according to the affiliated institution information.
Specifically, at this stage entity ambiguity is identified and resolved using a similarity-based algorithm. First, each entity is expressed as a vector: for a scholar entity, the scholar's name, institution, research field and profile information are word-vectorized to obtain a vector that expresses the entity's characteristics; for an institution entity, the institution's name, profile and research field information are converted into a vector representation by word vectorization. The similarity between scholars and the similarity between institutions are then calculated separately. If the similarity reaches a predetermined threshold of 0.95, the two entities are considered identical and only one of them is retained in the database.
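The disambiguation step can be illustrated with a short sketch. It assumes the word vectors have already been produced by some embedding step that the text does not specify; the toy vectors, the function names and the "keep the first entity seen" policy are illustrative assumptions. Cosine similarity is used, matching the dot product over moduli described above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of the moduli."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def deduplicate(entities, threshold=0.95):
    """Keep the first entity of every group whose similarity reaches the threshold.

    `entities` maps an entity name to its word vector (toy vectors in the example).
    """
    kept = {}
    for name, vec in entities.items():
        if all(cosine_similarity(vec, other) < threshold for other in kept.values()):
            kept[name] = vec
    return list(kept)


vectors = {
    "Example University": [1.0, 0.0, 1.0],
    "Example Univ.": [0.98, 0.05, 1.0],   # near-duplicate spelling
    "Another Institute": [0.0, 1.0, 0.0],
}
print(deduplicate(vectors))
```

The pairwise comparison is quadratic in the number of entities, which is acceptable for a sketch; a large collection would likely need blocking or an approximate nearest-neighbour index.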
Constructing a scientific knowledge graph makes it possible to model and express the relationships between entity elements, and thus to describe the substance and structure of the knowledge more comprehensively and accurately, which gives subsequent analysis and application a more complete and detailed basis. In urban science and technology innovation assessment, entities and relationships are extracted from the paper collection to build the knowledge graph, so that the city's situation in the science and technology field can be understood more intuitively: for example, the number of papers, authors and institutions owned by the city, and the relationships and interactions among them. By contrast, if statistics are obtained directly from the paper database, only single indicators or simple relationships are available; complex relationships and interactions between entity elements are hard to capture, and problems such as missing or ambiguous information arise easily. Building the knowledge graph therefore provides a more comprehensive, accurate and reliable data basis for urban technological innovation assessment. The invention uses knowledge graph technology to model and express the relationships between urban technological innovation and the knowledge elements, which improves the precision and reliability of the assessment and supports more detailed analysis and guidance for urban technological innovation.
Step three: acquiring, according to the scientific knowledge graph, the number of papers owned by the city, the first average citation count of the papers owned by the city, the number of authors, the second average citation count of the authors of the city in the paper collection, the number of institutions, and the third average citation count of the institutions of the city in the paper collection.
Step four: calculating the paper index according to the number of papers and the first average citation count, calculating the scholar index according to the number of authors and the second average citation count, calculating the institution index according to the number of institutions and the third average citation count, acquiring the number of cooperations between the city and cities of other countries, and calculating the international index according to the number of cooperations.
Specifically, in some embodiments, the first average citation count may be the ratio of the total citations of all papers of the city to the number of papers of the city. The second average citation count represents the average citation value of the authors in a city: within a city, an author's citation count is the sum of the citations of all of that author's papers, and the second average citation count may be the ratio of the sum of the citation counts of all authors of the city to the number of authors of the city. The third average citation count represents the average citation value of the institutions in a city: within a city, an institution's citation count is the sum of the citations of all of that institution's papers, and the third average citation count may be the ratio of the sum of the citation counts of all institutions of the city to the number of institutions of the city.
The number of cooperations is defined as follows: whenever authors from two cities in different countries co-occur in one paper, the number of international cooperations for the two cities is increased by 1. Thus the number of cooperations is a count of combinations of paper authors in which each combination consists of two authors from cities in different countries.
The number of international cooperations of a city can be obtained by the following steps:
1. For every author of each paper, identify the city and country to which the author belongs.
2. For each city, identify all international partners of the city. These partners are cities in other countries whose authors appear in the same paper as authors of the city.
3. For each international partner of a city, count the combination of the city and the partner as one cooperation.
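The three averages can be computed from flat paper records as in the sketch below. The tuple layout `(title, citations, [(author, institution, city), ...])` and the function name are assumptions for illustration, and a paper is counted for a city whenever at least one of its authors belongs to that city, which is one plausible reading of "papers owned by the city".

```python
from collections import defaultdict


def city_statistics(papers, city):
    """Counts and average citation values for one city.

    papers: list of (title, citations, [(author, institution, city), ...]).
    Titles are assumed to be unique identifiers.
    """
    paper_cites = {}
    author_cites = defaultdict(int)
    inst_cites = defaultdict(int)
    for title, citations, authorships in papers:
        local = [(a, i) for a, i, c in authorships if c == city]
        if not local:
            continue  # no author of this paper belongs to the city
        paper_cites[title] = citations
        # An author's (institution's) citation count is the sum over its papers.
        for author in {a for a, _ in local}:
            author_cites[author] += citations
        for inst in {i for _, i in local}:
            inst_cites[inst] += citations

    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    return {
        "papers": len(paper_cites),
        "avg_paper_citations": mean(paper_cites.values()),
        "authors": len(author_cites),
        "avg_author_citations": mean(author_cites.values()),
        "institutions": len(inst_cites),
        "avg_institution_citations": mean(inst_cites.values()),
    }


papers = [
    ("P1", 10, [("Ann", "I1", "X"), ("Bob", "I2", "Y")]),
    ("P2", 4, [("Ann", "I1", "X"), ("Cid", "I1", "X")]),
    ("P3", 7, [("Bob", "I2", "Y")]),
]
stats = city_statistics(papers, "X")
```

In this toy data city X owns two papers (10 and 4 citations), two authors (Ann with 14 citations, Cid with 4) and one institution (I1 with 14 citations).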
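The three steps above can be sketched as follows. The text leaves some room for interpretation (counting per author pair versus per city pair), so this sketch takes one reading as an assumption: within a single paper, each pair of cities from different countries adds one cooperation to both cities, regardless of how many authors each city contributes. The record layout and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooperation_counts(papers):
    """papers: iterable of author lists, each author given as (name, city, country)."""
    counts = Counter()
    for authors in papers:
        # Step 1: the distinct (city, country) pairs present in this paper.
        places = {(city, country) for _, city, country in authors}
        # Steps 2 and 3: every pair of cities from different countries counts once.
        for (city_a, country_a), (city_b, country_b) in combinations(sorted(places), 2):
            if country_a != country_b:
                counts[city_a] += 1
                counts[city_b] += 1
    return counts


papers = [
    [("Ann", "Beijing", "CN"), ("Bob", "Boston", "US"), ("Cid", "Shanghai", "CN")],
    [("Ann", "Beijing", "CN"), ("Dee", "Berlin", "DE")],
    [("Ann", "Beijing", "CN"), ("Cid", "Shanghai", "CN")],  # domestic only
]
counts = cooperation_counts(papers)
```

Here Beijing cooperates with Boston in the first paper and with Berlin in the second, while the purely domestic third paper contributes nothing.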
In some embodiments, the paper index is calculated from the number of papers and the first average citation count, and the calculation formula is expressed as:
wherein PI represents the paper index, the remaining symbols of the formula denote the number of papers of the city and the first average citation count, log denotes the logarithm operation, and std denotes the standardization operation.
In some embodiments, the scholar index is calculated from the number of authors and the second average citation count, and the calculation formula is expressed as:
wherein SI represents the scholar index, the remaining symbols of the formula denote the number of authors in the city and the second average citation count, log denotes the logarithm operation, and std denotes the standardization operation.
In some embodiments, the institution index is calculated from the number of institutions and the third average citation count, and the calculation formula is expressed as:
wherein IsI represents the institution index, the remaining symbols of the formula denote the number of institutions in the city and the third average citation count, log denotes the logarithm operation, and std denotes the standardization operation.
In some embodiments, the international index is calculated from the number of cooperations, and the calculation formula is expressed as:
wherein ItI represents the international index, and the remaining symbol of the formula denotes the number of cooperations.
Specifically, the logarithmic transformation and the standardization operation compress the data into a smaller value range and bring its distribution closer to a normal distribution.
The standardization operation unifies the dimensions of different features so that the features are comparable on a common scale. Standardization includes two steps: centering and scaling.
Centering moves the mean of the data to 0: the mean of each feature is subtracted from that feature, so that every feature has a mean of 0. This removes offset differences between features and places them on the same baseline. The centering formula is expressed as:
$$X' = X - \mu$$
wherein X represents the original data, $X'$ represents the data after centering, and μ represents the mean of X.
Scaling rescales the data so that the individual features share the same scale, which prevents differences in variance between features from influencing the model. This embodiment employs z-score scaling.
Min-max scaling scales the data into [0,1] by the following formula:
$$X_{scaled} = \frac{X' - X_{min}}{X_{max} - X_{min}}$$
wherein $X'$ represents the data after centering, $X_{min}$ and $X_{max}$ represent the minimum and maximum values of the data respectively, and $X_{scaled}$ represents the scaled data.
Z-score scaling scales the data to a distribution with a mean of 0 and a standard deviation of 1 by the following formula:
$$X_{scaled} = \frac{X' - \mu}{\sigma}$$
wherein $X'$ represents the data after centering, μ represents the mean of the data, σ represents the standard deviation of the data, and $X_{scaled}$ represents the scaled data.
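The centering and scaling steps can be sketched in a few lines of plain Python. Two choices here are assumptions not fixed by the text: the logarithm is taken as `log(1 + x)` so that a count of zero stays finite, and the population standard deviation is used for σ. The helper names are hypothetical, and the sketch covers only the log and std operations, not the full index formulas.

```python
import math


def center(values):
    """Subtract the mean so that the centred data has mean 0."""
    mu = sum(values) / len(values)
    return [x - mu for x in values]


def z_score(values):
    """Centre the data, then divide by the population standard deviation."""
    centered = center(values)
    sigma = math.sqrt(sum(x * x for x in centered) / len(centered))
    if sigma == 0.0:
        return [0.0 for _ in centered]  # all values identical
    return [x / sigma for x in centered]


def min_max(values):
    """Rescale the data linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def std_log(values):
    """Apply log(1 + x) to damp large counts, then z-score standardise."""
    return z_score([math.log1p(x) for x in values])


paper_counts = [0, 9, 99, 999]  # toy per-city counts spanning several magnitudes
scaled = std_log(paper_counts)
```

The logarithm matters for counts like these: without it the largest city would dominate the standard deviation and the remaining cities would be squeezed into nearly identical scores.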
Step five: calculating the urban technological innovation index according to the paper index, the scholar index, the institution index and the international index, wherein the calculation formula is expressed as follows:
$$\text{InnovationIndex} = a \cdot PI + b \cdot SI + c \cdot IsI + d \cdot ItI$$
wherein a+b+c+d=1, InnovationIndex represents the urban technological innovation index, PI represents the paper index, SI represents the scholar index, IsI represents the institution index, and ItI represents the international index.
In some embodiments, a=0.3, b=0.3, c=0.3, d=0.1.
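As a small worked illustration, the weighted combination can be written as a function whose default weights are the values a=0.3, b=0.3, c=0.3, d=0.1 given above. The function name and the input values in the example are made up for demonstration.

```python
def innovation_index(pi, si, isi, iti, weights=(0.3, 0.3, 0.3, 0.1)):
    """Weighted sum of the paper, scholar, institution and international indices."""
    a, b, c, d = weights
    if abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return a * pi + b * si + c * isi + d * iti


# Hypothetical standardised index values for one city.
value = innovation_index(pi=1.0, si=0.5, isi=0.0, iti=2.0)
# 0.3 * 1.0 + 0.3 * 0.5 + 0.3 * 0.0 + 0.1 * 2.0 = 0.65
```

With these weights the international index contributes a third as much per unit as each of the other three indices.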
Step six: ranking and visualizing the urban technological innovation indexes.
Specifically, this embodiment ranks all cities from high to low according to the previously calculated urban technological innovation index. The embodiment also visualizes the knowledge graph, using charts to show clearly the associations between innovation entities and cities as well as the role and weight of each index in the technological innovation evaluation system.
Specifically, the system uses the data visualization library Matplotlib to draw charts such as bar charts or heat maps that show the technological innovation ranking of the cities and the weight and contribution of each index in an intuitive way. The system can also use a geographic visualization library, such as Folium, to display the urban technological innovation index on a map, which makes it easier to analyze and compare the geographic position and distribution of the cities.
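The ranking step amounts to a descending sort. A minimal sketch follows; the function name, the alphabetical tie-break and the index values are illustrative assumptions.

```python
def rank_cities(index_by_city):
    """Return (rank, city, index) tuples ordered from highest to lowest index."""
    # Sort by descending index; ties are broken alphabetically by city name.
    ordered = sorted(index_by_city.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, city, value) for rank, (city, value) in enumerate(ordered, start=1)]


ranking = rank_cities({"City A": 0.2, "City B": 0.9, "City C": 0.5})
for rank, city, value in ranking:
    print(rank, city, value)
```

The resulting list can be passed directly to a plotting routine, for example as the labels and heights of a bar chart.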
In this way, users can quickly see the science and technology innovation ranking and strength of each city, and can clearly see the role and contribution of each index in the evaluation system. In addition, visualizing the knowledge graph helps users better understand the associations and interactions between innovation entities, which supports deeper analysis and research and thereby provides better-grounded support for science and technology innovation decisions.
Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, the various illustrative elements and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments described. Any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. An urban technological innovation assessment method based on a knowledge graph, characterized by comprising the following steps:
acquiring basic corpus data: obtaining a paper collection from a database as the basic corpus data;
constructing a knowledge graph: extracting entities and relations from the basic corpus data, and constructing a scientific knowledge graph with the entities as nodes and the relations as edges;
acquiring, from the scientific knowledge graph, the number of papers owned by a city, a first average citation count of the papers owned by the city, the number of authors, a second average citation count, in the paper collection, of the authors owned by the city, the number of institutions, and a third average citation count, in the paper collection, of the institutions owned by the city;
calculating a paper index from the number of papers and the first average citation count, calculating a scholar index from the number of authors and the second average citation count, and calculating an institution index from the number of institutions and the third average citation count; acquiring the number of collaborations between the city and cities in other countries, and calculating an international index from the number of collaborations;
calculating an urban technological innovation index from the paper index, the scholar index, the institution index and the international index, wherein the calculation formula is expressed as follows:
$$\mathrm{InnovationIndex} = a \cdot PI + b \cdot SI + c \cdot IsI + d \cdot ItI$$
where a + b + c + d = 1, InnovationIndex represents the urban technological innovation index, PI represents the paper index, SI represents the scholar index, IsI represents the institution index, and ItI represents the international index.
2. The knowledge-graph-based urban technological innovation assessment method according to claim 1, characterized in that the paper collection includes at least the title, the authors, and the institutions to which the authors belong.
3. The knowledge-graph-based urban technological innovation assessment method according to claim 2, characterized in that extracting the entities and relations from the basic corpus data and constructing the scientific knowledge graph with the entities as nodes and the relations as edges comprises the following steps:
entity extraction: defining entity types including author, institution, paper and city, and extracting the entity attributes of each entity from the basic corpus data;
relation extraction: defining relation types including "author-paper", "author-institution" and "institution-city", and extracting the relation attributes of each relation from the basic corpus data; and storing the entity attributes and the relation attributes in a Neo4j database.
4. The knowledge-graph-based urban technological innovation assessment method according to claim 3, characterized in that extracting the entity attributes of each entity from the basic corpus data comprises the following steps:
performing word vectorization on each author entity and each institution entity respectively;
calculating the similarity between different author entities and the similarity between different institution entities, wherein the similarity calculation formula is expressed as follows:
$$\mathrm{sim}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$
where $\mathrm{sim}(A, B)$ represents the similarity, $A$ represents the word vector of a first entity, $B$ represents the word vector of a second entity of the same type as the first entity, $\|A\|$ represents the modulus of $A$, $\|B\|$ represents the modulus of $B$, $A_i$ represents the elements of $A$, and $B_i$ represents the elements of $B$;
if the similarity is greater than or equal to a preset threshold, deleting the first entity or the second entity, wherein the preset threshold is between 0.9 and 0.95.
5. The knowledge-graph-based urban technological innovation assessment method according to claim 4, characterized in that performing word vectorization on each author entity and each institution entity respectively comprises the following steps:
obtaining author information and affiliated institution information of a paper, wherein the author information comprises the name, affiliated institution, research fields and profile information, and the affiliated institution information comprises the institution name, the institution profile and the main research fields of the institution;
performing word vectorization on the author entity according to the author information;
and performing word vectorization on the institution entity according to the affiliated institution information.
6. The knowledge-graph-based urban technological innovation assessment method according to claim 1, characterized in that the paper index, denoted PI, is calculated from the number of papers of the city and the first average citation count by a formula in which log represents the logarithm operation and std represents the normalization operation.
7. The knowledge-graph-based urban technological innovation assessment method according to claim 1, characterized in that the scholar index, denoted SI, is calculated from the number of authors in the city and the second average citation count by a formula in which log represents the logarithm operation and std represents the normalization operation.
8. The knowledge-graph-based urban technological innovation assessment method according to claim 1, characterized in that the institution index, denoted IsI, is calculated from the number of institutions in the city and the third average citation count by a formula in which log represents the logarithm operation and std represents the normalization operation.
9. The knowledge-graph-based urban technological innovation assessment method according to claim 1, characterized in that the international index, denoted ItI, is calculated by a formula from the number of collaborations.
10. The knowledge-graph-based urban technological innovation assessment method according to claim 1, characterized in that a = 0.3, b = 0.3, c = 0.3, d = 0.1.
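The entity deduplication recited in claim 4 can be illustrated with a short sketch. It assumes the similarity is the standard cosine similarity of two word vectors and that, of two near-duplicate entities, the one seen first is kept; the claim only says that one of the two is deleted. The function names `cosine_similarity` and `deduplicate`, the entity labels and the three-dimensional vectors are hypothetical stand-ins for real word embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treating it as dissimilar is an assumption.
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(entities, threshold=0.9):
    """Drop every entity whose similarity to an already kept entity reaches the threshold."""
    kept = {}
    for name, vector in entities.items():
        if all(cosine_similarity(vector, other) < threshold for other in kept.values()):
            kept[name] = vector
    return kept


# Hypothetical author entities with toy word vectors.
authors = {
    "author_1": [1.0, 0.0, 1.0],
    "author_1_variant": [1.0, 0.05, 0.98],
    "author_2": [0.0, 1.0, 0.0],
}
print(list(deduplicate(authors)))
```

The default threshold of 0.9 is the lower end of the 0.9 to 0.95 range given in claim 4; a higher value merges fewer entities.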
CN202311025040.7A 2023-08-15 2023-08-15 Urban science and technology innovation assessment method based on knowledge graph Pending CN116739085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311025040.7A CN116739085A (en) 2023-08-15 2023-08-15 Urban science and technology innovation assessment method based on knowledge graph

Publications (1)

Publication Number Publication Date
CN116739085A true CN116739085A (en) 2023-09-12

Family

ID=87906474

Country Status (1)

Country Link
CN (1) CN116739085A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110825780A (en) * 2019-09-18 2020-02-21 广州都市圈网络科技有限公司 Innovative resource assessment method and device based on aggregation algorithm
CN111667164A (en) * 2020-05-28 2020-09-15 广州珠江黄埔大桥建设有限公司 Enterprise scientific and technological innovation capability evaluation optimization method, system and storage medium
CN112836060A (en) * 2019-11-25 2021-05-25 中国科学技术信息研究所 Map construction method and device for scientific and technological innovation data
CN113515644A (en) * 2021-05-26 2021-10-19 中国医学科学院医学信息研究所 Hospital science and technology portrait method and system based on knowledge graph

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
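Li Yan; Xing Wen: "Research on the construction and evaluation of the Guangdong science and technology innovation index" (广东科技创新指数构建与评价研究), Science and Technology Management Research (科技管理研究), no. 20 *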

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination