CN115309885A - Knowledge graph construction, retrieval and visualization method and system for scientific and technological service - Google Patents


Info

Publication number
CN115309885A
CN115309885A CN202211030854.5A CN202211030854A CN115309885A CN 115309885 A CN115309885 A CN 115309885A CN 202211030854 A CN202211030854 A CN 202211030854A CN 115309885 A CN115309885 A CN 115309885A
Authority
CN
China
Prior art keywords
scientific
technological
retrieval
knowledge
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211030854.5A
Other languages
Chinese (zh)
Inventor
费敏锐
吕泽昊
周文举
易开祥
徐昱琳
王海宽
李晨辉
沈赟怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN202211030854.5A
Publication of CN115309885A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a knowledge graph construction, retrieval and visualization method and system for scientific and technological services. The method comprises the following steps: acquiring resource metadata of a scientific and technological service platform and preprocessing the metadata; performing data cleaning and knowledge extraction on the metadata to obtain a scientific and technological resource entity list, an entity attribute list and an entity relation list; performing knowledge correction on the scientific and technological resource entities; calculating the membership degree of the screened scientific and technological resource entities; importing the processed scientific and technological resources into a knowledge graph database; determining scientific and technological resource categories and designing precise filtering conditions for user retrieval; responding to the personalized retrieval requirements of the user and completing the query operation; and presenting the retrieved content as a visual graph. By constructing and storing a knowledge graph of the scientific and technological service resources on the platform, the invention provides platform users with personalized retrieval and visual presentation, responds more flexibly to users' dynamic requirements, and improves the resource conversion rate of the scientific and technological service platform.

Description

Knowledge graph construction, retrieval and visualization method and system for scientific and technological service
Technical Field
The invention relates to the field of modern service industry, in particular to the field of scientific and technological services, and specifically relates to a method and a system for establishing, retrieving and visualizing a knowledge graph of scientific and technological services.
Background
The scientific and technological service industry is an important component of the modern service industry and an emerging industry that serves the whole chain of scientific and technological innovation. Compared with that of other countries, China's scientific and technological service industry started late but is developing quickly; at present its overall scale is small and its growth rate is high.
The knowledge graph concept originated with the Google Knowledge Graph. It describes entities, events, attributes and their relationships in the objective world in a structured form, expressing information in a way closer to human cognition. For data storage, graph databases model and manage data in a simple, intuitive way, which makes it easier to break data into small normalized units, to connect them through rich relationships, and to describe complex relationships among data clearly. This technology makes it possible to better organize, manage and understand vast amounts of information.
Applying knowledge graph technology to scientific and technological services enables the fusion and supplementation of various kinds of information such as knowledge, behavior and data. Modeling scientific and technological service resources as a knowledge graph makes intelligent retrieval accurate, scenario-oriented and personalized, and can effectively improve the conversion rate of scientific and technological service resources on a scientific and technological service platform.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method and a system for constructing, retrieving and visualizing a knowledge graph for scientific and technological services, which have the advantages of high accuracy, good flexibility and high resource conversion rate.
In order to achieve the above purpose, the knowledge graph construction, retrieval and visualization method and system for scientific and technological services of the invention are as follows:
the knowledge graph construction, retrieval and visualization method for scientific and technological services is mainly characterized by comprising the following steps of:
(1) Collecting resource metadata of the scientific and technological service platform and preprocessing it by word segmentation and part-of-speech tagging;
(2) Performing data cleaning and knowledge extraction on the preprocessed metadata to obtain a scientific and technological resource entity list, an entity attribute list and an entity relation list;
(3) Performing knowledge correction on the scientific and technological service resource entities to determine the scientific and technological service resources that can be imported into the knowledge graph;
(4) Calculating the resource membership degree of the scientific and technological service resources that have been screened and selected for import into the knowledge graph;
(5) Importing the processed scientific and technological service resources into a knowledge graph database;
(6) Determining the categories of scientific and technological resources and designing precise filtering conditions for user retrieval;
(7) Responding to the personalized retrieval requirements of the user and completing the query operation in the knowledge graph database;
(8) Presenting the user's retrieved content as a visual graph to improve the user's retrieval experience.
Preferably, the scientific and technological service platform is a scientific and technological service resource sharing platform with a scientific and technological service transaction function, and supports role-based user registration and login, wherein the roles comprise scientific and technological service provider and scientific and technological service demander.
Preferably, the resource metadata of the scientific and technological service platform comprise service providers, service products, instruments and equipment, park services, intellectual property, investment, experts, and the like.
A knowledge graph construction, retrieval and visualization system for scientific and technological services, applied to the above knowledge graph construction, retrieval and visualization method, comprises: a data acquisition and preprocessing module, a knowledge extraction module, a knowledge correction module, a membership calculation module, a graph import module, a retrieval resource classification module, a resource retrieval module and a visualization module;
the data acquisition and preprocessing module is used for acquiring resource information of a scientific and technological service platform, dividing the resource information into structured data, semi-structured data and unstructured data, and meanwhile, preprocessing the semi-structured data and the unstructured data by word segmentation and part-of-speech tagging;
the knowledge extraction module is used for performing data cleaning and traversal operation on the structured data and performing knowledge extraction operation on the semi-structured data and the unstructured data, and specifically comprises entity identification, relationship extraction and attribute extraction;
the knowledge correction module is used for performing knowledge correction on the cleaned scientific and technological resources and determining the complete scientific and technological resource information that can be imported into the knowledge graph database;
the membership calculation module is used for calculating the resource membership degree of the scientific and technological service resources that, after knowledge correction, are selected for import into the knowledge graph;
the graph import module is used for importing the processed scientific and technological resources into the knowledge graph database;
the retrieval resource classification module is used for classifying scientific and technological resources on the platform and determining filtering conditions during user retrieval;
the resource retrieval module is used for responding to a user retrieval request, generating a corresponding query statement and completing query operation in a knowledge graph database;
the visualization module is used for presenting the retrieval result of the user in a knowledge graph visualization mode in the front-end interface.
With the knowledge graph construction, retrieval and visualization method and system for scientific and technological services, the scientific and technological service resources on the platform are stored as a knowledge graph. Compared with traditional relational database storage, the structured data are stored as a network that forms a graph rather than as tables, which greatly improves the storage efficiency of scientific and technological resources. The invention applies the node-and-relationship architecture of the knowledge graph to the retrieval of scientific and technological service resources, so that the retrieval can accurately capture users' personalized requirements, thereby improving the matching between supply and demand resources on the platform and the transaction success rate of its scientific and technological service resources.
Drawings
FIG. 1 is a flow chart of a method for knowledge graph construction, retrieval and visualization for scientific and technical services in accordance with the present invention.
FIG. 2 is a schematic diagram of the knowledge-graph construction, retrieval and visualization system for scientific and technical services according to the present invention.
FIG. 3 is a schematic flow chart of the method for knowledge-graph construction, retrieval and visualization for scientific and technical services of the present invention.
Detailed Description
In order to more clearly describe the technical contents of the present invention, the following further description is given in conjunction with specific embodiments.
The invention relates to a knowledge graph construction, retrieval and visualization method for scientific and technological services, which comprises the following steps:
(1) Collecting resource metadata of the scientific and technological service platform and preprocessing it by word segmentation and part-of-speech tagging;
(2) Performing data cleaning and knowledge extraction on the preprocessed metadata to obtain a scientific and technological resource entity list, an entity attribute list and an entity relation list;
(3) Performing knowledge correction on the scientific and technological service resource entities to determine the scientific and technological service resources that can be imported into the knowledge graph;
(4) Calculating the resource membership degree of the scientific and technological service resources that have been screened and selected for import into the knowledge graph;
(5) Importing the processed scientific and technological service resources into a knowledge graph database;
(6) Determining the categories of scientific and technological resources and designing precise filtering conditions for user retrieval;
(7) Responding to the personalized retrieval requirements of the user and completing the query operation in the knowledge graph database;
(8) Presenting the user's retrieved content as a visual graph to improve the user's retrieval experience.
Preferably, the scientific and technological service platform is a scientific and technological service resource sharing platform with a scientific and technological service transaction function, and supports role-based user registration and login, wherein the roles comprise scientific and technological service provider and scientific and technological service demander.
Preferably, the resource metadata of the scientific and technological service platform comprise service providers, service products, instruments and equipment, park services, intellectual property, investment, experts, and the like.
A knowledge graph construction, retrieval and visualization system for scientific and technological services, applied to the above knowledge graph construction, retrieval and visualization method, comprises: a data acquisition and preprocessing module, a knowledge extraction module, a knowledge correction module, a membership calculation module, a graph import module, a retrieval resource classification module, a resource retrieval module and a visualization module;
the data acquisition and preprocessing module is used for acquiring resource information of a scientific and technological service platform, dividing the resource information into structured data, semi-structured data and unstructured data, and meanwhile, preprocessing the semi-structured data and the unstructured data by word segmentation and part-of-speech tagging;
the knowledge extraction module is used for performing data cleaning and traversal operation on the structured data and performing knowledge extraction operation on the semi-structured data and the unstructured data, and specifically comprises entity identification, relationship extraction and attribute extraction;
the knowledge correction module is used for performing knowledge correction on the cleaned scientific and technological resources and determining the complete scientific and technological resource information that can be imported into the knowledge graph database;
the membership calculation module is used for calculating the resource membership degree of the scientific and technological service resources that, after knowledge correction, are selected for import into the knowledge graph;
the graph import module is used for importing the processed scientific and technological resources into the knowledge graph database;
the retrieval resource classification module is used for classifying scientific and technological resources on the platform and determining filtering conditions during user retrieval;
the resource retrieval module is used for responding to a user retrieval request, generating a corresponding query statement and completing query operation in a knowledge graph database;
the visualization module is used for presenting the retrieval result of the user in a knowledge graph visualization mode in the front-end interface.
In a specific embodiment of the present invention, a knowledge graph construction, retrieval and visualization method for scientific and technological services is provided. As shown in FIG. 1 and FIG. 3, the method is carried out according to the following steps:
Step one: collecting resource metadata of the scientific and technological service platform and preprocessing it by word segmentation and part-of-speech tagging.
Collecting scientific and technological resource metadata from a scientific and technological service platform resource library, and dividing the metadata into structured data, semi-structured data and unstructured data. The structured data comprises table data, a relational database and the like, the semi-structured data comprises log files, XML documents, JSON documents and the like, and the unstructured data comprises office documents, various reports, texts and the like in all formats.
Meanwhile, the semi-structured data and the unstructured data are preprocessed by word segmentation and part-of-speech tagging, which are important preparation for complex NLP tasks such as knowledge extraction;
unlike Latin-script languages such as English, in which letters form words and spaces separate one word from the next, Chinese text has no explicit word delimiters, so a Chinese sentence must first be segmented into independent words before a machine can correctly understand the meaning of the text;
part-of-speech tagging determines the part of speech of each word in each sentence of the text, where the part-of-speech types include noun (/n), verb (/v), verbal noun (/vn), adjective (/a), adverb (/d) and the like;
the specific word segmentation and part-of-speech tagging steps are as follows:
step 1.1: determining the corresponding industry corpus dictionary according to the industry category to which each scientific and technological resource belongs, and matching the text to be segmented against the entries in the dictionary;
step 1.2: setting the starting length for forward maximum matching, partitioning the dictionary by word length in Chinese characters, taking candidate words from front to back, and at each match looking up the part of the dictionary for the corresponding number of characters;
step 1.3: setting the starting length for reverse maximum matching, partitioning the dictionary by word length in Chinese characters, taking candidate words from back to front, and at each match looking up the part of the dictionary for the corresponding number of characters;
step 1.4: determining a preference rule: if forward and reverse maximum matching yield the same number of words, taking the segmentation with the fewest non-dictionary words and single-character words as the result;
step 1.5: performing part-of-speech tagging according to the selected industry corpus dictionary, and designing a feature template;
step 1.6: creating a blank part-of-speech tagger, setting the relevant parameters, and completing model training using a conditional random field (CRF) model;
step 1.7: inputting the segmented text to perform the part-of-speech tagging task.
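Steps 1.2 to 1.4 can be sketched as a small bidirectional maximum matching routine. This is a minimal illustration, not the patent's implementation: the function names and the toy dictionary are hypothetical, and the rule used when the two directions produce different word counts (prefer the segmentation with fewer words) is a common convention assumed here, since the text only specifies the tie-breaking rule.

```python
def forward_max_match(text, dictionary, max_len):
    """Scan front to back, always taking the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary
        # hit is found or only a single character remains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len):
    """Scan back to front, always taking the longest dictionary word."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def penalty(words, dictionary):
    """Count non-dictionary words plus single-character words (step 1.4)."""
    non_dict = sum(1 for w in words if w not in dictionary)
    single = sum(1 for w in words if len(w) == 1)
    return non_dict + single


def bidirectional_segment(text, dictionary):
    max_len = max((len(w) for w in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    if len(fwd) != len(bwd):
        # Assumption: with different word counts, prefer fewer words.
        return fwd if len(fwd) < len(bwd) else bwd
    # Same word count: prefer fewer non-dictionary and single-character words.
    return fwd if penalty(fwd, dictionary) <= penalty(bwd, dictionary) else bwd


# Hypothetical toy dictionary, for illustration only.
toy_dict = {"研究", "研究生", "生命", "的", "起源"}
print(bidirectional_segment("研究生命的起源", toy_dict))
```

On this toy input, forward matching greedily takes "研究生" and leaves a stray single character, while reverse matching does not, so the preference rule selects the reverse result.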
Specifically, data preprocessing is illustrated with the introduction of one project on the platform: "The project is in Shenzhen and is an Internet APP project developed for schools and communities. It is currently in the research and development stage, with equity financing of 3 million. The funds are mainly used for product research, development and improvement and for later small-scale promotion and trial operation in the local market, with nationwide expansion planned for the future."
The results obtained after word segmentation and part-of-speech tagging are as follows: "[project/n, in/p, Shenzhen/ns, ,/w, for/p, school/n, ,/w, community/n, develop/v, of/uj, Internet/n, APP/nx, project/n, ,/w, now/t, in/v, R&D/j, stage/n, ,/w, equity/n, financing/vn, 3 million/m, ,/w, funds/n, main/b, used for/v, product/n, of/uj, R&D/j, ,/w, improvement/v, ,/w, with/p, later stage/f, local/s, market/n, of/uj, small/a, scope/n, promotion/v, ,/w, trial operation/v, ,/w, future/t, then/d, to/p, nationwide/n, development/vn, ./w]"
Step two: performing data cleaning and knowledge extraction on the preprocessed metadata to obtain a scientific and technological resource entity list, an entity attribute list and an entity relation list.
And performing data cleaning and traversing operation on the structured data obtained in the first step to obtain a scientific and technological resource entity list, an entity attribute list and an entity relation list.
Knowledge extraction is performed on the semi-structured data and the unstructured data obtained in step one to determine the named entities, the entity attributes and the relationships among entities, and thereby to obtain the scientific and technological resource entity list, the entity attribute list and the entity relation list.
The entities to be extracted in the invention are named entities related to scientific and technical service resources, such as service providers, service products, instruments, parks, patents, investment institutions, experts and the like. Each named entity extracted may be eventually added to the knowledge-graph as a related entity of the science and technology service.
The entity attributes to be extracted in the present invention are the various attributes of different scientific and technological service resources, such as the inventors, applicants and agencies of a patent, the organization that owns an instrument or piece of equipment, the service products provided by a service provider, the unit to which an expert belongs, and the like. Each extracted entity attribute may eventually be added to the knowledge graph as an entity attribute related to scientific and technological services.
In the invention, the relationships among the entities to be extracted are various relationships among different scientific and technological service resources, for example, the patent owned by a certain scientific research institution, the technology owned by a certain expert, the cooperation relationship between a certain investor and a certain college, the provision of a certain scientific and technological service product by a certain service provider, and the like, and each extracted entity relationship can be finally added into the knowledge graph as the related entity relationship of the scientific and technological service.
Optionally, an entity attribute is regarded as a nominal relationship between the entity and the attribute value, so that the attribute extraction task is converted into a relation extraction task.
Optionally, a supervised Lattice-LSTM Chinese named entity recognition model is used to extract the named entities of scientific and technological services:
step 2.1: manually labeling the scientific and technological resource data to be extracted, performing an embedding operation on the data, and extracting the contextual features of the text;
step 2.2: training on the preprocessed text with the word vector training tool Word2Vec to construct the vectors corresponding to the text characters;
step 2.3: using an LSTM neural network layer to compute context representation vectors, and finally feeding these vectors as features to a CRF label inference layer to predict the label classification result, where 'B' denotes the beginning of an entity, 'E' denotes the end of an entity, and 'S' denotes a single-character entity;
step 2.4: using 70% of the labeled corpus as the training set and 30% as the test set, setting the word vector dimensionality to 8logN, where N is the vocabulary size, setting the number of hidden-layer neurons of the LSTM model to 150, and performing named entity recognition based on the Lattice-LSTM model;
step 2.5: extracting the relationships between entities of different scientific and technological resources by computing, through cosine similarity, the semantic similarity between an extracted relationship and the candidate relation words, and selecting a relation word with high semantic similarity as the relationship between the entities;
step 2.6: manually verifying the relationships extracted on the basis of semantic similarity, correcting and refining their textual descriptions, and standardizing the name attribute and other features of each relationship.
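The relation selection in step 2.5 can be illustrated with a short sketch. It assumes that the extracted relation phrase and each candidate relation word have already been mapped to vectors (for example by the Word2Vec model of step 2.2); the function names, the toy vectors and the 0.8 acceptance threshold are hypothetical and are not given in the text.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # A zero vector carries no direction.
    return dot / (norm_u * norm_v)


def pick_relation(candidate_vec, relation_vecs, threshold=0.8):
    """Return the most similar relation word, or None below the threshold.

    `threshold` is an assumed cut-off for "high" similarity; pairs that
    fall below it would be left to the manual verification of step 2.6.
    """
    best_name, best_score = None, -1.0
    for name, vec in relation_vecs.items():
        score = cosine_similarity(candidate_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical 3-dimensional embeddings, for illustration only.
relation_vecs = {
    "owns": [1.0, 0.0, 0.0],
    "provides": [0.0, 1.0, 0.0],
}
name, score = pick_relation([0.9, 0.1, 0.0], relation_vecs)
print(name, round(score, 3))
```

Returning `None` for low-similarity cases keeps ambiguous relations out of the graph until a person has reviewed them, which matches the manual check described in step 2.6.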
Step three: performing knowledge correction on the scientific and technological service resource entities to determine the scientific and technological service resources that can be imported into the knowledge graph.
Scientific and technological resource data extracted from the large-scale, multi-source, heterogeneous databases of a scientific and technological service platform contain duplicated records and ambiguous references, so knowledge fusion and reference disambiguation must be performed on the scientific and technological resources before the knowledge graph is constructed.
After the scientific and technological resource entities are obtained, the different representations of the same entity across data from different sources are fused to complete entity identification, relationship linking and ontology generation. For example, two patent records with the same patent number and publication number are duplicates, and only one of them is retained.
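The patent example above amounts to de-duplicating records on a composite key. A minimal sketch follows; the record layout and the field names `patent_number` and `publication_number` are assumptions made for illustration.

```python
def deduplicate_patents(records):
    """Keep the first record for each (patent number, publication number)."""
    seen = set()
    unique = []
    for rec in records:
        # Hypothetical field names; a real schema may differ.
        key = (rec["patent_number"], rec["publication_number"])
        if key in seen:
            continue  # Same patent from another source: drop the repeat.
        seen.add(key)
        unique.append(rec)
    return unique


records = [
    {"patent_number": "P-001", "publication_number": "PUB-A", "source": "db1"},
    {"patent_number": "P-001", "publication_number": "PUB-A", "source": "db2"},
    {"patent_number": "P-002", "publication_number": "PUB-B", "source": "db1"},
]
print(len(deduplicate_patents(records)))
```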
Optionally, the LIMES tool is used for knowledge correction. LIMES is a metric-space-based entity matching and link discovery framework that is suited to large-scale data linking and is implemented in Java. The specific steps are as follows:
step 3.1: writing a configuration file that specifies the data sources, the fusion algorithm, the fusion conditions and similar information;
step 3.2: given a source data set S, a target data set T and a threshold θ;
step 3.3: calculating the distance m(s, e) between each s ∈ S and each e in a reference (exemplar) set E, and filtering with the triangle inequality: any entity pair (s, t) with m(s, e) − m(e, t) > θ is discarded, because its distance m(s, t) must then exceed θ;
step 3.4: calculating the distance m(s, t) of each remaining entity pair (s, t), and storing it in a user-specified format;
step 3.5: completing knowledge fusion according to entity similarity, and determining the scientific and technological service resource data that can be imported into the knowledge graph database.
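The filtering idea in steps 3.2 to 3.4 can be sketched in a few lines. This is an illustration of the triangle inequality bound only, not the LIMES code or its API: the function names, the choice of exemplars and the numeric toy data are assumptions. For any true metric m, m(s, e) ≤ m(s, t) + m(t, e), so m(s, t) ≥ m(s, e) − m(e, t), and a pair whose lower bound already exceeds θ can be skipped without computing m(s, t).

```python
def link_with_exemplar_filter(source, target, exemplars, metric, theta):
    """Return pairs (s, t) with metric(s, t) <= theta, plus the number of
    full distance computations that were actually needed."""
    # Distances from every exemplar to every target item, computed once.
    dist_et = {(e, t): metric(e, t) for e in exemplars for t in target}
    links = {}
    comparisons = 0
    for s in source:
        dist_se = {e: metric(s, e) for e in exemplars}
        for t in target:
            # Lower bound from the triangle inequality:
            # m(s, t) >= m(s, e) - m(e, t) for every exemplar e.
            if any(dist_se[e] - dist_et[(e, t)] > theta for e in exemplars):
                continue  # m(s, t) > theta is guaranteed, skip the pair.
            comparisons += 1
            d = metric(s, t)
            if d <= theta:
                links[(s, t)] = d
    return links, comparisons


def abs_diff(a, b):
    """A simple true metric on numbers, used only for this toy example."""
    return abs(a - b)


source = [0, 100]
target = [1, 99]
links, comparisons = link_with_exemplar_filter(
    source, target, exemplars=[1], metric=abs_diff, theta=2
)
print(links, comparisons)
```

The filter never removes a valid link, because it only skips pairs whose distance is provably above the threshold; it simply reduces how many full comparisons are required on large data sets.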
Step four: calculating the resource membership degree of the scientific and technological service resources that have been screened and selected for import into the knowledge graph.
The system is developed on the basis of a scientific and technological resource sharing service platform. Building on a traditional platform, the sharing platform aggregates the various kinds of scientific and technological resource information that small and medium-sized enterprises need for their development, and is characterized by wide data sources, inconsistent data standards and a large number of fields involved. To meet the platform's service development needs, the resource membership degree of all scientific and technological resources must be calculated with platform operation and unified resource allocation in mind;
taking the Hainan scientific and creative island platform as an example, the platform covers four industries: modern service industry, marine economy, international medical care, and cultural tourism. The scientific and technological resource entities on the platform are classified, and the membership degree of each entity in each industry is calculated for offline classified use of resources and online classified display of resources on the platform;
to build an open, ecological, service-oriented industrial system led by Hainan's modern service industry, marine economy, medical care and cultural tourism, and to serve the high-quality development of the modern economic system of the free trade zone, the platform adopts five development modes: centralized Hainan science and technology park services, distributed Hainan village and town science and technology services, customized Hainan enterprise services, specialized Hainan characteristic industry services, and imported off-island science and technology service resources. Together these modes promote the integrated development of science and technology services in the Hainan free trade zone. The scientific and technological resource entities on the platform are likewise classified, and the membership degree of each entity in each development mode is calculated for offline classified use and online classified display of resources;
the resource membership calculation process is specifically described by taking resource membership calculation of four major industries as an example:
Step 4.1: obtain the word segmentation and part-of-speech tagging results for each scientific and technological entity in the scientific and technological resource entity list obtained in step one;
Step 4.2: filter the word segmentation results of the scientific and technological resource entity list using the stop-word dictionary;
Step 4.3: accumulate weights for each scientific and technological entity using the classification weight of each scientific and technological resource term in the platform's four-industry database;
Step 4.4: compute the mean and standard deviation of the raw data, and normalize the data by subtracting the mean from each raw value and dividing the difference by the standard deviation;
Step 4.5: attach the result to the original scientific and technological entity as its membership degree attribute for the four major industries;
Step 4.6: sort the membership degrees of each scientific and technological resource in the four industries, to guide background data scheduling and foreground operation management of the scientific and creative island platform.
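Steps 4.2 to 4.6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the tokens are assumed to be already segmented (step 4.1), the weight dictionary and the names `raw_scores` and `membership` are hypothetical, and the normalization in step 4.4 is interpreted as a z-score taken per industry across all entities, which the specification does not state explicitly.

```python
from statistics import mean, pstdev

INDUSTRIES = ("modern service industry", "ocean economy",
              "international medical care", "cultural tourism")


def raw_scores(tokens, stop_words, weights):
    """Steps 4.2-4.3: drop stop words, then accumulate per-industry weights."""
    totals = dict.fromkeys(INDUSTRIES, 0.0)
    for tok in tokens:
        if tok in stop_words:
            continue
        for industry, w in weights.get(tok, {}).items():
            totals[industry] += w
    return totals


def membership(entities, stop_words, weights):
    """Steps 4.4-4.6: z-score each industry column, attach the scores and sort them."""
    raw = {name: raw_scores(toks, stop_words, weights)
           for name, toks in entities.items()}
    result = {name: {} for name in entities}
    for industry in INDUSTRIES:
        column = [raw[name][industry] for name in entities]
        mu, sigma = mean(column), pstdev(column)
        for name in entities:
            # A constant column has sigma == 0; report 0.0 instead of dividing by zero.
            result[name][industry] = (raw[name][industry] - mu) / sigma if sigma else 0.0
    # Highest membership first for each entity.
    return {name: sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
            for name, scores in result.items()}
```

The sorted list per entity is what would be stored as the membership attribute and used to decide under which industry the resource is displayed first.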
Step five: import the processed scientific and technological service resources into the knowledge graph database.
Neo4j is a high-performance NoSQL graph database that stores structured data as a network (a graph) rather than in tables; it can also be viewed as a high-performance graph engine with all the features of a full-fledged database.
A graph database has two basic data types: nodes and relationships. Both carry attributes in key/value form, and nodes are connected through relationships to form a relational network structure.
Optionally, the Neo4j graph database is used to store the acquired scientific and technological resource entities and the relationships between them: the processed names, attributes and relationships of the scientific and technological resources are read and traversed in sequence, and a Neo4j graph database script is run to create the scientific and technological resource entities and construct the relationships between them.
The specific process of importing into the Neo4j database is as follows:
Step 5.1: generate a csv file from the processed scientific and technological resource data using tools such as Excel and MySQL;
Step 5.2: place the csv file in the import folder of Neo4j, and read the data in Neo4j Desktop using a LOAD CSV statement;
Step 5.3: create the scientific and technological resource entity nodes, the entity attributes and the relationships among entities using CREATE statements;
Step 5.4: depending on the data size, select the required node attributes and build indexes on them to improve the efficiency of large-scale data retrieval;
Step 5.5: test whether the nodes and relationships were imported successfully using a MATCH statement.
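As an illustration of steps 5.2 to 5.5, the following Python sketch builds the Cypher statements as strings. The file names, the CSV column names `source` and `target`, and the helper names are hypothetical; the index statement uses the `CREATE INDEX FOR ... ON ...` form of Neo4j 4.x and later, which is an assumption about the server version. Labels are wrapped in backticks so that labels containing spaces remain valid Cypher.

```python
from __future__ import annotations


def load_nodes(csv_name: str, label: str, properties: list[str]) -> str:
    """Steps 5.2-5.3: one node per CSV row, with the listed columns as properties."""
    props = ", ".join(f"{p}: row.{p}" for p in properties)
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{csv_name}' AS row\n"
        f"CREATE (:`{label}` {{{props}}});"
    )


def load_relationships(csv_name: str, rel_type: str) -> str:
    """Step 5.3: connect existing nodes; 'source' and 'target' columns are assumed."""
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{csv_name}' AS row\n"
        "MATCH (a {name: row.source}), (b {name: row.target})\n"
        f"CREATE (a)-[:`{rel_type}`]->(b);"
    )


def create_index(label: str, prop: str) -> str:
    """Step 5.4: index on one node property (Neo4j 4.x+ syntax, an assumption)."""
    return f"CREATE INDEX FOR (n:`{label}`) ON (n.{prop});"


def count_nodes(label: str) -> str:
    """Step 5.5: verify the import by counting the nodes that carry the label."""
    return f"MATCH (n:`{label}`) RETURN count(n);"
```

The relationship type is passed as a function argument because plain Cypher does not allow a relationship type to be read from a CSV column; one relationship file per type is therefore assumed.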
Step six: determine the scientific and technological resource categories and design refined filtering conditions for user retrieval.
When responding to a user's retrieval, in order to pin down the user's requirements and present scientific and technological service resources precisely, two levels of retrieval categories are defined according to the types of scientific and technological resources on the scientific and technological service platform: a first-level retrieval category and a second-level retrieval category;
the contents of the first-level and second-level retrieval categories of the different scientific and technological resources are determined from the names of the scientific and technological resource entities imported into the graph database, as listed in the following table:
First-level retrieval category | Second-level retrieval categories
Expert | Expert talent, colleges and universities, scientific research teams
Technology | Technical services, papers, patents, other intellectual property rights
Instrument | Instruments and equipment, instrument services
Finance | Investors, investment projects, investment funds, investment organizations
Policy | Policy information, policy consultation
Park | Parks, park services
Product | Service products, service providers
In order to mine and meet users' personalized requirements as fully as possible, a filtering condition for user retrieval is established on the node-relationship-node structure of the knowledge graph. It comprises the retrieval object content, the content of the relation object associated with the retrieval object, and the relationship between the retrieval object and the relation object, as shown in the following table:
Figure BDA0003817308840000091
Specifically, the retrieval object content and the relation object content each consist of the following sub-items: the first-level retrieval category, the second-level retrieval category, the retrieval (relation) object content and the retrieval (relation) object attribute, as shown in the following table:
First-level retrieval category | Second-level retrieval category | Retrieval (relation) object content | Retrieval (relation) object attribute
Step seven: respond to the user's personalized retrieval requirements and complete the query operation in the knowledge graph database.
For the personalized retrieval requirements of different users, the complex relationship network that is unique to the knowledge graph makes it possible to explore the relationships between different scientific and technological resources clearly; by treating the relationship itself as a retrieval object, retrieval results are obtained that a traditional relational database cannot achieve.
The main query steps are as follows:
Step 7.1: the user determines, according to requirements, the content of the scientific and technological service resource retrieval object, its first-level and second-level retrieval categories, and the attributes of the retrieval object content;
Step 7.2: the user determines, according to actual needs, the other scientific and technological service resource content associated with the retrieval content, together with its attributes and its first-level and second-level retrieval categories;
Step 7.3: the background responds to the user's retrieval conditions and generates the corresponding Cypher query statement;
Step 7.4: the Neo4j database is queried with the generated Cypher query statement, and the retrieval result is returned.
The following two examples illustrate the user retrieval process:
Example one:
The user needs a service provider in the Shanghai region that supplies face recognition access control systems and has provided services to Shanghai University. According to this requirement, the retrieval conditions are determined as follows:
Figure BDA0003817308840000092
After receiving the user's retrieval instruction, the background generates the corresponding Cypher statement from the template:
MATCH (n:service)-[:collaboration]-(p:`colleges and universities`)
WHERE n.name = 'access control system' AND n.location = 'Shanghai' AND p.name = 'Shanghai University'
RETURN n, count(n);
The system queries the Neo4j database with the generated Cypher statement and returns the retrieved result.
Example two:
The user holds a medical endoscope detection technology and seeks an investment of 15 million. The investor is required to be located in Beijing, to have cooperative projects with scientific research institutions of Shanghai University, and to have participated in the construction of a medical equipment industrial park. According to this requirement, the retrieval conditions are determined as follows:
Figure BDA0003817308840000101
After receiving the user's retrieval instruction, the background generates the corresponding Cypher statement from the template:
MATCH (p:`scientific research institution`)-[:collaboration]-(n:investor)-[:providing]-(q:`park service`)
WHERE n.name = 'medical endoscope' AND n.location = 'Beijing' AND p.name = 'Shanghai University' AND q.name = 'medical device'
RETURN n, count(n);
The system queries the Neo4j database with the generated Cypher statement and returns the retrieved result.
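The template-based generation of step 7.3 can be sketched in Python as below. The specification does not disclose the template itself, so this is only one plausible shape: `SearchObject` and `build_query` are hypothetical names, the second-level retrieval category is assumed to serve as the node label, and attribute values are passed as Cypher parameters (`$name`) rather than being spliced into the query text. Labels, relationship types and property keys cannot be parameterized in Cypher, so the sketch assumes they have been checked against the known category lists before this function is called.

```python
from dataclasses import dataclass, field


@dataclass
class SearchObject:
    label: str                                       # second-level category, used as node label
    attributes: dict = field(default_factory=dict)   # property name -> required value


def build_query(target: SearchObject, related: list) -> tuple:
    """Build (cypher_text, parameters) from the filter conditions.

    related is a list of (relationship_type, SearchObject) pairs, each
    describing a relation object attached to the retrieval object.
    """
    patterns = [f"(n:`{target.label}`)"]
    conditions, params = [], {}

    def add_conditions(var, obj):
        for key, value in obj.attributes.items():
            pname = f"{var}_{key}"
            conditions.append(f"{var}.{key} = ${pname}")
            params[pname] = value

    add_conditions("n", target)
    for i, (rel, obj) in enumerate(related):
        var = f"r{i}"
        # Undirected pattern, as in the examples above.
        patterns.append(f"(n)-[:`{rel}`]-({var}:`{obj.label}`)")
        add_conditions(var, obj)

    query = "MATCH " + ", ".join(patterns)
    if conditions:
        query += "\nWHERE " + " AND ".join(conditions)
    query += "\nRETURN n, count(n)"
    return query, params
```

For example two, the retrieval object would be `SearchObject("investor", {"name": "medical endoscope", "location": "Beijing"})` with two relation objects, one reached through `collaboration` and one through `providing`; the returned text and parameter dictionary can then be handed to whichever Neo4j client the background uses.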
Step eight: present the user's retrieval content in the form of a visual graph to improve the user's retrieval experience.
After the scientific and technological resource retrieval filtering conditions have been established in steps six and seven and the user's retrieval instruction has been completed, the retrieved content is displayed as a knowledge graph visualization, so that scientific and technological resource services can be recommended to the user more accurately and the transaction success rate of the platform's scientific and technological resources is improved.
In the visual graph form, a retrieval of any kind can present, from all angles, the external connections of the retrieved scientific and technological resources and the connections among different scientific and technological resources, providing insight that traditional text, tables and pictures cannot and improving the user's decision efficiency.
Preferably, the front-end knowledge graph visualization is constructed using Neovis.js.
The component integrates JavaScript visualization seamlessly with Neo4j and is an embeddable tool that connects directly to Neo4j.
The component is built on the property graph model of Neo4j, so the Neovis.js data format is consistent with that of Neo4j. Custom coloring styles based on labels, properties, nodes and relationships are defined in a single configuration object, which allows developers to set the visualization style by node, by relationship or by specific property.
The visualization component is set up as follows:
Step 8.1: configure the Neo4j server port number, user name and password, and connect to the Neo4j database to obtain real-time data;
Step 8.2: select the DOM element in which the visualization is rendered, and set the styles of the visual elements (nodes and relationships);
Step 8.3: specify the labels and properties to display, the node property that holds the URL of the node's image, the edge property that determines edge thickness, and the node property that determines node size;
Step 8.4: respond to the user's retrieval request, receive the retrieval result returned by the database, and construct the front-end visualization component;
Step 8.5: interact with user instructions dynamically, and respond to user operations at the front end in real time.
Referring to Fig. 2, the knowledge graph construction, retrieval and visualization system for scientific and technological services according to the present invention includes a data acquisition and preprocessing module, a knowledge extraction module, a knowledge correction module, a membership calculation module, a graph import module, a retrieval resource classification module, a resource retrieval module and a visualization module.
The data acquisition and preprocessing module is used for acquiring resource information of the scientific and technological service platform, dividing it into structured data, semi-structured data and unstructured data, and preprocessing the semi-structured and unstructured data by word segmentation and part-of-speech tagging;
the knowledge extraction module is used for performing data cleaning and traversal on the structured data and knowledge extraction on the semi-structured and unstructured data, specifically entity identification, relation extraction and attribute extraction;
the knowledge correction module is used for performing knowledge correction on the cleaned scientific and technological resources and determining the complete scientific and technological resource information that can be imported into the knowledge graph database;
the membership calculation module is used for calculating the resource membership of the scientific and technological service resources that are confirmed for import into the knowledge graph after knowledge correction;
the graph import module is used for importing the processed scientific and technological resources into the knowledge graph database;
the retrieval resource classification module is used for classifying the scientific and technological resources on the platform and determining the filtering conditions for user retrieval;
the resource retrieval module is used for responding to a user retrieval request, generating the corresponding query statement and completing the query operation in the knowledge graph database;
the visualization module is used for presenting the user's retrieval result in the front-end interface as a knowledge graph visualization.
For a specific implementation of this embodiment, reference may be made to the relevant description in the above embodiments, which is not described herein again.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution device. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, and the corresponding program may be stored in a computer readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
With the knowledge graph construction, retrieval and visualization method and system for scientific and technological services, the scientific and technological service resources on a scientific and technological service platform are stored as a knowledge graph. Compared with storage in a traditional relational database, the structured data are stored as a network (a graph) rather than in tables, which improves the storage efficiency of the scientific and technological resources. The invention applies the node-and-relationship structure of the knowledge graph to the retrieval of scientific and technological service resources; this retrieval mode can accurately capture the personalized requirements of the user, thereby improving the matching of supply and demand resources on the platform and the transaction success rate of the platform's scientific and technological service resources.
In this specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (6)

1. A knowledge graph construction, retrieval and visualization method for scientific and technological services is characterized by comprising the following steps:
(1) Collecting metadata of scientific and technological service platform resources, and preprocessing word segmentation and part-of-speech tagging;
(2) Performing data cleaning and knowledge extraction on the preprocessed metadata to obtain a scientific and technological resource entity list, an entity attribute list and an entity relation list;
(3) Performing knowledge correction on the scientific and technological service resource entities to determine the scientific and technological service resources that can be imported into the knowledge graph;
(4) Performing resource membership calculation on the scientific and technological service resources that have been screened and confirmed for import into the knowledge graph;
(5) Importing the processed scientific and technological service resources into a knowledge graph database;
(6) Determining scientific and technological resource categories, and designing refined filtering conditions for user retrieval;
(7) Responding to the personalized retrieval requirements of the user, and completing the query operation in the knowledge graph database;
(8) The user retrieval content is presented in a visual map form, and the user retrieval experience is improved.
2. The knowledge graph constructing, retrieving and visualizing method for scientific and technological services according to claim 1, wherein the scientific and technological service platform is a scientific and technological service resource sharing platform with a scientific and technological service transaction function, and supports users to register and log in with roles, and the roles comprise a scientific and technological service supplier and a scientific and technological service demander.
3. The method for knowledge graph construction, retrieval and visualization for scientific and technical services according to claim 1, wherein the refined filtering conditions for the user retrieval in step (6) are specifically:
dividing two layers of retrieval categories according to the types of scientific and technological resources in a scientific and technological service platform: a first-level retrieval category and a second-level retrieval category;
based on a node-relation-node architecture in the knowledge graph, establishing a filtering condition during user retrieval, wherein the filtering condition specifically comprises retrieval object content, relation object content associated with the retrieval object and a relation between the retrieval object and the relation object;
the retrieval object content and the relation object content each contain sub-items comprising: the first-level retrieval category, the second-level retrieval category, the retrieval (relation) object content and the retrieval (relation) object attribute.
4. The method of claim 1, wherein the scientific and technological service platform resources comprise intellectual property, science and technology policy, science and technology finance, technology transfer, research and development, venture incubation, inspection, testing and certification, instruments and equipment, scientific research teams, and integrated scientific and technological services.
5. The method for knowledge graph construction, retrieval and visualization for scientific and technological services according to claim 1, wherein the step (2) of performing data cleaning and knowledge extraction on metadata specifically comprises the following steps: traversing the structured data, extracting knowledge from the unstructured data and the semi-structured data, determining the named entities, the entity attributes and the relationships among the entities, and further obtaining a scientific and technological resource entity list, an entity attribute list and an entity relationship list;
the named entities after the knowledge extraction operation are related named entities of scientific and technological service resources, and each extracted named entity can be selectively used as a related entity of scientific and technological service and is finally added into a knowledge graph;
the named entity attributes after the knowledge extraction operation are various attributes of different scientific and technological service resources, and each extracted entity attribute can be selectively used as a related entity attribute of scientific and technological service and is finally added into a knowledge graph;
the named relationships among the entities after the knowledge extraction operation are various relationships among different scientific and technological service resources, and each extracted entity relationship can be selectively used as a related entity relationship of scientific and technological services and is finally added into the knowledge graph.
6. A knowledge graph construction, retrieval and visualization system for scientific and technological services, applied to the method of any one of claims 1-5, wherein the system comprises a data acquisition and preprocessing module, a knowledge extraction module, a knowledge correction module, a membership calculation module, a graph import module, a retrieval resource classification module, a resource retrieval module and a visualization module,
the data acquisition and preprocessing module is used for acquiring resource information of a scientific and technological service platform, dividing the resource information into structured data, semi-structured data and unstructured data, and meanwhile, preprocessing the semi-structured data and the unstructured data by word segmentation and part-of-speech tagging;
the knowledge extraction module is connected with the data acquisition and preprocessing module and is used for carrying out data cleaning and traversing operation on the structured data and carrying out knowledge extraction operation on the semi-structured data and the unstructured data, wherein the knowledge extraction operation specifically comprises entity identification, relation extraction and attribute extraction;
the knowledge correction module is connected with the knowledge extraction module and is used for performing knowledge correction on the cleaned scientific and technological resources and determining the complete scientific and technological resource information that can be imported into the knowledge graph database;
the membership calculation module is connected with the knowledge correction module and is used for calculating the resource membership of the scientific and technological service resources that are confirmed for import into the knowledge graph after knowledge correction;
the graph import module is connected with the membership calculation module and is used for importing the processed scientific and technological resources into the knowledge graph database;
the retrieval resource classification module is connected with the map importing module and is used for classifying scientific and technological resources on the platform and determining filtering conditions during user retrieval;
the resource retrieval module is connected with the retrieval resource classification module and used for responding to a user retrieval request, generating a corresponding query statement and completing query operation in a knowledge graph database;
and the visualization module is connected with the resource retrieval module and is used for presenting the retrieval result of the user in a form of knowledge graph visualization in the front-end interface.
CN202211030854.5A 2022-08-26 2022-08-26 Knowledge graph construction, retrieval and visualization method and system for scientific and technological service Pending CN115309885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211030854.5A CN115309885A (en) 2022-08-26 2022-08-26 Knowledge graph construction, retrieval and visualization method and system for scientific and technological service

Publications (1)

Publication Number Publication Date
CN115309885A true CN115309885A (en) 2022-11-08

Family

ID=83864324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211030854.5A Pending CN115309885A (en) 2022-08-26 2022-08-26 Knowledge graph construction, retrieval and visualization method and system for scientific and technological service

Country Status (1)

Country Link
CN (1) CN115309885A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination