CN109783484A - Method and system for constructing a knowledge-graph-based data service platform - Google Patents
- Publication number
- CN109783484A (CN201811640313.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and system for constructing a knowledge-graph-based data service platform. The method comprises the following steps: cleaning multi-source heterogeneous data; querying the cleaned data and generating resource IDs for the queried data through Redis; and building an OWL ontology, managing plug-ins, and storing the data in a column-oriented database. The beneficial effects of the invention are that data are stored flexibly in an object-oriented manner, the knowledge contained in unstructured and semi-structured data is fully mined, and high-quality structured data can be provided for later use in various application fields.
Description
Technical field
The present invention relates to the field of the industrial Internet of Things, and in particular to a method and system for constructing a knowledge-graph-based data service platform.
Background art
A knowledge graph aims to describe the entities or concepts that exist in the real world and the associations between them. First, each entity is identified by a globally unique ID, just as every person has an identity card number. Second, attribute-value pairs describe the intrinsic characteristics of an entity, and relations connect two entities to describe the association between them.
The rapid development of information technology, especially the Internet, has ushered in the era of big data. Every industry generates an enormous amount of fragmented data each day, and units of data measurement have grown from Byte, KB, MB, GB and TB to PB, EB, ZB and YB, and even to BB, NB and DB. Acquiring big data is no longer a technical problem, but the knowledge it contains lies largely in unstructured text data, in large numbers of semi-structured tables and web pages, and in the structured data of production systems. Traditional data storage uses relational databases, which are complex to design, carry high redundancy and offer low query efficiency, and which cannot directly yield the latent semantic information in the data that must be obtained through reasoning and mining.
No effective solution to these problems in the related art has yet been proposed.
Summary of the invention
In view of the above technical problems in the related art, the present invention proposes a method and system for constructing a knowledge-graph-based data service platform, which stores data flexibly in an object-oriented manner, fully mines the knowledge contained in the data, and helps provide high-quality structured data for later use in various application fields.
To achieve the above technical purpose, the technical solution of the present invention is realized as follows:
A method for constructing a knowledge-graph-based data service platform, comprising the following steps:
cleaning multi-source heterogeneous data;
querying the cleaned data, and generating resource IDs for the queried data through Redis;
building an OWL ontology, managing plug-ins, and storing the data in a column-oriented database.
Further, cleaning the multi-source heterogeneous data comprises:
loading ETL plug-ins for different data sources to obtain ETL rules, building entities, and then obtaining the relationships between the entities;
calling the resource service subsystem to obtain resource IDs;
generating structured data objects from the data that have been assigned resource IDs.
Further, before the multi-source heterogeneous data are cleaned, the method further comprises acquiring the multi-source heterogeneous data with a data acquisition client.
Further, the data acquisition client includes a data acquisition program component, an association ID generation component, an association ID sending component and a passive service response component.
Further, querying the cleaned data comprises:
obtaining the Global ID using a full-text search engine;
in a graph database, retrieving the interrelated entities according to the Global ID and returning all association IDs;
in a distributed data storage system, indexing the structured data according to the association IDs and returning the corresponding attribute results.
Another aspect of the present invention provides a system for constructing a knowledge-graph-based data service platform, comprising:
a data cleaning module, for cleaning multi-source heterogeneous data;
a resource service subsystem module, for querying the cleaned data and generating resource IDs for the queried data through Redis;
an ontology management module, for building an OWL ontology, managing plug-ins, and storing the data in a column-oriented database.
Further, the data cleaning module includes:
an entity building module, for loading ETL plug-ins for different data sources to obtain ETL rules, building entities, and then obtaining the relationships between the entities;
a resource conversion module, for calling the resource service subsystem to obtain resource IDs;
a structured data object module, for generating structured data objects from the data that have been assigned resource IDs.
Further, the system also includes a data acquisition module, which acquires multi-source heterogeneous data with a data acquisition client.
Further, the data acquisition client in the data acquisition module includes a data acquisition program component, an association ID generation component, an association ID sending component and a passive service response component.
Further, the data query module includes:
a Global ID module, for obtaining the Global ID using a full-text search engine;
an association ID module, for retrieving the interrelated entities in a graph database according to the Global ID and returning all association IDs;
a structured data module, for indexing the structured data in a distributed data storage system according to the association IDs and returning the corresponding attribute results.
Beneficial effects of the present invention: data are stored flexibly in an object-oriented manner, the knowledge contained in the data is fully mined, and high-quality structured data can be provided for later use in various application fields.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for constructing a knowledge-graph-based data service platform according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the system for constructing a knowledge-graph-based data service platform according to an embodiment of the present invention;
Fig. 3 is an overall architecture diagram of the system for constructing a knowledge-graph-based data service platform according to an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention fall within the scope of protection of the present invention.
As shown in Fig. 1, a method for constructing a knowledge-graph-based data service platform according to an embodiment of the present invention comprises the following steps:
Cleaning multi-source heterogeneous data;
Querying the cleaned data, and generating resource IDs for the queried data through Redis;
Specifically, the resource service subsystem generates resource IDs automatically through Redis and provides a resource service interface. After data acquisition and data cleaning, industrial object data apply to the Global ID generation module for an ID; once an object has obtained its Global ID, it is stored synchronously in each storage medium, which makes associated queries possible. The Global ID generation module is implemented on the counter function of the Redis database and generates auto-incrementing long-integer IDs. Because Redis natively supports thread-safe operation, the IDs that entity objects apply for remain unique under multithreading.
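The counter-based ID generation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `InMemoryCounter` is a hypothetical stand-in that imitates the atomic increment of a Redis counter so the example runs without a server, and the key name `global:id` is an assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryCounter:
    """Hypothetical stand-in for a Redis counter; imitates an atomic increment."""

    def __init__(self) -> None:
        self._values: dict[str, int] = {}
        self._lock = threading.Lock()

    def incr(self, key: str) -> int:
        # The lock makes read-modify-write atomic, as a Redis counter would be.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + 1
            return self._values[key]


class GlobalIdGenerator:
    """Hands out auto-incrementing integer IDs from a shared counter."""

    def __init__(self, counter, key: str = "global:id") -> None:
        self._counter = counter
        self._key = key  # assumed key name, not taken from the source

    def next_id(self) -> int:
        return self._counter.incr(self._key)


# Request 1000 IDs from 8 worker threads at once.
generator = GlobalIdGenerator(InMemoryCounter())
with ThreadPoolExecutor(max_workers=8) as pool:
    ids = list(pool.map(lambda _: generator.next_id(), range(1000)))
```

Any object exposing an `incr(key)` method that returns the new value could be passed in place of the stand-in; a redis-py client offers a method of that shape.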
Building an OWL ontology, managing plug-ins, and storing the data in a column-oriented database.
Specifically, ontology management builds an OWL ontology according to business requirements and implements creation, deletion, modification and query of the ontology as well as ontology reasoning. The specific steps are: converting the designed ontology into an OWL file with a tool and importing it into the system; implementing functions such as modifying, querying and deleting the ontology; and implementing rule-based reasoning over the ontology.
Plug-in management provides version management, such as online upgrade and hot fix of plug-ins, and management of the mappings between plug-ins and the ontology.
Data storage includes entity data storage and relation data storage.
Entity data storage specifically includes:
The storage system stores industrial entities in HBase. HBase is a distributed, column-oriented database. An HBase table can have several column families, and each column family can hold multiple key-value pairs. Each row of data is identified by a row key (Rowkey), and the number of key-value pairs in a row can vary flexibly. For load balancing across HBase regions, this design uses the reversed string of the Global ID as the row key of the HBase table. Only the non-empty fields of the data are stored in each row, to reduce space usage. HBase only provides the function of looking up the detailed data of an industrial entity by its Global ID, so no further auxiliary design is needed.
Relation data storage specifically includes:
Neo4j is a graph database that is well suited to storing the relationships between different data. A Neo4j graph contains two kinds of data: nodes and relationships. A node can have multiple attributes in key-value form, and a relationship can be directed or undirected. Neo4j assigns each node its own internal ID. To reduce the space that the data occupy in Neo4j, this design uses Neo4j only to store the relationships between entities and does not store the specific attributes of the entities. Specifically, the nodes in Neo4j are divided into two classes: entity objects and dimension data. In addition to the ID value that Neo4j assigns automatically, an entity object has an ID attribute that stores the Global ID of the object. Dimension data are field values through which different entities are associated, such as industry category, product category and geographical location.
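The two node classes can be illustrated with a small in-memory model. This is a sketch under assumptions rather than Neo4j code: the class names, the `kind`/`value` fields and the `entities_sharing` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityNode:
    global_id: int  # the only property kept on an entity node


@dataclass(frozen=True)
class DimensionNode:
    kind: str   # e.g. "industry", "product_category", "location"
    value: str


@dataclass
class RelationGraph:
    # Relationships stored as adjacency sets keyed by node.
    edges: dict = field(default_factory=dict)

    def relate(self, a, b) -> None:
        self.edges.setdefault(a, set()).add(b)
        self.edges.setdefault(b, set()).add(a)

    def entities_sharing(self, dimension: DimensionNode) -> set:
        """Global IDs of all entity nodes linked to the given dimension value."""
        return {
            node.global_id
            for node in self.edges.get(dimension, set())
            if isinstance(node, EntityNode)
        }


graph = RelationGraph()
steel = DimensionNode("industry", "steel")
graph.relate(EntityNode(101), steel)
graph.relate(EntityNode(102), steel)
graph.relate(EntityNode(103), DimensionNode("industry", "textiles"))
shared = graph.entities_sharing(steel)
```

Entity attributes stay in the column store; the graph holds only Global IDs and shared dimension values, so a lookup here yields IDs that can then be resolved elsewhere.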
Further, cleaning the multi-source heterogeneous data comprises:
Loading ETL plug-ins for different data sources to obtain ETL rules based on a conditional random field model, and then building entities and obtaining the relationships between the entities;
A conditional random field (CRF, or CRFs) is a discriminative probabilistic model and a type of random field. It is commonly used to label or analyse sequence data such as natural language text or biological sequences. Like a Markov random field, a conditional random field is an undirected graphical model in which vertices represent random variables and the edges between vertices represent dependencies between those variables. In a conditional random field, the distribution of the random variable Y is a conditional probability, and the given observations are the random variable X. In principle the graph structure of a conditional random field can be arbitrary; the most common structure is the linear chain, for which efficient algorithms exist for training, inference and decoding.
Conditional random fields are used for lexical analysis tasks such as Chinese word segmentation and part-of-speech tagging. General sequence classification models, such as class-based Chinese word segmentation, usually use the hidden Markov model (HMM), but the HMM rests on two assumptions: the output independence assumption and the Markov assumption. The output independence assumption requires the sequence data to be strictly mutually independent for the derivation to be correct, whereas in practice most sequence data cannot be represented as a series of independent events. A conditional random field instead uses a probabilistic graphical model that can express long-distance dependencies and overlapping features and better handles problems such as label (classification) bias; all features can be globally normalized, so a globally optimal solution can be obtained;
A conditional random field over the observation variable $X$ and the random variable $Y$ is defined as follows:
Let $G = (V, E)$ be a graph and let $Y = (Y_v)_{v \in V}$, so that the random variable $Y$ is indexed by the vertices of $G$. When each random variable $Y_v$, conditioned on the observation variable $X$, obeys the Markov property with respect to the graph, $(X, Y)$ is called a conditional random field:
$$p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v)$$
where $w \sim v$ means that $w$ and $v$ are adjacent vertices in $G$. Several well-known open-source implementations of the CRF algorithm exist and are widely used in academic research and industrial applications.
CRF++ is a simple, customizable, open-source conditional random field (CRFs) tool for segmenting and labelling sequential data. CRF++ is designed for general-purpose use and can be applied to many natural language processing (NLP) tasks, such as named entity recognition, information extraction and text chunking.
Taking industry data cleaning as an example, the calculation steps are as follows:
1. Obtain irregular industry data from the data source. The data include level-1 industry classification data Org1 and level-2 industry classification data Org2;
2. Process the data as follows, according to the raw data that were read, where Standard1 and Standard2 denote the standard level-1 and level-2 industry classifications:
When neither Org1 nor Org2 is empty:
1) If Org1 = Standard1 and Org2 = Standard2, return (Org1, Org2);
2) If Org1 = Standard2 and Org2 = Standard1, return (Org2, Org1);
3) If Org2 = Standard2 and Org1 ≠ Standard1, return (Standard1, Org2);
4) If Org1 = Standard2 and Org2 ≠ Standard1, return (Standard1, Org1);
5) If Org1 = Standard1 and Org2 ≠ Standard2, segment Org2 with the CRF algorithm model, remove stop words, and denote the result as set A. Segment each level-2 industry entry in the standard industry classification with the CRF algorithm model and remove stop words; each level-2 industry yields one set, and the sets for all industries are denoted LIST(B). Using the definition of the Jaccard distance:
$$d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$
compute the Jaccard distance between A and each set in LIST(B), select the standard level-2 industry classification Min with the smallest Jaccard distance, and return (Org1, Min);
6) If Org2 = Standard1 and Org1 ≠ Standard2, segment Org1 with the CRF algorithm model, remove stop words, and denote the result as set A. Segment each level-2 industry entry in the standard industry classification with the CRF algorithm model and remove stop words; each level-2 industry yields one set, and the sets for all industries are denoted LIST(B). Using the definition of the Jaccard distance:
$$d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$
compute the Jaccard distance between A and each set in LIST(B), select the standard level-2 industry classification Min with the smallest Jaccard distance, and return (Org2, Min);
7) If Org1 ≠ Standard1 and Org2 ≠ Standard2, concatenate Org1 and Org2, segment the result with the CRF algorithm model, remove stop words, and denote the result as set A. For each entry in the standard industry classification, concatenate the level-1 and level-2 industry strings, segment the result with the CRF algorithm model and remove stop words; each industry yields one set, and the sets for all industries are denoted LIST(B). Using the definition of the Jaccard distance:
$$d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$
compute the Jaccard distance between A and each set in LIST(B), select the standard level-1 industry classification Min1 and standard level-2 industry classification Min2 with the smallest Jaccard distance, and return (Min1, Min2);
When Org1 is not empty and Org2 is empty:
1) If Org1 = Standard2, return (Standard1, Org1);
2) If Org1 = Standard1, return (Org1, Standard2);
3) If Org1 ≠ Standard1 and Org1 ≠ Standard2, segment Org1 with the CRF algorithm model, remove stop words, and denote the result as set A. For each entry in the standard industry classification, concatenate the level-1 and level-2 industry strings, segment the result with the CRF algorithm model and remove stop words; each industry yields one set, and the sets for all industries are denoted LIST(B). Using the definition of the Jaccard distance:
$$d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$
compute the Jaccard distance between A and each set in LIST(B), select the standard level-1 industry classification Min1 and standard level-2 industry classification Min2 with the smallest Jaccard distance, and return (Min1, Min2);
When Org2 is not empty and Org1 is empty:
1) If Org2 = Standard2, return (Standard1, Org2);
2) If Org2 = Standard1, return (Org2, Standard2);
3) If Org2 ≠ Standard1 and Org2 ≠ Standard2, segment Org2 with the CRF algorithm model, remove stop words, and denote the result as set A. For each entry in the standard industry classification, concatenate the level-1 and level-2 industry strings, segment the result with the CRF algorithm model and remove stop words; each industry yields one set, and the sets for all industries are denoted LIST(B). Using the definition of the Jaccard distance:
$$d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$
compute the Jaccard distance between A and each set in LIST(B), select the standard level-1 industry classification Min1 and standard level-2 industry classification Min2 with the smallest Jaccard distance, and return (Min1, Min2);
The steps above describe the processing of a single record. Large volumes of data can be processed in parallel, which significantly improves data processing efficiency.
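The nearest-category matching used in the fallback cases can be sketched as follows. This is an illustration under assumptions: the `tokenize` function is a plain whitespace split standing in for the CRF-based segmenter, and the stop-word list and category names are invented for the example.

```python
def tokenize(text: str,
             stop_words: frozenset = frozenset({"and", "of", "the"})) -> frozenset:
    """Placeholder for CRF-based segmentation: lowercase whitespace split."""
    return frozenset(t for t in text.lower().split() if t not in stop_words)


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def closest_standard(raw: str, standards: list[str]) -> str:
    """Return the standard category whose token set is nearest to `raw`."""
    raw_tokens = tokenize(raw)
    return min(standards,
               key=lambda s: jaccard_distance(raw_tokens, tokenize(s)))


standards = [
    "manufacture of textiles",
    "manufacture of basic metals",
    "software publishing",
]
match = closest_standard("basic metals and steel manufacture", standards)
```

Because each record is matched independently, a list of raw values could be mapped over `closest_standard` in parallel, in line with the remark above about batch processing.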
Call the resource service subsystem to obtain resource IDs for the processed data, thereby converting the data into resources;
Data structuring: turn the data that have been assigned resource IDs into structured data objects, i.e. row data stored in a database, which can be logically expressed in a two-dimensional table structure.
In a specific embodiment of the present invention, before the multi-source heterogeneous data are cleaned, the method further comprises acquiring the multi-source heterogeneous data with a data acquisition client, where data acquisition specifically includes:
Raw data are imported directly, or different data acquisition clients are provided, to acquire multi-source heterogeneous data. A data acquisition client includes: a data acquisition program component, for obtaining unstructured data and their corresponding description information; an association ID generation component, for assigning a unique association ID to the description information of the unstructured data; an association ID sending component, for sending the association ID to the service management server of the client so that the unstructured data are associated with the corresponding structured data; and a passive service response component, for passive data acquisition services, which sends the unstructured data and their corresponding description information to the data acquisition platform.
A unified acquisition interface obtains structured and time-series data from internal business systems such as EMS, CPS, CRM and SRM. Besides Internet data acquisition over the common HTTP protocol, it supports industrial protocols of all kinds, such as ModBus, OPC, CAN, ControlNet, DeviceNet, Profibus and Zigbee, and even the various proprietary industrial protocols developed by individual automation equipment manufacturers and integrators, so that data carried over different protocols can be parsed and acquired effectively.
In a specific embodiment of the present invention, the data acquisition client includes a data acquisition program component, an association ID generation component, an association ID sending component and a passive service response component.
In a specific embodiment of the present invention, querying the cleaned data comprises:
Using a full-text search engine to look up a keyword and return the unique Global ID, which involves index creation and retrieval, where 1) index creation specifically includes:
The indexed files are the unstructured data, including industrial data, stored in the full-text database. The raw data are passed to a tokenizer, which splits them into individual words and removes punctuation and stop words. The resulting tokens are passed to a language processing component, which produces a series of terms. The terms are passed to an indexing component, which uses them to build a dictionary, sorts the dictionary alphabetically, and merges identical terms into document posting lists (inverted lists). The index is then written to disk through the index store. At this point the index has been created and can be used to find the required data.
2) Retrieval specifically includes:
The user enters a query statement;
A query statement resembles ordinary language, and its syntax differs according to the implementation of the full-text retrieval system.
Lexical analysis and language processing are applied to the query statement to obtain a series of terms;
Syntax analysis produces a query tree;
The index is read into memory through the index store;
The index is searched with the query tree to obtain the document list of each term, and the document lists are intersected, differenced and merged to obtain the result documents;
In the inverted index table, the document list containing each keyword is found; the lists containing each keyword are merged to obtain the list of documents that contain both keyword 1 and keyword 2; a difference operation is then applied to the lists to obtain the list of data that contain both keyword 1 and keyword 2; and finally the query result is returned.
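The index creation and retrieval steps above can be sketched with a toy inverted index. This is a simplified illustration, not the behaviour of any particular search engine: the stop-word list is invented, language processing is reduced to lowercasing, and only the intersection (AND) of posting lists is shown.

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "of", "is"}  # assumed stop-word list


def tokenize(text: str) -> list[str]:
    """Split into lowercase words, dropping punctuation and stop words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return [w for w in cleaned.split() if w not in STOP_WORDS]


def build_index(documents: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to a sorted posting list of document (Global) IDs."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sort the dictionary alphabetically, as in the indexing step above.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """Intersect the posting lists of every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Pump pressure sensor",
    2: "Pressure valve, stainless",
    3: "The sensor of the pump",
}
index = build_index(docs)
hits = search_all(index, "pump sensor")
```

Here the document keys play the role of Global IDs, so the search result is the list of IDs handed on to the graph lookup.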
In the graph database, retrieving the interrelated entities according to the Global ID and returning all association IDs, which specifically includes:
1) Graph data structure modeling
By analysing the data, including industrial data, the entity nodes of each item of information and the relationships between entities are extracted, and the graph structure model of the data is generated from the entity nodes and their associations.
2) Data indexing
An index is used to determine where in the graph database to start; a graph database index looks up nodes or relationships by specific attribute values;
3) User query statement input
The syntax of the query statement differs according to the database used;
4) Data traversal
Traversal is based on depth-first and breadth-first algorithms, and the better algorithm is chosen according to how it performs on the graph data model.
5) Return the query result.
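The traversal step can be sketched as a breadth-first search over an adjacency map keyed by Global ID. The depth limit `max_depth` and the sample graph are assumptions made for the example; a real graph database would run the traversal through its own query language.

```python
from collections import deque


def associated_ids(adjacency: dict[int, set[int]], start: int,
                   max_depth: int = 2) -> list[int]:
    """Breadth-first traversal returning IDs reachable within max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand nodes at the depth limit
        for neighbour in sorted(adjacency.get(node, set())):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, depth + 1))
    return found


adjacency = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2, 5}, 5: {4}}
related = associated_ids(adjacency, start=1)
```

Breadth-first order returns the closest associations first; a depth-first variant would replace the queue with a stack.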
In the distributed data storage system, indexing the structured data according to the association IDs and returning the corresponding attribute results, which specifically includes:
1) Attribute retrieval
The user enters a query statement, whose syntax differs according to the database used; here the database is a non-relational database;
2) Querying the corresponding structured data in the database according to the Global ID
Using the information about -ROOT- and .META. held in its internal cache, the client connects directly to the HRegionServer that holds the requested data and locates on that server the region matching the request. The request first queries the region's in-memory cache, the MemStore;
If the result is found in the MemStore, it is returned to the client directly. If no matching data are found in the MemStore, the data in the persisted StoreFile files are read next. A StoreFile is a tree-structured file sorted by key, and HBase reads data from disk files by its basic I/O unit (the block);
If the requested data are found in the BlockCache, the result is returned. Otherwise the corresponding StoreFile is read one block at a time: if a block does not contain the requested data, it is placed in the BlockCache of the HRegionServer and the next block is read, and this loop continues until the requested data are found and returned. If none of the data in the region match, null is returned, indicating that no matching data were found;
3) Parsing the structured data and returning the data in the expected format.
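The lookup order described in step 2) can be modelled with a toy reader. This is a deliberately simplified sketch, not HBase code: the `RegionReader` class, its dictionaries and the `blocks_read` counter are hypothetical and only mirror the MemStore, BlockCache and StoreFile sequence.

```python
from typing import Optional


class RegionReader:
    """Toy model of the lookup order: MemStore, BlockCache, then StoreFile blocks."""

    def __init__(self, memstore: dict, storefile_blocks: list[dict]) -> None:
        self.memstore = memstore
        self.storefile_blocks = storefile_blocks  # each block: {row_key: value}
        self.block_cache: list[dict] = []
        self.blocks_read = 0  # counts simulated disk reads

    def get(self, row_key: str) -> Optional[str]:
        if row_key in self.memstore:            # 1. in-memory write buffer
            return self.memstore[row_key]
        for block in self.block_cache:          # 2. blocks already cached
            if row_key in block:
                return block[row_key]
        for block in self.storefile_blocks:     # 3. persisted blocks, one by one
            if any(block is cached for cached in self.block_cache):
                continue                        # already checked via the cache
            self.blocks_read += 1
            self.block_cache.append(block)      # cache every block that was read
            if row_key in block:
                return block[row_key]
        return None                             # no matching data in this region


reader = RegionReader(
    memstore={"r9": "fresh"},
    storefile_blocks=[{"r1": "a", "r2": "b"}, {"r3": "c", "r4": "d"}],
)
first = reader.get("r3")
reads_after_first = reader.blocks_read
second = reader.get("r3")
```

The second lookup of the same key is served from the cache, so the simulated disk-read counter does not increase.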
As shown in Fig. 2, another aspect of the present invention provides a system for constructing a knowledge-graph-based data service platform, comprising:
A data cleaning module, for cleaning multi-source heterogeneous data;
A resource service subsystem module, for querying the cleaned data and generating resource IDs for the queried data through Redis;
Specifically, the resource service subsystem generates resource IDs automatically through Redis and provides a resource service interface. After data acquisition and data cleaning, industrial object data apply to the Global ID generation module for an ID; once an object has obtained its Global ID, it is stored synchronously in each storage medium, which makes associated queries possible. The Global ID generation module is implemented on the counter function of the Redis database and generates auto-incrementing long-integer IDs. Because Redis natively supports thread-safe operation, the IDs that entity objects apply for remain unique under multithreading.
An ontology management module, for building an OWL ontology, managing plug-ins, and storing the data in a column-oriented database.
In a specific embodiment of the present invention, the data cleaning module includes:
An entity building module, for loading ETL plug-ins for different data sources to obtain ETL rules, building entities, and then obtaining the relationships between the entities;
A resource conversion module, for calling the resource service subsystem to obtain resource IDs;
A structured data object module, for generating structured data objects from the data that have been assigned resource IDs.
In a specific embodiment of the present invention, the system further includes a data acquisition module, which acquires multi-source heterogeneous data with a data acquisition client, specifically as follows:
Raw data are imported directly, or different data acquisition clients are provided, to acquire multi-source heterogeneous data. A data acquisition client includes: a data acquisition program component, for obtaining unstructured data and their corresponding description information; an association ID generation component, for assigning a unique association ID to the description information of the unstructured data; an association ID sending component, for sending the association ID to the service management server of the client so that the unstructured data are associated with the corresponding structured data; and a passive service response component, for passive data acquisition services, which sends the unstructured data and their corresponding description information to the data acquisition platform.
A unified acquisition interface obtains structured and time-series data from internal business systems such as EMS, CPS, CRM and SRM. Besides Internet data acquisition over the common HTTP protocol, it supports industrial protocols of all kinds, such as ModBus, OPC, CAN, ControlNet, DeviceNet, Profibus and Zigbee, and even the various proprietary industrial protocols developed by individual automation equipment manufacturers and integrators, so that data carried over different protocols can be parsed and acquired effectively.
In a specific embodiment of the present invention, the client in the data acquisition module includes a data acquisition program component, an association ID generation component, an association ID sending component and a passive service response component.
In one particular embodiment of the present invention, the data inquiry module includes
Global ID's module, for accessing Global ID using full-text search engine;Wherein, 1) creation index specifically includes:
Being indexed file is the unstructured data including industrial data stored in Full-text database, will be former
Data are transmitted to segmenter, and data are divided into individual word one by one, remove punctuation mark, and removal stops word;The word that will be obtained
Member is transmitted to Language Processing component, by Language Processing, obtains a series of words;Obtained word is transmitted to indexing component, using obtaining
Word create a dictionary, dictionary alphabet sequence is ranked up, merges identical word as the document table of falling row chain;Pass through rope
Draw storage and hard disk is written into index;So far, index has created, we can find the data that we want by it.
2) Retrieval specifically includes:
The user inputs a query statement.
Like ordinary language, a query statement has a grammar; the grammar differs depending on the implementation of the full-text retrieval system.
Lexical analysis and language processing of the query statement yield a series of terms.
Syntactic analysis yields a query tree.
The index is read into memory through the index storage component.
The index is searched using the query tree to obtain the document posting list of each term; intersection, difference and union operations are performed on the posting lists to obtain the result documents.
In the inverted index table, the posting list of documents containing each keyword is found. The lists for the individual keywords are combined to obtain the list of documents that contain both keyword 1 and keyword 2. A difference operation is then performed on the multiple lists to obtain the data list containing both keyword 1 and keyword 2, and finally the query result is returned.
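The posting-list operations used during retrieval can be sketched as follows. This is an illustrative sketch only: the index contents and keyword names are made up, and a real full-text engine would drive these operations from the parsed query tree rather than from hand-written calls.

```python
def intersect(a, b):
    """Documents present in both sorted posting lists (two-pointer merge)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """Documents present in either posting list."""
    return sorted(set(a) | set(b))


def difference(a, b):
    """Documents in posting list a that are not in posting list b."""
    excluded = set(b)
    return [doc_id for doc_id in a if doc_id not in excluded]


# Hypothetical inverted index: term -> sorted document IDs.
index = {"pressure": [1, 2, 4], "pump": [1, 2, 3], "valve": [2, 5]}

both = intersect(index["pump"], index["pressure"])   # pump AND pressure
result = difference(both, index["valve"])            # ... AND NOT valve
```

Keeping posting lists sorted is what allows the intersection to run in a single linear pass over both lists.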
An association ID module, for retrieving, in the graph database, the entities associated with the Global ID and returning all association IDs, specifically as follows:
1) Graph data structure modeling
By analyzing the data, including industrial data, the entity nodes of each piece of information and the relationships between entities are extracted, and the graph structure model of the data is generated from the entity nodes and their association relationships.
2) Data indexing
An index is used to determine where to start in the graph database; a graph database index looks up nodes or relationships by specific attribute values.
3) User query statement input
The grammar of the query statement differs depending on the database used.
4) Data traversal
Traversal is based on depth-first and breadth-first algorithms; the better-suited algorithm is selected according to how it performs on the graph data model.
5) The query result is returned.
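The traversal step can be illustrated with a breadth-first search that starts from a Global ID and collects the IDs of associated entities. This is a sketch under stated assumptions: the graph is a plain adjacency dictionary rather than a graph database, and the function name `associated_ids`, the `max_depth` parameter and the sample IDs are hypothetical.

```python
from collections import deque


def associated_ids(graph, start_id, max_depth=2):
    """Breadth-first traversal returning IDs reachable within max_depth hops.

    graph: dict mapping an entity ID to a list of directly related entity IDs.
    """
    seen = {start_id}
    queue = deque([(start_id, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found


# Hypothetical entity graph keyed by ID.
graph = {"G1": ["A1", "A2"], "A1": ["A3"], "A2": ["A3"], "A3": ["A4"]}
ids = associated_ids(graph, "G1")
```

Breadth-first order returns the closest associations first; a depth-first variant would replace the queue with a stack and may suit deep, narrow graphs better.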
A structured data module, for indexing the structured data according to the association ID in the distributed data storage system and returning the corresponding attribute results, specifically as follows:
1) Attribute retrieval
The user inputs a query statement; the grammar of the query statement differs depending on the database used, the database here being a non-relational database.
2) Querying the corresponding structured data in the database according to the Global ID
Using the -ROOT- and .META. information held in its internal cache, the client connects directly to the HRegionServer that holds the data matching the request and locates, on that server, the region corresponding to the request. The request first queries the region's in-memory cache, the memstore.
If the result is found in the memstore, it is returned directly to the client. If no matching data is found in the memstore, the data in the persisted storefile files is read next. A storefile is a tree-structured file sorted by key; when HBase reads a disk file, it reads data in units of its basic I/O unit, the block.
If the requested data is found in the BlockCache, the result is returned; otherwise the corresponding storefile file is read block by block. If a block that has been read does not contain the requested data, that block is placed in the BlockCache of the HRegionServer and the next block is read; blocks are read in this loop until the requested data is found and the result is returned. If the requested data is not found anywhere in the region, null is returned, indicating that no matching data was found.
3) The structured data is parsed and the data information is returned in the expected format.
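The lookup order described in step 2) (memstore first, then the block cache, then the storefile blocks on disk) can be modeled with a small toy class. This is a simplified sketch, not the HBase API: the class `Region`, its fields and the returned source labels are hypothetical, blocks are plain dictionaries, and every block that is read is cached whether or not it contains the key.

```python
class Region:
    """Toy model of the read path of one region (not real HBase code)."""

    def __init__(self, memstore, storefile_blocks):
        self.memstore = memstore                  # dict: row key -> value
        self.storefile_blocks = storefile_blocks  # list of dicts, one per block
        self.block_cache = {}                     # block index -> cached block

    def get(self, key):
        """Return (value, source); value is None when nothing matches."""
        # 1. In-memory write buffer.
        if key in self.memstore:
            return self.memstore[key], "memstore"
        # 2. Blocks already held in the block cache.
        for block in self.block_cache.values():
            if key in block:
                return block[key], "blockcache"
        # 3. Read the persisted file block by block, caching each block read.
        for i, block in enumerate(self.storefile_blocks):
            if i in self.block_cache:
                continue  # already checked in step 2
            self.block_cache[i] = block
            if key in block:
                return block[key], "storefile"
        # 4. Nothing in this region matches.
        return None, "miss"


region = Region({"r1": "a"}, [{"r2": "b"}, {"r3": "c"}])
```

With this sample region, a first read of `"r3"` scans both blocks from the storefile, after which a read of `"r2"` is served from the block cache.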
To facilitate understanding of the above technical solution of the present invention, it is described in detail below by way of a specific mode of use.
In specific use, the construction system of the knowledge-graph-based data service platform according to the present invention is organized, from the perspective of business logic, as shown in Figure 3. The bottom layer is the data storage layer: MySQL stores the raw data and provides it upward to the data collection client; Redis is responsible for generating resource IDs, supporting the data resources produced by ETL rule processing; Kafka receives the data collected through the data acquisition interface; and all fully processed data is stored in the graph database. The layer above the data storage layer is the data persistence layer, which provides a mapping solution between objects and the relational database. The service layer covers operations such as modifying, querying and deleting plug-ins and ontologies, and provides the resource ID service, the external data query service and similar services for the data. The Web layer performs parameter validation for plug-in and ontology management, while the actual business logic is handled by the service layer. The top layer provides terminal displays such as the plug-in Web page and the ontology management Web page, together with open interfaces such as resource ID, acquisition service and data query.
In summary, by means of the above technical solution of the present invention, data is stored flexibly and in an object-oriented manner, and the knowledge contained in unstructured and semi-structured data is fully mined, which helps to provide high-quality structured data for various later application fields.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A method for constructing a knowledge-graph-based data service platform, characterized by comprising the following steps:
cleaning multi-source heterogeneous data;
querying the cleaned data, and generating a resource ID for the queried data through Redis;
constructing an OWL ontology and managing plug-ins, and storing the data using a columnar database.
2. The method for constructing a knowledge-graph-based data service platform according to claim 1, characterized in that cleaning the multi-source heterogeneous data includes:
loading ETL plug-ins for different data sources to obtain ETL rules, and obtaining the relationships between entities after the entities are constructed;
calling the resource service subsystem to obtain a resource ID;
generating structured data objects from the resourced data.
3. The method for constructing a knowledge-graph-based data service platform according to claim 1, characterized in that, before the multi-source heterogeneous data is cleaned, the method further includes acquiring the multi-source heterogeneous data using a data collection client.
4. The method for constructing a knowledge-graph-based data service platform according to claim 3, characterized in that the data collection client includes a data acquisition program component, an association ID generation component, an association ID sending component and a non-active service response component.
5. The method for constructing a knowledge-graph-based data service platform according to any one of claims 1-4, characterized in that querying the cleaned data includes:
retrieving the Global ID using a full-text search engine;
in the graph database, retrieving the associated entities according to the Global ID and returning all association IDs;
in the distributed data storage system, indexing the structured data according to the association ID and returning the corresponding attribute results.
6. A system for constructing a knowledge-graph-based data service platform, characterized by comprising:
a data cleaning module, for cleaning multi-source heterogeneous data;
a resource service subsystem module, for querying the cleaned data and generating a resource ID for the queried data through Redis;
an ontology management module, for constructing an OWL ontology and managing plug-ins, and storing the data using a columnar database.
7. The system for constructing a knowledge-graph-based data service platform according to claim 6, characterized in that the data cleaning module includes:
an entity construction module, for loading ETL plug-ins for different data sources to obtain ETL rules, and obtaining the relationships between entities after the entities are constructed;
a resourcing module, for calling the resource service subsystem to obtain a resource ID;
a structured data object module, for generating structured data objects from the resourced data.
8. The system for constructing a knowledge-graph-based data service platform according to claim 6, characterized in that the system further includes a data acquisition module, the data acquisition module being used to acquire multi-source heterogeneous data using a data collection client.
9. The system for constructing a knowledge-graph-based data service platform according to claim 8, characterized in that the data collection client in the data acquisition module includes a data acquisition program component, an association ID generation component, an association ID sending component and a non-active service response component.
10. The system for constructing a knowledge-graph-based data service platform according to any one of claims 6-9, characterized in that the data query module includes:
a Global ID module, for retrieving the Global ID using a full-text search engine;
an association ID module, for retrieving, in the graph database, the associated entities according to the Global ID and returning all association IDs;
a structured data module, for indexing the structured data according to the association ID in the distributed data storage system and returning the corresponding attribute results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811640313.8A CN109783484A (en) | 2018-12-29 | 2018-12-29 | The construction method and system of the data service platform of knowledge based map |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811640313.8A CN109783484A (en) | 2018-12-29 | 2018-12-29 | The construction method and system of the data service platform of knowledge based map |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109783484A true CN109783484A (en) | 2019-05-21 |
Family
ID=66499103
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811640313.8A Pending CN109783484A (en) | 2018-12-29 | 2018-12-29 | The construction method and system of the data service platform of knowledge based map |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109783484A (en) |
2018-12-29: Application CN201811640313.8A filed (publication CN109783484A); status: Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107180059A (en) * | 2016-03-11 | 2017-09-19 | 北大方正集团有限公司 | Data retrieval method and data retrieval system |
CN107783973A (en) * | 2016-08-24 | 2018-03-09 | 慧科讯业有限公司 | The methods, devices and systems being monitored based on domain knowledge spectrum data storehouse to the Internet media event |
CN106815293A (en) * | 2016-12-08 | 2017-06-09 | 中国电子科技集团公司第三十二研究所 | System and method for constructing knowledge graph for information analysis |
CN107358315A (en) * | 2017-06-26 | 2017-11-17 | 深圳市金立通信设备有限公司 | A kind of information forecasting method and terminal |
CN107766483A (en) * | 2017-10-13 | 2018-03-06 | 华中科技大学 | The interactive answering method and system of a kind of knowledge based collection of illustrative plates |
US10157226B1 (en) * | 2018-01-16 | 2018-12-18 | Accenture Global Solutions Limited | Predicting links in knowledge graphs using ontological knowledge |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110289101A (en) * | 2019-07-02 | 2019-09-27 | 京东方科技集团股份有限公司 | A kind of computer equipment, system and readable storage medium storing program for executing |
CN110442753A (en) * | 2019-07-17 | 2019-11-12 | 北京飞利信电子技术有限公司 | A kind of chart database auto-creating method and device based on OPC UA |
CN112749216A (en) * | 2019-10-30 | 2021-05-04 | 北京国双科技有限公司 | Rule analysis-based data import method, device and equipment |
CN111046115A (en) * | 2019-12-24 | 2020-04-21 | 四川文轩教育科技有限公司 | Knowledge graph-based heterogeneous database interconnection management method |
CN111046115B (en) * | 2019-12-24 | 2023-08-08 | 四川文轩教育科技有限公司 | Heterogeneous database interconnection management method based on knowledge graph |
CN112364046A (en) * | 2020-10-29 | 2021-02-12 | 北京航空航天大学 | Knowledge graph-based main data management method in heterogeneous environment |
CN112364046B (en) * | 2020-10-29 | 2022-07-29 | 北京航空航天大学 | Knowledge graph-based main data management method in heterogeneous environment |
CN113342808A (en) * | 2021-05-26 | 2021-09-03 | 电子科技大学 | Knowledge graph inference engine architecture system based on electromechanical equipment |
CN113342808B (en) * | 2021-05-26 | 2022-11-08 | 电子科技大学 | Knowledge graph inference engine architecture system based on electromechanical equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109783484A (en) | The construction method and system of the data service platform of knowledge based map | |
Zhu et al. | Intelligent learning for knowledge graph towards geological data | |
CN104239513B (en) | A kind of semantic retrieving method of domain-oriented data | |
CN112131449B (en) | Method for realizing cultural resource cascade query interface based on ElasticSearch | |
CN107798387B (en) | Knowledge service system and method suitable for full life cycle of high-end equipment | |
CN108509543B (en) | Streaming RDF data multi-keyword parallel search method based on Spark Streaming | |
CN102087669A (en) | Intelligent search engine system based on semantic association | |
US20130232147A1 (en) | Generating a taxonomy from unstructured information | |
Varfolomeyev et al. | Smart personal assistant for historical tourism | |
Nesi et al. | Ge (o) Lo (cator): Geographic information extraction from unstructured text data and Web documents | |
Younis et al. | Hybrid geo-spatial query methods on the Semantic Web with a spatially-enhanced index of DBpedia | |
Tayal et al. | Fast retrieval approach of sentimental analysis with implementation of bloom filter on Hadoop | |
CN101916260A (en) | Method for establishing semantic mapping between disaster body and relational database | |
CN116467291A (en) | Knowledge graph storage and search method and system | |
CN111949649A (en) | Dynamic body storage system, storage method and data query method | |
CN105183736A (en) | Universal searching system according to network equipment configuration and state information, and universal searching method thereof | |
CN113377739A (en) | Knowledge graph application method, knowledge graph application platform, electronic equipment and storage medium | |
CN114880483A (en) | Metadata knowledge graph construction method, storage medium and system | |
Rana et al. | An analysis of semantic heterogeneity issues and their countermeasures prevailing in semantic web | |
Zhang et al. | Semantic web and geospatial unique features based geospatial data integration | |
CN112597305A (en) | Scientific and technological literature author name disambiguation method based on deep learning and web end disambiguation device | |
Paramartha et al. | Integration of Region-based Open Data Using Semantic Web | |
Sukumar et al. | Knowledge Graph Generation for Unstructured Data Using Data Processing Pipeline | |
Palligkinis et al. | Extending YAGO2geo with geospatial information from other countries | |
Feng et al. | Intelligent question answering system based on entrepreneurial incubation knowledge graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190521 |