CN106570081A - Semantic net based large scale offline data analysis framework - Google Patents

Semantic net based large scale offline data analysis framework

Info

Publication number
CN106570081A
Authority
CN
China
Prior art keywords
data
layer
analysis
semantic
static
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610907501.7A
Other languages
Chinese (zh)
Inventor
王坚
凌卫青
程进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN201610907501.7A
Publication of CN106570081A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating

Abstract

The invention relates to a large-scale offline data analysis framework based on the semantic web. The framework comprises a data acquisition layer, an ontology layer, a data storage layer, a semantic layer, a data analysis layer and an application layer. Data sources comprise dynamic data and static data; static data covers the internal logic, semantics and structure types of the data and databases. Within the framework the static data is extracted and modelled as an ontology model, which is then exposed to users or upper-level analysis tasks as a semantic service. The framework improves the ability to organize multi-source heterogeneous offline data and presents a uniform interface to the upper layers: application users or data analysts can access the lower-level data sources through the semantic interface and obtain the relevant data without knowing every detail of each source. When content changes, only the affected part of the ontology structure needs to be corrected; together with the update and inference services built into the ontology tooling, this allows the whole data resource to be updated from a global perspective.

Description

Large-scale offline data analysis framework based on the semantic web
Technical field
The present invention relates to a large-scale offline data analysis framework based on the semantic web.
Background
As the new generation of information technology matures, concepts such as the Internet of Things, the mobile internet and cloud computing are gradually being accepted by individuals and enterprises. Data and information grow by petabytes every day, and how to process and analyze big data to obtain valuable knowledge has become a research focus for major institutions and companies. Big data is an important resource with which enterprises in the new era restructure their value chains, uncover potential points of economic growth and drive independent innovation. Big data research touches many fields, such as transportation, healthcare, finance, the Internet, public administration, industry, services and university research.
Processing and analyzing big data resources is likewise an important research topic. The characteristics of big data include the following. Curse of dimensionality: the large number of different data types makes analysis difficult, and traditional analysis methods struggle with high-dimensional data sets. Cross-domain data: as data resources grow explosively, big data analysis increasingly combines data resources from several domains, which requires analysts with comprehensive domain knowledge. Complex data structures: with the spread of social networks, pictures and other recorded media, storing, processing and analyzing large volumes of unstructured data challenges data analysis systems. Implicit data relations: the surge of high-dimensional data makes it hard for analysts to grasp the meaning behind mass data, and the implicit relations within it strongly affect the validity of analysis results. Fast-changing data: large numbers of electronic devices produce data at millisecond intervals, which places new demands on data storage and management. Unfixed data sources: in a big data environment many resources cannot be stored permanently, and data sources keep changing and being re-stored, so how an information system copes with rapidly switching, constantly changing data sources is a problem every enterprise faces. Low value density: mass data, sensor data in particular, is large in volume but low in value.
Existing data systems have difficulty processing big data resources, mainly in the following respects:
(1) Rigid data acquisition types. Traditional data analysis systems are built on data warehouses and data cubes to facilitate later data management and analysis. Such architectures scale poorly and cope badly with data types that keep changing and growing. Building a data warehouse is also time-consuming, so fast data changes are hard to capture and the analysis loses its value;
(2) Isolated data storage. Because many kinds of data acquisition equipment exist within a single field, raw data ends up stored at different physical addresses, and data of the same type is not stored in a standardized way, which makes data retrieval and computation very difficult;
(3) Difficult data tracing. During data acquisition, apart from the relations imposed by the acquisition equipment itself, no explicit logical relations are defined between most data. Analysis therefore has to be reconfigured for each query and computation task, and separately stored data resources make it hard to look up the origin of data and related information;
(4) Heterogeneous data storage. Data resources contributed to a system by many parties differ in representation and storage structure even when they describe the same resource. Data collection and integration processes vary, and each collecting department attends only to its own data requirements. Data collected by different departments in the same field is therefore semantically ambiguous, which makes information exchange difficult and prevents reuse.
The new data resource environment therefore calls for a new, extensible big data analysis framework in which data is easy to organize and manage, so as to strengthen enterprises' ability to apply and analyze data and to maximize the value obtained from it.
Summary of the invention
The object of the invention is to provide a large-scale offline data analysis framework based on the semantic web.
The large-scale offline data analysis framework based on the semantic web proposed by the invention is divided, from bottom to top, into a data acquisition layer, an ontology layer, a data storage layer, a semantic layer, a data analysis layer and an application layer, wherein:
Source data is data external to the platform that the platform analyzes and processes; it is stored, centrally or in distributed fashion, in other databases or other platforms. It includes sensor data, text data, tabular data, network data, image data and other data, and is divided into dynamic data and static data. Dynamic data changes quickly: it is typically produced at short intervals and occupies a large amount of storage space. Static data is produced at relatively long intervals; it is the basic data describing sources of different types and origins, and comprises data logical relations, data physical information and data semantic information;
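The three kinds of static data named above can be pictured as one small record per data source. The following Python sketch is only an illustration under stated assumptions: the class name `StaticDescriptor`, its field names and the sample values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class StaticDescriptor:
    """Hypothetical record holding the three kinds of static data for one source."""
    source: str  # where the (dynamic) entity data lives
    logical_relations: list = field(default_factory=list)  # data logical relations
    physical_info: dict = field(default_factory=dict)      # data physical information
    semantic_info: dict = field(default_factory=dict)      # data semantic information


def is_complete(descriptor: StaticDescriptor) -> bool:
    """True once all three kinds of static data have been recorded."""
    return bool(
        descriptor.logical_relations
        and descriptor.physical_info
        and descriptor.semantic_info
    )


# Illustrative values only.
sensor_table = StaticDescriptor(
    source="plant_db.temperature_readings",
    logical_relations=[("temperature_readings", "measured_by", "sensor")],
    physical_info={"field_type": "FLOAT", "field_length": 8},
    semantic_info={"concept": "Temperature", "unit": "Celsius"},
)
```

A descriptor like this is small and changes rarely, which matches the description of static data; the fast-growing readings themselves would stay in the source system.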
The data acquisition layer comprises structured data extraction, semi-structured data extraction, unstructured data extraction, and manual division and classification of data resources. The enterprise's source data, or all potential source data, is manually divided and classified in order to build the ontology library. The structured, semi-structured and unstructured extraction modules process data of the corresponding type in a unified way, with reference to the ontology library, and provide the data extraction service for the data storage layer. The data acquisition layer identifies the three classes of static data in the source data (data logical relations, data semantic information and data physical information), either manually or through purpose-written recognition functions, and stores them as electronic documents or records; static data is usually structured. The entity data content, that is, the large volume of dynamic data held in databases or other platforms, is also identified manually, chiefly with respect to its structure, type, size and storage mode. Dynamic data may be structured, semi-structured or unstructured. Based on the ontology layer, separate interface APIs are written for the structures of the different databases and external platforms; they perform structured, semi-structured and unstructured data extraction and store the results in the data storage layer;
The ontology layer builds the ontology library; its main tasks are creating the ontology model, writing the mapping file and updating the ontology model. The ontology layer stores ontology instance data in the database on the one hand and supports the semantic retrieval of the semantic layer on the other. Semantic-web-based data integration first identifies the source data, then maps the data to RDF triples, and finally generates an ontology library that supports SPARQL queries. Working from the static data, the ontology layer mainly uses the Protégé software to build the ontology model and generate the ontology library. The D2R engine comprises a mapping engine, the ontology model and a mapping file. The mapping file is generated from the ontology model manually, semi-automatically or fully automatically, and mainly records the mapping between the physical information of the source data and the physical information of the storage units in the storage layer. The mapping engine is embedded in the data extraction modules of the data acquisition layer. The ontology update service interface exists separately within the ontology layer and is mainly used to update the ontology library;
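The mapping of source data to RDF triples can be illustrated with a minimal sketch. It does not use the D2R mapping language; a plain Python dictionary stands in for the mapping file, and the function name `row_to_triples`, the `ex:` predicates and the `http://example.org/` base URI are assumptions made for the example.

```python
def row_to_triples(base_uri, table, key_column, row, column_to_predicate):
    """Map one relational row to a list of (subject, predicate, object) triples."""
    subject = f"{base_uri}{table}/{row[key_column]}"
    # Every row becomes an instance of the class named after its table.
    triples = [(subject, "rdf:type", f"{base_uri}{table}")]
    for column, predicate in column_to_predicate.items():
        value = row.get(column)
        if value is not None:  # skip columns that are absent or NULL
            triples.append((subject, predicate, str(value)))
    return triples


# Stand-in for a mapping file: column name -> predicate (hypothetical names).
mapping = {"name": "ex:name", "location": "ex:locatedIn", "note": "ex:note"}
row = {"id": 7, "name": "pump-7", "location": "hall-2", "note": None}
triples = row_to_triples("http://example.org/", "Device", "id", row, mapping)
```

Each triple produced this way could then be loaded into a triple store and queried with SPARQL; the sketch stops at triple generation.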
The data storage layer stores the collected dynamic and static data in a distributed storage system. Static data can be kept in a structured database such as HBase; dynamic data is stored in systems such as Hive or HDFS. For big data analysis, distributed storage systems generally adopt a master/slave architecture: the master node is the management node and records information such as data storage locations, while the slave nodes are data nodes, the actual physical storage locations of the data;
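The division of labour between the management node and the data nodes can be sketched in a few lines. This is a toy in-memory model, not the HDFS or HBase API; the class names and the placement rule (write to the node holding the fewest blocks) are assumptions for illustration.

```python
class DataNode:
    """Slave node: the physical location that actually holds data blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class ManagementNode:
    """Master node: keeps only metadata, i.e. which data node holds which block."""

    def __init__(self, data_nodes):
        self.data_nodes = {node.name: node for node in data_nodes}
        self.locations = {}

    def write(self, block_id, payload):
        # Toy placement rule (an assumption): pick the node with the fewest blocks.
        node = min(self.data_nodes.values(), key=lambda n: len(n.blocks))
        node.blocks[block_id] = payload
        self.locations[block_id] = node.name

    def read(self, block_id):
        # Look up the location first, then fetch from the data node itself.
        node = self.data_nodes[self.locations[block_id]]
        return node.blocks[block_id]


master = ManagementNode([DataNode("dn1"), DataNode("dn2")])
master.write("b1", b"static metadata")
master.write("b2", b"sensor readings")
```

The master never stores payloads, only locations, which is the property the paragraph above describes.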
The semantic layer is designed mainly for big data queries and comprises modules for query and inference task generation, a query proxy, a query engine and an inference engine. The semantic layer receives requests from users, converts them into query tasks by semantic parsing and inference, calls the query engine to run the big data query, and finally passes the result to the data analysis layer for subsequent big data computation. The semantic layer encapsulates the lower modules and serves the upper layers through an interface API. An upper-layer application first reaches the semantic layer's query task generator through the API; the query task is converted into a SPARQL query over the ontology, and the inference engine and query engine query the ontology library and return the corresponding static data content and the physical information of the dynamic data.
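Converting a query task into SPARQL can be sketched as simple string construction. The function `build_sparql`, the `ex:` prefix, and the property `ex:storedInTable` are hypothetical; the patent does not specify the vocabulary. Values are interpolated without escaping, so the sketch assumes trusted input.

```python
def build_sparql(concept, constraints, prefix="http://example.org/onto#"):
    """Turn a concept name and property constraints into a SPARQL SELECT string."""
    lines = [
        f"PREFIX ex: <{prefix}>",
        "SELECT ?resource ?table WHERE {",
        f"  ?resource a ex:{concept} .",
        "  ?resource ex:storedInTable ?table .",
    ]
    # One triple pattern per constraint; sorted for a deterministic query text.
    for prop, value in sorted(constraints.items()):
        lines.append(f'  ?resource ex:{prop} "{value}" .')
    lines.append("}")
    return "\n".join(lines)


query = build_sparql("Temperature", {"unit": "Celsius", "site": "hall-2"})
```

The resulting string could be handed to any SPARQL engine; the `?table` variable stands for the physical information (here a table address) that the semantic layer returns for the dynamic data.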
The data analysis layer provides analysis algorithms for different big data analysis demands. Upward, it supports users' application development through service interfaces; downward, it uses the parallel computing interface to fetch data from the lower layers. The layer relies on a task scheduling module to schedule analysis tasks and coordinate their progress. Algorithms are packaged in a basic analysis algorithm library and a complex data analysis algorithm library, which strengthens the secondary development capability and extensibility of the whole system. The task scheduling module also handles the exchange of information between the other modules of this layer and the layers above and below. The application service interface API converts a user's analysis task into algorithm instructions that call the complex data analysis algorithm library. That library contains a large number of independent data analysis algorithms and calls the basic analysis algorithm library through the basic analysis algorithm library call interface API. The parallel computing interface API extracts data from the data layer and computes on it in a parallel environment. The application layer provides independent analysis application services to ordinary users in the form of web pages, application programs or apps.
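The relationship between the two algorithm libraries and the scheduler can be sketched as two registries. The registry names, the choice of z-score as the "complex" algorithm and the function `schedule` are assumptions for illustration; the patent does not list concrete algorithms.

```python
import statistics

# Basic analysis algorithm library: small, reusable building blocks.
BASIC_ALGORITHMS = {
    "mean": statistics.fmean,
    "stdev": statistics.pstdev,  # population standard deviation
}


def zscore(values):
    """Complex algorithm composed from basic ones (fails if all values are equal)."""
    mean = BASIC_ALGORITHMS["mean"](values)
    stdev = BASIC_ALGORITHMS["stdev"](values)
    return [(v - mean) / stdev for v in values]


# Complex data analysis algorithm library: entries call into the basic library.
COMPLEX_ALGORITHMS = {"zscore": zscore}


def schedule(task_name, values):
    """Minimal stand-in for the task scheduling module: look up and run a task."""
    return COMPLEX_ALGORITHMS[task_name](values)


result = schedule("zscore", [1.0, 2.0, 3.0])
```

Adding a new analysis then means registering one more entry in a library, which is one way to read the extensibility claim above.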
Using the results computed by the data analysis layer, the application layer offers big data analysis to users as services. Depending on each user's needs, one or more analysis modules can be called to complete an analysis task, and new computing modules can be added to the data analysis layer to meet new demands. Users can either request existing services or develop new services for their own needs and carry out big data analysis on this framework.
The beneficial effects of the invention are as follows. The analysis framework improves the ability to organize multi-source heterogeneous offline data and presents a unified interface to the upper layers: users or data analysts need not understand every detail of the different data sources and simply access the lower data sources through the semantic interface to obtain the relevant data. Ontology modelling tools offer strong update and inference services: when the volume of underlying data grows or data resources change, users need not rebuild the global ontology model; they only correct the part of the ontology structure whose content has changed and rely on the update and inference services of the tools to update the whole data resource from a global perspective. For data queries, semantic technology is more user-friendly and intelligent and suits data retrieval in large-scale data environments, chiefly because of its inference capability. After manual or semi-manual configuration by developers, users can run fuzzy data queries: without knowing the precise definition of a data resource, they can find approximate or similar resources, and can thus quickly locate data relevant to the object of analysis without understanding the overall data environment.
Brief description of the drawings
Fig. 1 is an architecture diagram of the invention.
Fig. 2 is the call flow chart of the data analysis modules of the invention.
Fig. 3 is the call flow chart of the data storage and organization modules of the invention.
Fig. 4 illustrates the structured and unstructured data extraction modules of the invention, wherein (a) is the structured data extraction module and (b) is the unstructured data extraction module.
Fig. 5 is the data analysis logic flow chart of the invention.
Detailed description of the embodiments
The invention is further illustrated below by an embodiment with reference to the drawings.
Embodiment 1:
Centered on the data storage layer, the analysis framework is divided into an upper and a lower part. The lower part handles data storage and integration, mainly ontology construction, RDF generation and big data storage. The upper part handles application, mainly application requests, big data computation, semantic queries and presentation of results. The upper part contains 18 main flows and the lower part 9; the sequence numbers in the figures indicate the order in which the actions occur.
This embodiment takes the Hadoop platform as an example. Data storage is divided into RDF data storage and source data storage. RDF data, that is, the instance data generated from the ontology, is stored in the HBase database together with unstructured and semi-structured data; structured data is stored in the Hive database as large tables. The call flow of the data analysis modules of the semantic-web-based analysis framework is shown in Fig. 2. The whole call flow involves the application layer, data analysis layer, semantic layer and data storage layer of the framework and comprises 18 steps in total, as follows:
(1) The application-layer user first sends an application request, with the selections made according to demand, to the lower layer;
(2) After receiving the request, the platform parses the user input and uses the analysis task matching module to match and initialize an analysis task or query task;
(3) The instruction is passed to the task scheduling module, which selects the call flow of the corresponding complex data analysis algorithm, responds, and provides the algorithm call service;
(4) The instruction must first extract the underlying data resources. According to the query task, the task scheduling module passes the query constraints to the inference engine of the semantic layer, which checks whether matching data resources exist and returns the result to the task scheduling module;
(5) If no data resource exists, the result is returned directly to the user through flows (1)-(3); if one exists, an analytical model is generated and the matched analytical model is passed to the query plan generation module;
(6) With the support of the ontology model, the corresponding ontology information is passed to the ARQ query engine;
(7) ARQ issues query instructions against the RDF data stored in HBase;
(8) The HMaster host converts the query task into instructions suitable for distributed database queries;
(9) Once the specific RDF data is obtained, the valid data table address information relevant to the query task is returned to the task scheduling module;
(10) With the data table information returned, the task scheduling module passes the data table addresses and the parameters the analysis task needs to the big data tables, such as the Hive and HBase databases, and queries the dynamic big data resources;
(11) The data results of the query are returned;
(12) The task scheduling module generates the instruction sequence for the complex data analysis algorithm library as required by step (3) and extracts the corresponding algorithm model;
(13) The complex algorithm library in turn contains a large number of basic analysis algorithms; after each computation step it returns the computation result and the instruction sequence for the basic analysis algorithm library;
(14) The basic analysis algorithm computation models are called in sequence;
(15) The computation result is returned to the task scheduling module;
(16) Flows (4)-(15) form a loop whose number of iterations depends on the scale of the analyzed data and the complexity of the algorithm. Once the final result is obtained, the task scheduling module passes the final data result to the service interface;
(17) The service interface packages the data and sends it to the front end;
(18) The user selects a suitable visualization model, which completes the platform's whole big data analysis process.
Explanation of the data analysis module call flow
Step | Function | Description
(1) | Send application request and response | Pass the user's analysis request and query constraints to the lower module and respond
(2) | Send service request and response | Parse the user input, use the analysis task matching module to match and initialize an analysis task or query task, select the call flow of the corresponding complex data analysis algorithm and respond
(3) | Send call request and response | Pass the parsed service request and query parameters to the task scheduling module and return the result of the query
(4) | Output data query request and response | According to the query task, pass the query constraints to the semantic layer, check whether matching data resources exist and return the result to the task scheduling module
(5) | Output analytical model | Pass the matched analytical model to the query plan generation module
(6) | Output query task | Pass the corresponding ontology information to the ARQ query engine
(7) | Send call request | Call the RDF data (the corresponding data table addresses) from HBase
(8) | Output query statement | Query statement
(9) | Return query data table information | Return the valid data table address information relevant to the query task to the task scheduling module
(10) | Return query data table information | Pass the data table addresses and the parameters the analysis task needs to the big data tables
(11) | Return query data result | Return the data results of the query
(12) | Input call to the complex data analysis algorithm library instruction sequence | Generate the complex data analysis algorithm library instruction sequence as required by (3) and extract the corresponding algorithm computation model
(13) | Output call result | Return the computation result and the basic analysis algorithm library instruction sequence
(14) | Input basic analysis algorithm call instruction | Generate the call instructions as required by (13) and extract the corresponding algorithm computation model
(15) | Return computation result | Return the computation result
(16) | Return request result | Return the final result
(17) | Return service result | Package the data and return it to the front end
(18) | Result presentation | Match a visualization model for presenting the data and display the result
The call flow of the data storage and organization modules of the semantic-web-based analysis framework is shown in Fig. 3. Fig. 3 depicts a big data governance process involving the data acquisition layer, the ontology layer and the data storage layer. The steps are as follows:
1. First classify and describe the static characteristics of the data, sort out the data to be integrated, and write it up as a mapping file;
2. Carry out ontology modelling with reference to the domain background;
3. Once ontology modelling is complete, store the dynamic data resources in big data tables using the data extraction modules;
4. The mapping engine is combined with the mapping file;
5. Static data is extracted;
6. RDF data is generated;
7. For RDF data storage, the RDF data resources are imported into the Hadoop environment;
8. The incoming mapped RDF data is parsed using the Java API so that it can be stored in HBase;
9. RDF data is stored in separate tables so that HBase can access the triple data, which completes the storage and organization flow for offline big data.
Explanation of the data storage and organization module call flow
Step | Function | Description
1 | Write mapping file | Sort out the data to be integrated and write the mapping file
2 | Ontology modelling | Build the ontology model from the mapping file and the domain model
3 | Data import | Import the dynamic data into the HBase and Hive databases
4 | Call mapping engine | Call the mapping engine based on the mapping file and the ontology model
5 | Data mapping | The mapping engine reads the underlying data according to the mapping file and maps it to RDF data
6 | Output RDF data | The mapping engine outputs the mapped RDF data
7 | RDF data storage | Store the RDF data in the HDFS system of the Hadoop cluster
8 | RDF data parsing | Parse the incoming mapped RDF data using the Java API so that it can be stored in HBase
9 | RDF data split-table storage | In HBase, create six tables according to the different index combinations of subject, predicate and object, and store the RDF data in them
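The six-table layout in the last step can be illustrated with a short sketch. The patent does not name the six index combinations; the sketch assumes they are the six orderings of subject (S), predicate (P) and object (O), and it emulates HBase row-key prefix scans with in-memory sets rather than calling any HBase API.

```python
from itertools import permutations


def index_keys(subject, predicate, obj):
    """Return the six row keys for one triple, one per ordering of S, P and O."""
    parts = {"S": subject, "P": predicate, "O": obj}
    return {
        "".join(order): tuple(parts[c] for c in order)
        for order in permutations("SPO")
    }


def build_tables(triples):
    """Store every triple in all six tables (table name -> set of row keys)."""
    tables = {}
    for s, p, o in triples:
        for name, key in index_keys(s, p, o).items():
            tables.setdefault(name, set()).add(key)
    return tables


def lookup(tables, name, prefix):
    """Emulate a prefix scan: keys in table `name` that start with `prefix`."""
    return sorted(k for k in tables[name] if k[: len(prefix)] == prefix)


tables = build_tables([
    ("d7", "type", "Device"),
    ("d7", "locatedIn", "hall2"),
])
devices = lookup(tables, "POS", ("type", "Device"))
```

Whichever parts of a triple pattern are bound, one of the six tables has them as a key prefix, so a query such as "all subjects of type Device" becomes a single prefix scan on the POS table. The cost is storing each triple six times.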
Detailed design principles of selected modules
First, in the data acquisition layer: for the semantic web to access all data sets quickly and effectively, building an ontology model for each data set and linking them is an important step. In the data acquisition layer this patent uses three different data extraction modules for the different data types. Their main purposes are: 1. to import data into a new storage space and increase data query speed; 2. to extract the static data of the databases to facilitate building the ontology model.
Fig. 4 describes the data extraction modules in detail; the whole process comprises 7 steps. Before data is extracted, the target data store is chosen according to the type of data source. In this case structured data is stored in SQL databases, which use structured queries, such as DB2, SQL/DS, Oracle and Hive; semi-structured and unstructured data is stored in NoSQL databases such as Cassandra and HBase. Once the database type is chosen, data migration begins.
Step 1: Static data is extracted, manually or semi-manually, according to the characteristics of the source data. For structured data, static data is obtained mainly by reading the storage and table header information of the source data. For unstructured and semi-structured data, the characteristics must be extracted manually;
Step 2: The static data extracted from structured source data includes: table name, attribute name, source field type, data generation cycle, source table address and source field length. The static data extracted from unstructured data includes: data length, keywords, tag set, data generation time, source field length and keyword frequency. Semi-structured data combines structured and unstructured data, and its static data is the combination of the two;
Step 3: The data import component receives the instruction that step 2 is complete and starts importing the source data;
Step 4: The data is exported from the source database;
Step 5: The data from the source database is imported into the new database;
Step 6: After the import, static data is returned automatically according to how the data is stored in the new database;
Step 7: The static information in the new database environment includes: new table address, new field type, new field length, new table name and database query type.
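For structured sources, reading static data from table header information (steps 1 and 2) can be sketched with Python's built-in `sqlite3` module. SQLite stands in for the source database purely for illustration; the function name `extract_static_metadata` and the sample table are hypothetical, and the table name is interpolated directly, so it is assumed to be a trusted identifier.

```python
import sqlite3


def extract_static_metadata(conn, table):
    """Read the table name, attribute names and declared field types of a table."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    cursor = conn.execute(f"PRAGMA table_info({table})")
    columns = [{"name": row[1], "type": row[2]} for row in cursor.fetchall()]
    return {"table_name": table, "columns": columns}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL, unit VARCHAR(8))"
)
metadata = extract_static_metadata(conn, "readings")
```

Items such as the data generation cycle or the source table address are not available from the header and would still have to be recorded manually or by a separate recognition function, as step 1 notes.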
Fig. 5 shows the data analysis logic flow under the data analysis framework of the invention. The framework treats analysis tasks in two parts according to how often they are used. The first part covers frequently used analysis tasks, for which the framework provides fully encapsulated, layered analysis services and interfaces; the user only needs to configure the relevant modules in Fig. 2 and can complete the whole analysis task quickly through flow (I), as follows:
Through the interactive interface the user selects an application task. The platform directly invokes the corresponding data and algorithm instruction identifiers, no longer returns intermediate values to the user, calls the corresponding algorithms through the task scheduling module and finally returns the computation result, which is shown to the user with the visualization model used in previous calls. The second part covers first-time analysis tasks and tasks whose goal is not yet clear. According to the characteristics of the framework it is divided into two stages. The first stage is semantic-based data selection. Its main feature is the use of semantic technology: analysts can run fuzzy data queries according to task needs, which lets users select data interactively without understanding the whole data source. For the semantic query stage this patent can use existing technology and software (such as SPARQL and the ARQ semantic query engine in Jena). After the user has selected the scope of the data set to analyze, the second stage computes on the data in a parallel environment by calling the algorithm library of the analysis layer. This patent does not strictly define this part; whether to parallelize data processing can be decided according to the platform's characteristics, and the algorithm library can be extended on existing big data analysis platforms (MapReduce or Spark). The numerical results of the final analysis are passed through the application service interface in Fig. 1 to the front-end user, who can choose a suitable data visualization model according to the characteristics of the data.
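The split between frequently used tasks and first-time or exploratory tasks can be sketched as a simple dispatch. Everything here is hypothetical: the registry `FREQUENT_TASKS`, the function `run_task` and the sample tables are illustrative stand-ins, and the data selector is a plain function rather than a semantic query.

```python
# Hypothetical registry of frequently used tasks: task name -> (table, algorithm).
FREQUENT_TASKS = {
    "daily_average": ("readings", lambda rows: sum(rows) / len(rows)),
}


def run_task(task_name, tables, select_data=None, algorithm=None):
    """Route a task through flow (I) if it is preconfigured, otherwise flow (II)."""
    if task_name in FREQUENT_TASKS:
        # Flow (I): data and algorithm are already bound to the task.
        table, bound_algorithm = FREQUENT_TASKS[task_name]
        return "I", bound_algorithm(tables[table])
    # Flow (II): the analyst first selects data, then chooses an algorithm.
    if select_data is None or algorithm is None:
        raise ValueError("exploratory tasks need a data selector and an algorithm")
    return "II", algorithm(select_data(tables))


tables = {"readings": [20.0, 22.0, 24.0], "pressure": [1.0, 3.0]}
flow_a = run_task("daily_average", tables)
flow_b = run_task(
    "first_look", tables, select_data=lambda t: t["pressure"], algorithm=max
)
```

In the framework described above, the selector in flow (II) would be backed by the semantic query stage and the algorithm by the analysis layer's libraries.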

Claims (1)

1. the large-scale off-line data analysiss framework of semantic net is based on, it is characterised in that be divided into data collection layer, sheet from bottom to top Body layer, data storage layer, semantic layer, data analysiss layer and application layer;Wherein:
Source data is platform exterior data, is used for Platform Analysis and process, with concentration or distributed storage in other data bases Or in other platforms;Including sensing data, text data, form data, network data, view data and other data, institute State source data and be divided into dynamic data and static data;Dynamic data is the quick data for producing change, and such data is generally produced Time interval is shorter, takes substantial amounts of data space;Static data generation time interval is relatively long, is for inhomogeneity Type and the basic data in source, including mathematical logic relation, Data Physical information and data semantic information;
Data collection layer includes that structural data is extracted, semi-structured data is extracted, unstructured data is extracted and artificial data Resource is divided and sorted out;Enterprise or all potential source datas are carried out artificial division and sorted out for building ontology library, structure Change that data pick-up, semi-structured data are extracted, unstructured data is extracted primarily to according to different data types, with reference to Ontology library carries out unified process to the data of respective type, and for data storage layer data pick-up service is provided;Data collection layer By artificial cognition or recognition function is write to the mathematical logic relation in source data, data-voice information and data physical message Three class static datas are identified, and are stored in the form of electronic document or record, and static data is usually structural data;Simultaneously To solid data content, that is, being stored in substantial amounts of dynamic data in data base or other platforms carries out artificial cognition, mainly for The structure of dynamic data, type, size and storage mode, dynamic data includes finishing structure, semi-structured and destructuring class Type, different interface API are write according to the structure of disparate databases and outside platform based on body layer, are carried out structural data and are taken out Take, semi-structured data is extracted and unstructured data is extracted, and be stored in data storage layer;
The ontology layer (body layer) is mainly responsible for constructing the ontology library, which chiefly involves establishing the ontology model, writing the mapping file and updating the ontology model; the ontology layer stores the ontology instance data in a database on the one hand, and supports the semantic retrieval of the semantic layer on the other; in semantic-web-based data integration, the source data is first identified, the data is then mapped into RDF triple form, and the ontology library is finally generated and supports SPARQL queries; working from the static data, the ontology layer mainly uses the Protégé software to build the ontology model and generate the ontology library; the D2R engine comprises a mapping engine, the ontology model and the mapping file; the mapping file is generated from the ontology model manually, semi-automatically or fully automatically, and mainly records the mapping between the physical information of the source data and the physical information of the storage units in the storage layer; the mapping engine is embedded in the data extraction module of the data collection layer; the ontology update service interface exists separately within the ontology layer and is mainly used to update the ontology library;
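The step of mapping source records to RDF triples through a mapping file can be sketched in plain Python. This is not the D2R mapping language: the `MAPPING` dictionary, the `ex:` prefix and the function names are hypothetical, and the `match` helper stands in for a single SPARQL triple pattern only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical mapping file content: table columns -> ontology properties.
MAPPING = {
    "table": "sensor",
    "class": "ex:Sensor",
    "key": "id",
    "properties": {"road": "ex:locatedOn", "kind": "ex:sensorType"},
}


def row_to_triples(row: dict, mapping: dict) -> list[Triple]:
    """Map one relational row to RDF-style triples using the mapping."""
    subject = f"ex:{mapping['table']}/{row[mapping['key']]}"
    triples = [Triple(subject, "rdf:type", mapping["class"])]
    for column, prop in mapping["properties"].items():
        if column in row:
            triples.append(Triple(subject, prop, str(row[column])))
    return triples


def match(triples, s=None, p=None, o=None) -> list[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]
```

For example, `match(row_to_triples({"id": 7, "road": "G2", "kind": "loop"}, MAPPING), p="ex:locatedOn")` selects the one triple that links sensor 7 to road G2.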
The data storage layer stores the collected dynamic and static data in a distributed storage system; static data may be kept in a structured database such as HBase, and dynamic data in stores such as Hive or HDFS; for big data analysis, a distributed storage system generally adopts a master/slave architecture, in which the master node is the management node, responsible for recording information such as data storage locations, and the slave nodes are data nodes, the actual physical storage locations of the data;
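The division of labour between master and slave nodes can be illustrated with a toy in-memory model. It is a sketch only: the class names and the round-robin placement policy are assumptions made for illustration and say nothing about how HBase, Hive or HDFS actually place data.

```python
class DataNode:
    """Slave node: holds the actual data blocks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[str, bytes] = {}


class MasterNode:
    """Master node: records which data node stores each key, not the data."""

    def __init__(self, nodes: list[DataNode]) -> None:
        if not nodes:
            raise ValueError("at least one data node is required")
        self._order = list(nodes)
        self._by_name = {node.name: node for node in nodes}
        self.locations: dict[str, str] = {}

    def put(self, key: str, value: bytes) -> str:
        """Store a value and return the name of the node that holds it."""
        name = self.locations.get(key)
        if name is None:
            # Assumed policy: round-robin over nodes for each new key.
            name = self._order[len(self.locations) % len(self._order)].name
            self.locations[key] = name
        self._by_name[name].blocks[key] = value
        return name

    def get(self, key: str) -> bytes:
        """Look up the location on the master, then read from that node."""
        return self._by_name[self.locations[key]].blocks[key]
```

The master only ever holds the `locations` table, and an existing key is overwritten in place rather than moved to another node.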
The semantic layer is designed mainly for big data queries and includes modules such as query and reasoning task generation, a query agent, a query engine and an inference engine; the semantic layer mainly receives requests from users, converts each request into a query task using its semantic parsing and reasoning functions, then calls the query engine to run the big data query, and finally passes the results to the data analysis layer for subsequent big data computation; the semantic layer encapsulates the lower-level modules and serves the upper layers through an interface API: an upper-layer application first accesses the query task generator of the semantic layer through the API, the query task is converted into the SPARQL language of the ontology, and the inference engine and query engine then query the ontology library and return the corresponding static data content and the physical information of the dynamic data;
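The conversion of a query task into SPARQL can be sketched as simple string generation. The `QueryTask` structure, the `to_sparql` function and the `http://example.org/onto#` namespace are hypothetical; the patent does not specify the shape of a query task, and a real query generator would also have to handle typed literals and reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class QueryTask:
    """A parsed user request: an ontology class plus property filters."""

    ontology_class: str
    filters: dict[str, str] = field(default_factory=dict)
    select: list[str] = field(default_factory=list)


def to_sparql(task: QueryTask, prefix: str = "http://example.org/onto#") -> str:
    """Render a query task as a SPARQL SELECT string."""
    variables = " ".join(f"?{name}" for name in task.select) or "?item"
    lines = [
        f"PREFIX ex: <{prefix}>",
        f"SELECT {variables} WHERE {{",
        f"  ?item a ex:{task.ontology_class} .",
    ]
    for prop, value in task.filters.items():
        # Escape backslashes and quotes so the literal stays well-formed.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  ?item ex:{prop} "{escaped}" .')
    for name in task.select:
        lines.append(f"  ?item ex:{name} ?{name} .")
    lines.append("}")
    return "\n".join(lines)
```

Calling `to_sparql(QueryTask("Sensor", {"road": "G2"}, ["storageLocation"]))` yields a query asking where the data of every sensor on road G2 is physically stored, which mirrors the idea of returning the physical information of the dynamic data.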
The data analysis layer provides analysis algorithms for different big data analysis needs; upward, it supports users' application development through service interfaces; downward, it uses the parallel computing interface to fetch data from the lower layers; the layer schedules analysis tasks and coordinates their progress mainly through a task scheduling module, and encapsulates the different algorithms in a basic analysis algorithm library and a complex data analysis algorithm library, which strengthens the secondary-development capability and extensibility of the whole system; the data analysis layer uses the task scheduling module to exchange information among its own modules and with the layers above and below; the application service interface provides an API that converts a user's analysis task into algorithm instructions that call the complex data analysis algorithm library; the complex data analysis algorithm library contains a large number of independent data analysis algorithms and calls the basic analysis algorithm library through the basic analysis algorithm library calling interface API; the parallel computing interface API extracts data from the data layers and performs the computation in a parallel environment; the application layer provides ordinary users with independent analysis application services in the form of Web pages, application programs or apps;
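The relationship between the task scheduling module and the two algorithm libraries can be sketched with a small registry and a thread pool. This is a simplified illustration: the registry names, the `summary` algorithm and the use of `ThreadPoolExecutor` as the parallel environment are assumptions, and a production system would more likely hand tasks to a cluster computing framework.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Basic analysis algorithm library: small reusable primitives.
BASIC_ALGORITHMS = {"mean": mean, "max": max, "min": min}


def summary(values):
    """A 'complex' algorithm composed from basic-library calls."""
    return {name: fn(values) for name, fn in BASIC_ALGORITHMS.items()}


# Complex analysis algorithm library: independent, user-facing algorithms.
COMPLEX_ALGORITHMS = {"summary": summary}


def run_tasks(tasks, max_workers=4):
    """Schedule (algorithm name, data) tasks and run them in parallel.

    Results are returned in the same order as the submitted tasks.
    """
    for name, _ in tasks:
        if name not in COMPLEX_ALGORITHMS:
            raise KeyError(f"unknown algorithm: {name}")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(COMPLEX_ALGORITHMS[name], data) for name, data in tasks
        ]
        return [future.result() for future in futures]
```

Adding a new analysis capability then amounts to registering another function in `COMPLEX_ALGORITHMS`, which is one way to read the claim's point about extensibility.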
Using the results computed by the data analysis layer, the application layer offers users big data analysis in the form of services; depending on each user's needs, one or more analysis modules can be called to complete an analysis task, and new computing modules can be added to the data analysis layer to meet new requirements; users can both submit service requests and develop new services according to their own actual needs, carrying out big data analysis on the basis of this framework.
CN201610907501.7A 2016-10-18 2016-10-18 Semantic net based large scale offline data analysis framework Pending CN106570081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610907501.7A CN106570081A (en) 2016-10-18 2016-10-18 Semantic net based large scale offline data analysis framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610907501.7A CN106570081A (en) 2016-10-18 2016-10-18 Semantic net based large scale offline data analysis framework

Publications (1)

Publication Number Publication Date
CN106570081A true CN106570081A (en) 2017-04-19

Family

ID=58533177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610907501.7A Pending CN106570081A (en) 2016-10-18 2016-10-18 Semantic net based large scale offline data analysis framework

Country Status (1)

Country Link
CN (1) CN106570081A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8099382B2 (en) * 2007-04-13 2012-01-17 International Business Machines Corporation Method and system for mapping multi-dimensional model to data warehouse schema
CN104809151A (en) * 2015-03-11 2015-07-29 同济大学 Multi-dimension based traffic heterogeneous data integrating method
CN105183834A (en) * 2015-08-31 2015-12-23 上海电科智能系统股份有限公司 Ontology library based transportation big data semantic application service method
CN105701193A (en) * 2016-01-11 2016-06-22 同济大学 Method for rapidly searching for traffic big data dynamic information and application thereof
CN105808734A (en) * 2016-03-10 2016-07-27 同济大学 Semantic web based method for acquiring implicit relationship among steel iron making process knowledge

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
苏刚等: "基于大数据的智能交通分析系统的设计与实现", 《电脑知识与技术》 *

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777372A (en) * 2017-01-26 2017-05-31 语义(上海)信息科技有限公司 A kind of honeybee stream device data water conservancy diversion and data method for transformation based on Ontology on Semantic Web
CN106777372B (en) * 2017-01-26 2019-08-27 语义(上海)信息科技有限公司 A kind of bee stream device data water conservancy diversion and data method for transformation based on Ontology on Semantic Web
CN108733726A (en) * 2017-04-24 2018-11-02 西门子(中国)有限公司 Network semantic model reconstruction system and method based on dynamic events
CN108733726B (en) * 2017-04-24 2022-03-29 西门子(中国)有限公司 Network semantic model reconstruction system and method based on dynamic events
CN108985531A (en) * 2017-06-01 2018-12-11 中国科学院深圳先进技术研究院 A kind of multimode isomery electric power big data convergence analysis management system and method
CN107341675A (en) * 2017-07-17 2017-11-10 重庆邮电大学 A kind of intelligent grid remote bill control decision-making framework and method based on semantic knowledge
CN107368586A (en) * 2017-07-24 2017-11-21 华电重工股份有限公司 Multi-system data analysis method and platform
CN107368586B (en) * 2017-07-24 2021-01-19 华电重工股份有限公司 Multi-system data analysis method and platform
CN107657215A (en) * 2017-09-07 2018-02-02 南京师范大学 Indoor behavior track motion semantic analysis method based on passive infrared sensor
CN107657215B (en) * 2017-09-07 2020-01-21 南京师范大学 Indoor behavior track motion semantic analysis method based on passive infrared sensor
CN109213909A (en) * 2017-09-11 2019-01-15 南京弹跳力信息技术有限公司 A kind of big data analysis system and its analysis method fusion search and calculated
CN107621979A (en) * 2017-10-27 2018-01-23 郑金林 A kind of Development of Students archives big data algorithm and analysis system
CN108121778A (en) * 2017-12-14 2018-06-05 浙江航天恒嘉数据科技有限公司 A kind of heterogeneous database exchange and cleaning system and method
CN109992252B (en) * 2017-12-29 2022-12-16 中移物联网有限公司 Data analysis method, terminal, device and storage medium based on Internet of things
CN109992252A (en) * 2017-12-29 2019-07-09 中移物联网有限公司 Data analysis method, terminal, device and storage medium based on Internet of things
CN108509486A (en) * 2018-02-08 2018-09-07 浙江大学 A kind of safe big data structural management method of intelligent plant multi-source
CN108459574A (en) * 2018-03-27 2018-08-28 重庆邮电大学 It is a kind of that system is managed based on the semantic field device information with OPC UA
CN108959349A (en) * 2018-04-23 2018-12-07 厦门快商通信息技术有限公司 A kind of financial audit circular for confirmation system
CN110704688A (en) * 2018-07-09 2020-01-17 上海交通大学 Block chain separation storage system based on associated data
CN109241179A (en) * 2018-08-01 2019-01-18 协同数据技术(深圳)有限公司 Data administering method, system and computer equipment based on data space
CN109145643A (en) * 2018-08-23 2019-01-04 安思瀚 A kind of personal multi-source data management method and system based on private clound
CN109558427A (en) * 2018-11-30 2019-04-02 上海找钢网信息科技股份有限公司 Intelligent inquiry system and method based on steel industry data platform
CN111415528A (en) * 2019-01-07 2020-07-14 长沙智能驾驶研究院有限公司 Road safety early warning method and device, road side unit and storage medium
CN109885542A (en) * 2019-02-18 2019-06-14 中国联合网络通信集团有限公司 Item file management method, device and storage medium
CN109976729A (en) * 2019-05-05 2019-07-05 东北大学 Storage and computing display globally configurable data analysis software architecture design method
CN109976729B (en) * 2019-05-05 2021-10-22 东北大学 Storage and computing display globally configurable data analysis software architecture design method
CN110196923A (en) * 2019-05-07 2019-09-03 中国科学院声学研究所 Underwater detection-oriented multi-source heterogeneous data preprocessing method and system
CN110196923B (en) * 2019-05-07 2021-07-30 中国科学院声学研究所 Underwater detection-oriented multi-source heterogeneous data preprocessing method and system
CN110275919A (en) * 2019-06-18 2019-09-24 合肥工业大学 Data integrating method and device
CN110275966A (en) * 2019-07-01 2019-09-24 科大讯飞(苏州)科技有限公司 Knowledge extraction method and device
CN110275966B (en) * 2019-07-01 2021-10-01 科大讯飞(苏州)科技有限公司 Knowledge extraction method and device
CN112579565A (en) * 2020-11-30 2021-03-30 贵州力创科技发展有限公司 Data model management method and system of data analysis engine
CN112817569A (en) * 2021-02-06 2021-05-18 成都飞机工业(集团)有限责任公司 Analysis-oriented data rapid mapping method, equipment and storage medium
CN112817569B (en) * 2021-02-06 2023-10-17 成都飞机工业(集团)有限责任公司 Analysis-oriented data rapid mapping method, equipment and storage medium
CN113111440A (en) * 2021-04-26 2021-07-13 河北交通职业技术学院 Cluster unmanned aerial vehicle task model construction method based on logical relationship
CN113111440B (en) * 2021-04-26 2023-02-03 河北交通职业技术学院 Cluster unmanned aerial vehicle task model construction method based on logical relationship
CN114168624A (en) * 2021-12-08 2022-03-11 掌阅科技股份有限公司 Data analysis method, computing device and storage medium
CN114168624B (en) * 2021-12-08 2022-09-20 掌阅科技股份有限公司 Data analysis method, computing device and storage medium
CN115134421A (en) * 2022-05-10 2022-09-30 北京市遥感信息研究所 Multi-source heterogeneous data cross-system collaborative management system and method
CN115134421B (en) * 2022-05-10 2024-02-20 北京市遥感信息研究所 Multi-source heterogeneous data cross-system collaborative management system and method
CN116738909A (en) * 2023-06-25 2023-09-12 成都电科星拓科技有限公司 Memory integration method of integrated circuit

Similar Documents

Publication Publication Date Title
CN106570081A (en) Semantic net based large scale offline data analysis framework
Ehrenfeld Would industrial ecology exist without sustainability in the background?
Zhang Intelligent Internet of things service based on artificial intelligence technology
Zheng et al. Construction of the ontology-based agricultural knowledge management system
CN101477572B (en) Method and system of dynamic data base based on TDS transition data storage technology
Patil et al. A survey on graph database management techniques for huge unstructured data
CN102929898B (en) The semantic query engine of structured database
CN112182241A (en) Automatic construction method of knowledge graph in field of air traffic control
CN115237937A (en) Distributed collaborative query processing system based on interplanetary file system
Li et al. Adaptation rule learning for case‐based reasoning
Omollo et al. Data modeling techniques used for big data in enterprise networks
Visser et al. Terminology integration for the management of distributed information resources
CN101382950B (en) Body correlation method based on SWRL-Bridge-Peer model
Schueller et al. Stream fusion using reactive programming, LINQ and magic updates
Shi et al. Integration framework with semantic aspect of heterogeneous system based on ontology and ESB
Liu et al. OPSDS: a semantic data integration and service system based on domain ontology
Tejaswi et al. Semantic inference method using ontologies
Zhang et al. Research and Application of Agriculture Knowledge Graph
Monticolo et al. An agent approach to manage heterogeneous and distributed knowledge
EL BOUHISSI et al. 16 Toward Data Integration
Chen et al. Knowledge Encapsulation and Application Based on Domain Knowledge Graph
Geng et al. A Method for Information Management Based on RDF Model and Ontology Technology
Avdeenko et al. Combination of case-based reasoning and data mining through integration with the domain ontology
Mayadewi et al. Scheme mapping for relational database transformation to ontology: A survey
Shao et al. Ontology-based modeling and semantic query for mobile trajectory data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20170419)