CN106570081A - Semantic net based large scale offline data analysis framework - Google Patents
- Publication number: CN106570081A
- Application number: CN201610907501.7A
- Authority: CN (China)
- Prior art keywords: data, layer, analysis, semantic, static
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
Abstract
The invention relates to a large-scale offline data analysis framework based on the Semantic Web. The framework comprises a data acquisition layer, an ontology layer, a data storage layer, a semantic layer, a data analysis layer and an application layer. The data sources include dynamic data and static data, where static data covers the logical semantics and structural types of the data and of the underlying databases. Within the framework, static data is extracted and modeled as an ontology model, and is then exposed to users or upper-level analysis tasks as a semantic service. The framework improves the ability to organize multi-source heterogeneous offline data and presents a uniform interface to the upper layers; application users and data analysts can access the underlying data sources through the semantic interface and obtain the relevant data information without knowing every detail of the different data sources. When content changes, only the affected part of the ontology structure needs to be corrected; together with the update and inference services built into the ontology tooling, this allows the whole data source to be updated effectively from a global perspective.
Description
Technical field
The present invention relates to a large-scale offline data analysis framework based on the Semantic Web.
Background
As the new generation of information technology matures, concepts such as the Internet of Things, mobile Internet and cloud computing are gradually being accepted by individuals and enterprises. Large volumes of data and information grow daily at the petabyte scale, and how to process and analyze big data to obtain valuable knowledge has become a research focus for major institutions and companies. In the new era, big data is an important resource for enterprises to reconstruct their value chains, uncover potential points of economic growth and drive independent innovation. Big data research also spans many fields, such as transportation, healthcare, finance, the Internet, public administration, industry, the service sector and university research.
The processing and analysis of big data resources is likewise an important research topic. The characteristics of big data include the following. Curse of dimensionality: large amounts of data of different types make analysis difficult, and traditional analysis methods struggle with high-dimensional data sets. Cross-domain scope: with the explosive growth of data resources, big data analysis increasingly combines data resources from multiple domains, which requires analysts to have comprehensive domain knowledge. Complex data structures: with the generation and use of social network data, images and similar recorded data, storing, processing and analyzing large amounts of unstructured data poses challenges for data analysis systems. Implicit data relationships: the surge of high-dimensional data makes it hard for analysts to grasp the meaning behind massive data, and the implicit relationships in that data strongly affect the validity of analysis results. Rapid data change: large numbers of electronic devices produce data at millisecond rates, placing new requirements on data storage and management. Unfixed data sources: in a big data environment many resources are difficult to store permanently; how an information system copes with data sources that change constantly and are re-stored, and switches quickly between them, is a problem every enterprise faces. Low value density: massive data, sensor data in particular, is large in volume but low in value.
Existing data systems have difficulty handling big data resources, mainly in the following respects:
(1) Rigid data acquisition types. Traditional data analysis systems are built on data warehouses and data cubes to facilitate later data management and analysis. Systems with this architecture scale poorly and struggle with constantly changing and growing data types; building a data warehouse is also time-consuming, so rapid data changes are hard to capture and the analysis loses its value;
(2) Isolated data storage. Because many kinds of data acquisition devices exist within the same field, raw data ends up stored at different physical addresses, and data of the same type is not stored in a standardized way, which makes data retrieval and computation very difficult;
(3) Difficult data traceability. During data acquisition, apart from the relationships fixed by the acquisition devices themselves, no explicit logical relationships are defined between most data. As a result, analysis must be reconfigured for each query and computation task, and storing data resources separately hinders queries for the source and related information of the data;
(4) Heterogeneous data storage. Data resources gathered from many contributors may differ greatly in representation and storage structure while describing the same resource. Data collection and integration processes vary, and each collecting department attends only to its own data requirements. Data collected by different departments in the same field therefore carries strong semantic ambiguity, which makes information exchange difficult and prevents reuse.
Therefore, the new data resource environment calls for a new, extensible big data analysis framework that makes data easy to organize and manage, strengthens an enterprise's ability to apply and analyze data, and maximizes the application value of the data.
Summary of the invention
The object of the present invention is to provide a large-scale offline data analysis framework based on the Semantic Web.
The large-scale offline data analysis framework based on the Semantic Web proposed by the present invention is divided, from bottom to top, into a data acquisition layer, an ontology layer, a data storage layer, a semantic layer, a data analysis layer and an application layer, wherein:
Source data is data external to the platform that is used for platform analysis and processing. It is stored, centrally or in distributed form, in other databases or other platforms, and includes sensor data, text data, tabular data, network data, image data and other data. The source data is divided into dynamic data and static data. Dynamic data changes quickly; it is usually generated at short intervals and occupies a large amount of storage space. Static data is generated at relatively long intervals and is the basic descriptive data for different types and sources, including data logic relationships, data physical information and data semantic information;
The data acquisition layer comprises structured data extraction, semi-structured data extraction, unstructured data extraction, and manual division and classification of data resources. The enterprise's potential source data is divided and classified manually in order to build the ontology library. Structured, semi-structured and unstructured data extraction process data of the respective type in a unified way, with reference to the ontology library, and provide the data extraction service for the data storage layer. Through manual identification, or by writing recognition functions, the data acquisition layer identifies three classes of static data in the source data: data logic relationships, data semantic information and data physical information. These are saved as electronic documents or records; static data is usually structured. At the same time, the entity data content, that is, the large volume of dynamic data stored in databases or other platforms, is identified manually, mainly with respect to its structure, type, size and storage mode. Dynamic data may be structured, semi-structured or unstructured. Based on the ontology layer, different API interfaces are written according to the structure of each database and external platform to perform structured, semi-structured and unstructured data extraction, and the results are stored in the data storage layer;
The ontology layer is mainly responsible for building the ontology library, which chiefly involves establishing the ontology model, writing the mapping file and updating the ontology model. On the one hand, the ontology layer stores ontology instance data in the database; on the other hand, it supports semantic retrieval in the semantic layer. Data integration based on the Semantic Web first identifies the source data, then maps the data into RDF triples, and finally generates an ontology library that supports SPARQL queries. Based on the static data, the ontology layer mainly uses the Protégé software to build the ontology model and generate the ontology library. The D2R engine comprises a mapping engine, the ontology model and the mapping file. The mapping file is generated from the ontology model manually, semi-automatically or fully automatically; it mainly records the mapping between the physical information of the source data and the physical information of the storage units in the storage layer. The mapping engine is embedded in the data extraction modules of the data acquisition layer. The ontology update service interface exists independently within the ontology layer and is mainly used to update the ontology library;
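The mapping of source records into RDF triples described above can be illustrated with a minimal Python sketch. It is not the D2R engine itself: the `MAPPING` dictionary stands in for a mapping file, and the table name, the `ex:` prefix and the predicate names are hypothetical values chosen only for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-in for a mapping file: it ties one source table and its
# columns to ontology terms. All names here are illustrative assumptions.
MAPPING = {
    "table": "sensor_reading",
    "class": "ex:SensorReading",
    "key_column": "id",
    "columns": {"location": "ex:locatedAt", "unit": "ex:hasUnit"},
}


def row_to_triples(row: Dict[str, str], mapping: dict) -> List[Triple]:
    """Map one source row to (subject, predicate, object) triples."""
    subject = f"ex:{mapping['table']}/{row[mapping['key_column']]}"
    # Every row becomes an instance of the mapped ontology class.
    triples = [(subject, "rdf:type", mapping["class"])]
    for column, predicate in mapping["columns"].items():
        value = row.get(column)
        if value is not None:
            # Column values are emitted as plain string literals.
            triples.append((subject, predicate, f'"{value}"'))
    return triples


def to_text(triples: List[Triple]) -> str:
    """Render triples one per line in a simple Turtle-like form."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_text(row_to_triples({"id": "7", "location": "hall A", "unit": "C"}, MAPPING)))
```

In a real deployment the mapping would be read from the mapping file and the triples written to the ontology library; the sketch only shows the row-to-triple step.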
The data storage layer stores the collected dynamic and static data in a distributed storage system. Static data can use a structured database such as HBase, while dynamic data is stored in systems such as Hive and HDFS. For big data analysis, a distributed storage system generally adopts a master/slave architecture: the master node is the management node, responsible for recording information such as data storage locations; the slave nodes are data nodes, the actual physical storage locations of the data;
The semantic layer is designed mainly for big data queries and includes modules such as query and inference task generation, a query agent, a query engine and an inference engine. The semantic layer mainly receives requests from users, converts each request into a query task using its semantic parsing and inference functions, then calls the query engine to run the big data query, and finally passes the result to the data analysis layer for subsequent big data computation. The semantic layer encapsulates the lower modules and serves the upper layers through API interfaces. An upper-layer application first accesses the semantic layer's query task generator through the API; the query task is converted into a SPARQL query against the ontology; and the inference engine and query engine then query the ontology library and return the corresponding static data content together with the physical information of the dynamic data.
The data analysis layer provides analysis algorithms for different big data analysis needs. Toward the upper layers, it supports users' application development in the form of service interfaces; toward the lower layers, it retrieves data through the parallel computing interface. The layer mainly relies on the task scheduling module to schedule analysis tasks and coordinate their progress. A basic analysis algorithm library and a complex data analysis algorithm library encapsulate the different algorithms, which strengthens the secondary development capability and extensibility of the whole system. Through the task scheduling module, the data analysis layer exchanges information with the other modules of this layer and with the modules of the layers above and below. The application service interface API converts a user's analysis task into algorithm instructions that call the complex data analysis algorithm library. That library contains a large number of independent data analysis algorithms and calls the basic analysis algorithm library through the basic analysis algorithm library calling interface API. The parallel computing interface API extracts data from the data layer and performs the computation in a parallel environment. The application layer, in turn, provides ordinary users with independent analysis application services in the form of web pages, application programs or apps.
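The relationship between the task scheduling module and the two algorithm libraries can be sketched as follows. This is a simplified Python illustration under assumed names (`TaskScheduler`, `BASIC_ALGORITHMS`, `COMPLEX_ALGORITHMS`); the z-score task is only an example of a complex algorithm composed from basic ones, not an algorithm named in the patent.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

# Basic analysis algorithm library: small reusable building blocks.
BASIC_ALGORITHMS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "stdev": pstdev,  # population standard deviation
}


def zscore(data: Sequence[float], basic: Dict[str, Callable]) -> List[float]:
    """Complex algorithm built on top of the basic library."""
    mu = basic["mean"](data)
    sigma = basic["stdev"](data)
    if sigma == 0:
        # All values are identical, so every z-score is zero.
        return [0.0 for _ in data]
    return [(x - mu) / sigma for x in data]


# Complex data analysis algorithm library: each entry calls basic algorithms.
COMPLEX_ALGORITHMS: Dict[str, Callable] = {"zscore": zscore}


class TaskScheduler:
    """Dispatches an analysis task name to the matching complex algorithm."""

    def __init__(self, basic: Dict[str, Callable], complex_lib: Dict[str, Callable]):
        self.basic = basic
        self.complex_lib = complex_lib

    def run(self, task_name: str, data: Sequence[float]) -> List[float]:
        if task_name not in self.complex_lib:
            raise ValueError(f"unknown analysis task: {task_name}")
        return self.complex_lib[task_name](data, self.basic)


if __name__ == "__main__":
    scheduler = TaskScheduler(BASIC_ALGORITHMS, COMPLEX_ALGORITHMS)
    print(scheduler.run("zscore", [1.0, 2.0, 3.0]))
```

Registering algorithms in dictionaries keeps the scheduler unchanged when a new algorithm is added, which mirrors the extensibility goal stated for this layer.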
The results computed by the data analysis layer are offered to users as big data analysis services. Depending on each user's needs, one or more analysis modules can be called to complete an analysis task, and new computing modules can be added to the data analysis layer to meet new requirements. Users can both submit service requests and develop new services according to their actual needs, carrying out big data analysis on the basis of this framework.
The beneficial effects of the present invention are as follows. The analysis framework effectively improves the ability to organize multi-source heterogeneous offline data and presents a unified interface to the upper layers. Application users and data analysts do not need to understand every detail of the different data sources; they access the underlying data sources through the semantic interface and obtain the relevant data information. Ontology modeling tools provide strong update and inference services. When the volume of underlying data grows or the data resources change, users do not need to rebuild the global ontology model; they only need to modify the part of the ontology structure whose content has changed and rely on the update and inference services built into the tooling to update the whole data resource effectively from a global perspective. For data querying, semantic technology is more user-friendly and intelligent and is well suited to data retrieval in large-scale data environments. This is mainly due to the inference capability of semantic technology: after manual or semi-manual configuration by developers, users can run fuzzy data queries. That is, without knowing the exact definition of a data resource, a user can still find approximate or similar data resources, and can thus quickly locate the data related to the object of analysis without understanding the overall data environment.
Description of the drawings
Fig. 1 is the framework diagram of the present invention.
Fig. 2 is the call flow chart of the data analysis module of the present invention.
Fig. 3 is the call flow chart of the data storage and organization module of the present invention.
Fig. 4 is a schematic of the structured and unstructured data extraction modules of the present invention, where (a) is the structured data extraction module and (b) is the unstructured data extraction module.
Fig. 5 is the data analysis logic flow chart of the present invention.
Detailed description of the embodiments
The present invention is further described below through embodiments with reference to the accompanying drawings.
Embodiment 1:
The analysis framework is divided into an upper and a lower part centered on the data storage layer. The lower part handles data storage and integration, mainly ontology construction, RDF generation and big data storage. The upper part handles application and implementation, mainly application requests, big data computation, semantic queries and result presentation. The upper part contains 18 main flows in total and the lower part 9; the sequence numbers in the figures indicate the order in which the actions occur.
This embodiment takes the Hadoop platform as an example. Data storage is divided into RDF data storage and source data storage. The RDF data, that is, the instance data generated from the ontology, together with the unstructured and semi-structured data, is stored in the HBase database; structured data is stored in the Hive database as big tables. The call flow of the data analysis module of the Semantic Web based data analysis framework is shown in Fig. 2. The whole call flow involves the application layer, data analysis layer, semantic layer and data storage layer of the framework, 18 steps in all. The specific steps are as follows:
(1) The application-layer user selects an application according to need and sends the application request to the lower layers;
(2) After receiving the request, the platform parses the user input and uses the analysis task matching module to match and initialize the analysis task or query task;
(3) The instruction is passed to the task scheduling module, which selects the call flow of the corresponding complex data analysis algorithm, responds, and provides the algorithm invocation service;
(4) The instruction must first extract the underlying data resources. According to the query task, the task scheduling module passes the query constraints to the inference engine of the semantic layer, which checks whether matching data resources exist and returns the result to the task scheduling module;
(5) If no data resource exists, the result is returned directly to the user through flows (1)-(3); if one exists, a parsing model is generated and, after matching, passed to the query plan generation module;
(6) With the support of the ontology model, the corresponding ontology information is passed to the ARQ query engine;
(7) ARQ issues the instruction to query the RDF data stored in HBase;
(8) The HMaster host converts the query task into instructions suitable for querying the distributed database;
(9) Once the specific RDF data is obtained, the valid data table address information relevant to the query task is returned to the task scheduling module;
(10) With the table information returned, the task scheduling module passes the data table addresses and the parameters required by the analysis task to the big data tables, such as the Hive and HBase databases, to query the dynamic big data resources;
(11) The data result of the query is returned;
(12) According to the needs of step (3), the task scheduling module generates the instruction sequence for the complex data analysis algorithm library and extracts the corresponding algorithm model;
(13) The complex algorithm library in turn builds on a large number of basic analysis algorithms; after each computation step it returns the result together with the instruction sequence for the basic analysis algorithm library;
(14) The basic analysis algorithm computation models are called in sequence;
(15) The computation result is returned to the task scheduling module;
(16) Flows (4)-(15) form a loop whose number of iterations depends on the scale of the data being analyzed and the complexity of the algorithm. After the final result is obtained, the task scheduling module passes the final data result to the service interface;
(17) The service interface packages the data and sends it to the front end;
(18) The user selects a suitable visualization model, completing the platform's whole big data analysis flow.
Explanation of the data analysis module call flow
Flow No. | Function | Description |
(1) | Send application request and response | The user's analysis request and query constraints are passed to the lower module, which responds |
(2) | Send service request and response | The user input is parsed; the analysis task matching module matches and initializes the analysis task or query task, selects the call flow of the corresponding complex data analysis algorithm, and responds |
(3) | Send call request and response | The parsed service request and query parameters are passed to the task scheduling module, and the query result is returned |
(4) | Output data query request and response | According to the query task, the query constraints are passed to the semantic layer, which checks whether matching data resources exist and returns the result to the task scheduling module |
(5) | Output parsing model | The matched parsing model is passed to the query plan generation module |
(6) | Output query task | The corresponding ontology information is passed to the ARQ query engine |
(7) | Send call request | RDF data (the corresponding data table addresses) is requested from HBase |
(8) | Output query statement | The query statement is issued |
(9) | Return query table information | The valid data table address information relevant to the query task is returned to the task scheduling module |
(10) | Return query table information | The data table addresses and the parameters required by the analysis task are passed to the big data tables |
(11) | Return query data result | The data result of the query is returned |
(12) | Input the call instruction sequence for the complex data analysis algorithm library | According to the needs of (3), the instruction sequence for the complex data analysis algorithm library is generated and the corresponding algorithm computation model is extracted |
(13) | Output call result | The computation result and the instruction sequence for the basic analysis algorithm library are returned |
(14) | Input basic analysis algorithm call instruction | According to the needs of (13), the call instruction is produced and the corresponding algorithm computation model is extracted |
(15) | Return computation result | The computation result is returned |
(16) | Return request result | The final result is returned |
(17) | Return service result | The data is packaged and returned to the front end |
(18) | Result presentation | A visualization model matching the data is selected and the result is displayed |
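The core of the call flow above (semantic lookup, data retrieval, computation) can be condensed into a short Python sketch. The in-memory dictionaries stand in for the ontology library and the Hive/HBase big tables, and every name and address in them is hypothetical; a real platform would call ARQ and the databases instead.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for the ontology library: concept -> table addresses.
ONTOLOGY_INDEX: Dict[str, List[str]] = {
    "temperature": ["hive://warehouse/temp_2016"],
}

# Hypothetical stand-in for the big data tables holding dynamic data.
BIG_TABLES: Dict[str, List[float]] = {
    "hive://warehouse/temp_2016": [21.5, 22.0, 23.5],
}


def resolve_tables(concept: str) -> List[str]:
    """Steps (4)-(9): ask the semantic layer which tables hold the data."""
    return ONTOLOGY_INDEX.get(concept, [])


def fetch_rows(addresses: Sequence[str]) -> List[float]:
    """Steps (10)-(11): read the dynamic data from the returned addresses."""
    rows: List[float] = []
    for address in addresses:
        rows.extend(BIG_TABLES.get(address, []))
    return rows


def handle_request(concept: str, algorithm: Callable[[Sequence[float]], float]) -> dict:
    """Run one pass of the flow for a single analysis request."""
    addresses = resolve_tables(concept)
    if not addresses:
        # Step (5): no matching data resource, report back to the user.
        return {"status": "no_data", "result": None}
    rows = fetch_rows(addresses)
    result = algorithm(rows)  # steps (12)-(15): run the selected algorithm
    return {"status": "ok", "result": result}  # steps (16)-(17)


if __name__ == "__main__":
    print(handle_request("temperature", max))
    print(handle_request("humidity", max))
```

The early return for a missing resource corresponds to the branch in step (5); the loop over steps (4)-(15) described in step (16) is left out for brevity.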
The call flow of the data storage and organization module of the Semantic Web based data analysis framework is shown in Fig. 3. Fig. 3 depicts a big data governance process that involves the data acquisition layer, the ontology layer and the data storage layer. The specific steps are as follows:
1. First, the static features of the data are classified and described; the data to be integrated is sorted out and written into a mapping file for later use;
2. Ontology modeling is carried out with reference to the domain background;
3. Once ontology modeling is complete, the dynamic data resources are stored in big data tables by the data extraction module;
4. The mapping engine is combined with the mapping file;
5. The static data is extracted;
6. RDF data is generated;
7. RDF data storage imports the RDF data resources into the Hadoop environment;
8. Based on the Java API, the incoming mapped RDF data is parsed so that it can be stored in HBase;
9. Split-table storage of the RDF data gives HBase access to the triple data, completing the whole offline big data storage and organization flow.
Explanation of the data storage and organization module call flow
Flow No. | Function | Description |
① | Write mapping file | The data to be integrated is sorted out and the mapping file is written |
② | Ontology modeling | The ontology model is built from the mapping file and the domain model |
③ | Import data | Dynamic data is imported into the HBase and Hive databases |
④ | Call mapping engine | The mapping engine is called based on the mapping file and the ontology model |
⑤ | Map data | The mapping engine reads the underlying data according to the mapping file and maps it to RDF data |
⑥ | Output RDF data | The mapping engine outputs the mapped RDF data |
⑦ | Store RDF data | The RDF data is stored in the HDFS file system of the Hadoop cluster |
⑧ | Parse RDF data | Based on the Java API, the incoming mapped RDF data is parsed so that it can be stored in HBase |
⑨ | Split-table RDF storage | In HBase, six tables are created according to the different index orderings of subject, predicate and object, and the RDF data is stored in them |
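The six-table layout in step ⑨ can be pictured as one index per ordering of subject (s), predicate (p) and object (o). The Python sketch below keeps the six tables in memory and keys each one by the first two terms of its ordering. This row-key design is an assumption for illustration; the patent does not describe the actual HBase schema.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Set, Tuple

# The six orderings: spo, sop, pso, pos, osp, ops.
ORDERS = ["".join(p) for p in permutations("spo")]


class SixTableStore:
    """In-memory model of six triple tables, one per term ordering."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[Tuple[str, str], Set[str]]] = {
            order: defaultdict(set) for order in ORDERS
        }

    def add(self, s: str, p: str, o: str) -> None:
        """Store one triple in all six tables."""
        term = {"s": s, "p": p, "o": o}
        for order in ORDERS:
            # Assumed layout: the first two terms form the row key,
            # the third term is the stored value.
            row_key = (term[order[0]], term[order[1]])
            self.tables[order][row_key].add(term[order[2]])

    def lookup(self, order: str, first: str, second: str) -> Set[str]:
        """Return the third terms for a given pair in the chosen ordering."""
        return set(self.tables[order].get((first, second), set()))


if __name__ == "__main__":
    store = SixTableStore()
    store.add("ex:s1", "ex:locatedAt", "hallA")
    print(store.lookup("spo", "ex:s1", "ex:locatedAt"))
    print(store.lookup("pos", "ex:locatedAt", "hallA"))
```

Storing each triple six times trades space for lookup speed: whichever two terms of a triple pattern are known, one of the tables can answer it with a direct key lookup.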
Detailed design principles of selected modules
In the data acquisition layer, building an ontology model for each data set and linking the models is an important step that allows the Semantic Web to access all data sets quickly and effectively. For different data types, this patent uses three different data extraction modules in the data acquisition layer. Their main purposes are: 1. to import the data into a new storage space and improve data query speed; 2. to extract the static data of the database to facilitate building the ontology model.
Fig. 4 describes the data extraction module in detail; the whole procedure comprises 7 steps. Before data is extracted, a data store is chosen according to the type of the data source. In this case, structured data is stored in SQL databases, which use structured queries, such as DB2, SQL/DS, Oracle and Hive; semi-structured and unstructured data is stored in NoSQL databases such as Cassandra and HBase. Once the database type is chosen, data migration begins.
Step 1: Static data is extracted, manually or semi-manually, based on the features of the source data. For structured data, static data is obtained mainly by reading the storage and table header information of the source data. For unstructured and semi-structured data, the features must be extracted manually;
Step 2: The static data extracted from structured source data includes: table name, attribute names, source field types, data generation period, source table address and source field lengths. The static data extracted from unstructured data includes: data length, keywords, tag set, data generation time, source field length and keyword frequency. Semi-structured data is a combination of structured and unstructured data, and its static data is a combination of the two sets above;
Step 3: After receiving the instruction that step 2 is complete, the data import component starts importing the source data;
Step 4: The data is exported from the source database;
Step 5: The data of the source database is imported into the new database;
Step 6: After the import, static data is returned automatically according to how the data is stored in the new database;
Step 7: The static information in the new database environment includes: new table address, new field types, new field lengths, new table name and database query type.
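For structured sources, reading static data from table header information (steps 1 and 2) can be sketched with Python's built-in `sqlite3` module. SQLite is used here only because it needs no server; the patent names DB2, SQL/DS, Oracle and Hive, whose metadata interfaces differ. The function name and the shape of the returned dictionary are assumptions for this example.

```python
import sqlite3
from typing import Dict, List


def extract_static_metadata(conn: sqlite3.Connection, table: str) -> Dict[str, object]:
    """Read the table name, field names and declared field types of one table."""
    if not table.isidentifier():
        # Guard against unsafe table names, since PRAGMA cannot be parameterized.
        raise ValueError(f"unsafe table name: {table!r}")
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    fields: List[Dict[str, str]] = [{"name": c[1], "type": c[2]} for c in columns]
    return {"table_name": table, "fields": fields}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER, value REAL, unit TEXT)")
    print(extract_static_metadata(conn, "readings"))
    conn.close()
```

The same idea applies after the import in steps 6 and 7: running the extraction against the new database yields the new table name and field types to record alongside the source metadata.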
Fig. 5 shows the data analysis logic flow under the data analysis framework of the present invention. The framework is divided into two parts according to how frequently an analysis task is used. The first part covers frequently used analysis tasks. For these, the framework provides fully encapsulated, layered analysis services and interfaces; the user only needs to configure the relevant modules in Fig. 2, and the whole analysis task is completed quickly through flow (Ⅰ). The specific flow is as follows: through the interactive interface the user selects an application task; the platform directly invokes the corresponding data and algorithm instruction identifiers, no longer returns intermediate values to the user, calls the corresponding algorithms through the task scheduling module and finally returns the computation result, presenting it with the visualization model used in previous calls. The second part covers analysis tasks that are run for the first time or whose objectives are not yet clear. Following the features of the framework, this part is divided into two stages. The first stage is semantics-based data selection. Its main feature is the use of semantic technology: analysts can run fuzzy data queries according to the task requirements, which lets users select data interactively without understanding the whole data source. For semantic querying, this patent can use existing technology and software (such as the SPARQL and ARQ semantic query engines in Jena). In the second stage, once the user has selected the scope of the data set to analyze, the computation is carried out in a parallel environment by calling the algorithm library of the analysis layer. This patent does not prescribe this part in detail: whether to parallelize the data processing can be decided according to the platform's characteristics, and the algorithm library can be extended on an existing big data analysis platform (MapReduce or Spark). The final numerical results are delivered to the front-end user through the application service interface in Fig. 1, and the user can choose a suitable data visualization model according to the characteristics of the data.
Claims (1)
1. the large-scale off-line data analysiss framework of semantic net is based on, it is characterised in that be divided into data collection layer, sheet from bottom to top
Body layer, data storage layer, semantic layer, data analysiss layer and application layer;Wherein:
Source data is platform exterior data, is used for Platform Analysis and process, with concentration or distributed storage in other data bases
Or in other platforms;Including sensing data, text data, form data, network data, view data and other data, institute
State source data and be divided into dynamic data and static data;Dynamic data is the quick data for producing change, and such data is generally produced
Time interval is shorter, takes substantial amounts of data space;Static data generation time interval is relatively long, is for inhomogeneity
Type and the basic data in source, including mathematical logic relation, Data Physical information and data semantic information;
The data collection layer comprises structured data extraction, semi-structured data extraction, unstructured data extraction, and manual division and classification of data resources. Enterprise data, or all potential source data, is manually divided and classified in order to build the ontology library; structured, semi-structured and unstructured data extraction serve mainly to process data of each type uniformly, with reference to the ontology library, and to provide a data extraction service for the data storage layer. The data collection layer identifies the three classes of static data in the source data (data logical relations, data semantic information and data physical information), either manually or by means of written recognition functions, and stores them as electronic documents or records; static data is usually structured data. At the same time, the entity data content, that is, the large volume of dynamic data stored in databases or other platforms, is manually examined, mainly with respect to the structure, type, size and storage mode of the dynamic data; dynamic data includes structured, semi-structured and unstructured types. Based on the ontology layer, different interface APIs are written according to the structures of the different databases and external platforms to perform structured, semi-structured and unstructured data extraction, and the extracted data is stored in the data storage layer;
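As an illustration of the per-type extraction interfaces described for the data collection layer, the following is a minimal sketch only. The function names, the registry and the choice of CSV, JSON and plain text as stand-ins for structured, semi-structured and unstructured sources are all assumptions; the patent does not specify any of them.

```python
import csv
import io
import json


def extract_structured(raw):
    """Parse CSV text (a stand-in for a database table) into records."""
    return list(csv.DictReader(io.StringIO(raw)))


def extract_semi_structured(raw):
    """Parse a JSON object or array of objects into records."""
    data = json.loads(raw)
    return data if isinstance(data, list) else [data]


def extract_unstructured(raw):
    """Wrap each non-empty line of free text as a record."""
    return [{"text": line.strip()} for line in raw.splitlines() if line.strip()]


# Hypothetical registry: one extractor per data type named in the claim.
EXTRACTORS = {
    "structured": extract_structured,
    "semi_structured": extract_semi_structured,
    "unstructured": extract_unstructured,
}


def extract(data_type, raw):
    """Dispatch to the extractor registered for the given data type."""
    try:
        extractor = EXTRACTORS[data_type]
    except KeyError:
        raise ValueError(f"no extractor for data type: {data_type}") from None
    return extractor(raw)
```

Every extractor returns a list of dictionaries, so the storage layer would receive one uniform record shape regardless of the source type.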
The ontology layer mainly builds the ontology library, which chiefly involves establishing the ontology model, writing the mapping file and updating the ontology model. On the one hand, the ontology layer stores ontology instance data in a database; on the other hand, it supports semantic retrieval by the semantic layer. In semantic-web-based data integration, the source data is first identified, the data is then mapped into RDF triples, and an ontology library supporting SPARQL queries is finally generated. The ontology layer builds the ontology model from the static data, mainly using the Protégé software, and generates the ontology library. The D2R engine comprises a mapping engine, the ontology model and the mapping file; the mapping file is generated from the ontology model manually, semi-automatically or fully automatically, and mainly records the mapping between the physical information of the source data and the physical information of the storage units in the storage layer; the mapping engine is embedded in the data extraction module of the data collection layer. The ontology update service interface exists independently within the ontology layer and is mainly used for updating the ontology library;
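The step of mapping identified source data into RDF triples can be sketched as follows. This is not the D2R engine itself: the `http://example.org/` namespace, the `MAPPING` dictionary standing in for a mapping file, and the property names are hypothetical and chosen only for illustration.

```python
# Hypothetical namespace; it does not come from the patent.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Stand-in for a mapping file: table columns mapped to ontology properties.
MAPPING = {
    "table": "sensor",
    "class": BASE + "ontology#Sensor",
    "key": "id",
    "properties": {
        "location": BASE + "ontology#locatedAt",
        "store_path": BASE + "ontology#storedIn",
    },
}


def row_to_triples(row, mapping=MAPPING):
    """Map one relational row to (subject, predicate, object) triples."""
    subject = f"{BASE}{mapping['table']}/{row[mapping['key']]}"
    triples = [(subject, RDF_TYPE, mapping["class"])]
    for column, predicate in mapping["properties"].items():
        if row.get(column) is not None:
            triples.append((subject, predicate, str(row[column])))
    return triples


def _escape(literal):
    """Escape backslash, quote and line breaks for an N-Triples literal."""
    return (literal.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize triples as N-Triples; class objects are IRIs, the rest literals."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if p == RDF_TYPE else f'"{_escape(o)}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)
```

Columns whose value is missing are skipped rather than emitted as empty literals, so the generated library contains only statements that the source data actually supports.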
The data storage layer stores the collected dynamic data and static data in a distributed storage system; static data may be kept in a structured database such as HBase, and dynamic data in stores such as Hive or HDFS. For big data analysis, a distributed storage system generally adopts a master/slave architecture: the master node is the management node, responsible for recording information such as where data is stored; the slave nodes are data nodes, the actual physical storage locations of the data;
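The division of labour between the master node and the data nodes can be illustrated with a toy in-memory model. The class names and the round-robin placement rule are assumptions made for this sketch; real systems such as HDFS use their own placement and replication policies.

```python
class DataNode:
    """Slave node: holds the actual data blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class MasterNode:
    """Master node: records where each block lives, never the data itself."""

    def __init__(self, data_nodes):
        self.data_nodes = {node.name: node for node in data_nodes}
        self.locations = {}  # block id -> data node name
        self._names = sorted(self.data_nodes)

    def put(self, block_id, payload):
        """Store a block on a data node and record its location."""
        if block_id in self.locations:
            # Overwrite in place so no stale copy is left on another node.
            name = self.locations[block_id]
        else:
            # Round-robin placement is an assumption of this sketch.
            name = self._names[len(self.locations) % len(self._names)]
        self.data_nodes[name].blocks[block_id] = payload
        self.locations[block_id] = name
        return name

    def get(self, block_id):
        """Look up the location on the master, then read from the data node."""
        name = self.locations[block_id]
        return self.data_nodes[name].blocks[block_id]
```

A read always consults the master's location table first, which mirrors the claim's statement that the master records storage locations while the slave nodes are the physical storage.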
The semantic layer is designed mainly for big data queries and comprises modules such as query and inference task generation, a query agent, a query engine and an inference engine. The semantic layer mainly receives requests from users, converts each request into a query task by means of its semantic parsing and inference functions, then calls the query engine to perform the big data query, and finally passes the result to the data analysis layer for subsequent big data computation. The semantic layer encapsulates the lower-level modules and serves the upper layers through an interface API: an upper-layer application first accesses the query task generator of the semantic layer through the API, the query task is converted into the SPARQL language of the ontology, and the inference engine and query engine query the ontology library and return the corresponding static data content and the physical information of the dynamic data;
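The conversion of a query task into SPARQL might look like the sketch below. The `ont:` namespace, the function names and the request shape (a concept plus one property) are hypothetical. The `lookup` function is only an in-memory stand-in that matches the same two triple patterns; it is not a SPARQL engine and performs no inference.

```python
import re

ONT = "http://example.org/ontology#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name):
    """Reject names that are not plain identifiers, so they cannot alter the query."""
    if not _NAME.fullmatch(name):
        raise ValueError(f"invalid name: {name!r}")
    return name


def build_sparql(concept, wanted_property):
    """Turn a (concept, property) request into a SPARQL SELECT string."""
    concept, wanted_property = _check(concept), _check(wanted_property)
    return (
        f"PREFIX ont: <{ONT}>\n"
        "SELECT ?item ?value WHERE {\n"
        f"  ?item a ont:{concept} .\n"
        f"  ?item ont:{wanted_property} ?value .\n"
        "}"
    )


def lookup(triples, concept, wanted_property):
    """Stand-in for the query engine: match the same two patterns in memory."""
    items = {s for s, p, o in triples if p == RDF_TYPE and o == ONT + concept}
    return sorted(
        (s, o) for s, p, o in triples
        if s in items and p == ONT + wanted_property
    )
```

For example, `lookup(triples, "Sensor", "storedIn")` would return the storage location recorded for each sensor instance, which corresponds to the physical information of dynamic data that the claim says the semantic layer returns.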
The data analysis layer provides analysis algorithms for different big data analysis needs. Upward, this layer supports users' application development in the form of service interfaces; downward, it calls data from the lower layers through the parallel computing interface. The layer schedules analysis tasks and coordinates their progress mainly through the task scheduling module, and uses the basic analysis algorithm library and the complex data analysis algorithm library to encapsulate different algorithms, which strengthens the secondary development capability and extensibility of the whole system. The data analysis layer uses the task scheduling module to exchange information among the other modules of this layer and with the modules of the layers above and below. The application service interface provides an API that converts a user's analysis task into algorithm instructions that call the complex data analysis algorithm library; the complex data analysis algorithm library contains a large number of independent data analysis algorithms and calls the basic analysis algorithm library through the basic analysis algorithm library calling interface API; the parallel computing interface API is used to extract data from the data layer and perform computation in a parallel environment. The application layer provides independent analysis application services for ordinary users in the form of Web pages, applications or apps;
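The relationship between the task scheduling module, the two algorithm libraries and the parallel computing interface can be sketched as below. All names are hypothetical, the two library entries are toy examples, and a local thread pool stands in for whatever parallel environment the framework would actually use.

```python
from concurrent.futures import ThreadPoolExecutor

# Basic analysis algorithm library (hypothetical entries).
BASIC = {
    "mean": lambda xs: sum(xs) / len(xs),
    "max": max,
}


def call_basic(name, data):
    """Calling interface through which complex algorithms reach the basic library."""
    return BASIC[name](data)


def spread(data):
    """Toy complex algorithm built from basic ones: peak minus mean."""
    return call_basic("max", data) - call_basic("mean", data)


# Complex data analysis algorithm library (hypothetical entry).
COMPLEX = {"spread": spread}


def schedule(tasks, partitions, workers=4):
    """Run each (algorithm, partition key) task in a thread pool.

    Results are returned in submission order, not completion order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(COMPLEX[algorithm], partitions[key])
            for algorithm, key in tasks
        ]
        return [future.result() for future in futures]
```

Keeping the basic library behind `call_basic` means a new complex algorithm can be registered in `COMPLEX` without touching the scheduler, which is one way to read the claim's point about secondary development and extensibility.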
Using the results computed by the data analysis layer, the application layer offers big data analysis to users in the form of services; according to each user's needs, one or more analysis modules can be called to complete an analysis task, and new computing modules can be added to the data analysis layer to meet new needs. Users can either submit service requests or develop new services according to their own needs, and carry out big data analysis on the basis of this framework.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610907501.7A CN106570081A (en) | 2016-10-18 | 2016-10-18 | Semantic net based large scale offline data analysis framework |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610907501.7A CN106570081A (en) | 2016-10-18 | 2016-10-18 | Semantic net based large scale offline data analysis framework |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106570081A (en) | 2017-04-19 |
Family
ID=58533177
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610907501.7A Pending CN106570081A (en) | 2016-10-18 | 2016-10-18 | Semantic net based large scale offline data analysis framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106570081A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8099382B2 (en) * | 2007-04-13 | 2012-01-17 | International Business Machines Corporation | Method and system for mapping multi-dimensional model to data warehouse schema |
CN104809151A (en) * | 2015-03-11 | 2015-07-29 | 同济大学 | Multi-dimension based traffic heterogeneous data integrating method |
CN105183834A (en) * | 2015-08-31 | 2015-12-23 | 上海电科智能系统股份有限公司 | Ontology library based transportation big data semantic application service method |
CN105701193A (en) * | 2016-01-11 | 2016-06-22 | 同济大学 | Method for rapidly searching for traffic big data dynamic information and application thereof |
CN105808734A (en) * | 2016-03-10 | 2016-07-27 | 同济大学 | Semantic web based method for acquiring implicit relationship among steel iron making process knowledge |
Non-Patent Citations (1)
Title |
---|
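Su Gang et al.: "Design and Implementation of an Intelligent Transportation Analysis System Based on Big Data" (基于大数据的智能交通分析系统的设计与实现), Computer Knowledge and Technology (《电脑知识与技术》) * |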
Cited By (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106777372A (en) * | 2017-01-26 | 2017-05-31 | 语义(上海)信息科技有限公司 | A kind of honeybee stream device data water conservancy diversion and data method for transformation based on Ontology on Semantic Web |
CN106777372B (en) * | 2017-01-26 | 2019-08-27 | 语义(上海)信息科技有限公司 | A kind of bee stream device data water conservancy diversion and data method for transformation based on Ontology on Semantic Web |
CN108733726A (en) * | 2017-04-24 | 2018-11-02 | 西门子(中国)有限公司 | Network semantic model reconfiguration system based on dynamic event and method |
CN108733726B (en) * | 2017-04-24 | 2022-03-29 | 西门子(中国)有限公司 | Network semantic model reconstruction system and method based on dynamic events |
CN108985531A (en) * | 2017-06-01 | 2018-12-11 | 中国科学院深圳先进技术研究院 | A kind of multimode isomery electric power big data convergence analysis management system and method |
CN107341675A (en) * | 2017-07-17 | 2017-11-10 | 重庆邮电大学 | A kind of intelligent grid remote bill control decision-making framework and method based on semantic knowledge |
CN107368586A (en) * | 2017-07-24 | 2017-11-21 | 华电重工股份有限公司 | A kind of multisystem data analysing method and platform |
CN107368586B (en) * | 2017-07-24 | 2021-01-19 | 华电重工股份有限公司 | Multi-system data analysis method and platform |
CN107657215A (en) * | 2017-09-07 | 2018-02-02 | 南京师范大学 | Indoor action trail movement semantic analytic method based on Passive Infrared Sensor |
CN107657215B (en) * | 2017-09-07 | 2020-01-21 | 南京师范大学 | Indoor behavior track motion semantic analysis method based on passive infrared sensor |
CN109213909A (en) * | 2017-09-11 | 2019-01-15 | 南京弹跳力信息技术有限公司 | A kind of big data analysis system and its analysis method fusion search and calculated |
CN107621979A (en) * | 2017-10-27 | 2018-01-23 | 郑金林 | A kind of Development of Students archives big data algorithm and analysis system |
CN108121778A (en) * | 2017-12-14 | 2018-06-05 | 浙江航天恒嘉数据科技有限公司 | A kind of heterogeneous database exchange and cleaning system and method |
CN109992252B (en) * | 2017-12-29 | 2022-12-16 | 中移物联网有限公司 | Data analysis method, terminal, device and storage medium based on Internet of things |
CN109992252A (en) * | 2017-12-29 | 2019-07-09 | 中移物联网有限公司 | A kind of data analysing method based on Internet of Things, terminal, device and storage medium |
CN108509486A (en) * | 2018-02-08 | 2018-09-07 | 浙江大学 | A kind of safe big data structural management method of intelligent plant multi-source |
CN108459574A (en) * | 2018-03-27 | 2018-08-28 | 重庆邮电大学 | It is a kind of that system is managed based on the semantic field device information with OPC UA |
CN108959349A (en) * | 2018-04-23 | 2018-12-07 | 厦门快商通信息技术有限公司 | A kind of financial audit circular for confirmation system |
CN110704688A (en) * | 2018-07-09 | 2020-01-17 | 上海交通大学 | Block chain separation storage system based on associated data |
CN109241179A (en) * | 2018-08-01 | 2019-01-18 | 协同数据技术(深圳)有限公司 | Data administering method, system and computer equipment based on data space |
CN109145643A (en) * | 2018-08-23 | 2019-01-04 | 安思瀚 | A kind of personal multi-source data management method and system based on private clound |
CN109558427A (en) * | 2018-11-30 | 2019-04-02 | 上海找钢网信息科技股份有限公司 | Intelligent inquiry system and method based on steel industry data platform |
CN111415528A (en) * | 2019-01-07 | 2020-07-14 | 长沙智能驾驶研究院有限公司 | Road safety early warning method and device, road side unit and storage medium |
CN109885542A (en) * | 2019-02-18 | 2019-06-14 | 中国联合网络通信集团有限公司 | Item file management method, device and storage medium |
CN109976729A (en) * | 2019-05-05 | 2019-07-05 | 东北大学 | One kind depositing calculation and shows globally configurable Data Analysis Software architecture design method |
CN109976729B (en) * | 2019-05-05 | 2021-10-22 | 东北大学 | Storage and computing display globally configurable data analysis software architecture design method |
CN110196923A (en) * | 2019-05-07 | 2019-09-03 | 中国科学院声学研究所 | A kind of multi-source heterogeneous data preprocessing method and system towards undersea detection |
CN110196923B (en) * | 2019-05-07 | 2021-07-30 | 中国科学院声学研究所 | Underwater detection-oriented multi-source heterogeneous data preprocessing method and system |
CN110275919A (en) * | 2019-06-18 | 2019-09-24 | 合肥工业大学 | Data integrating method and device |
CN110275966A (en) * | 2019-07-01 | 2019-09-24 | 科大讯飞(苏州)科技有限公司 | A kind of Knowledge Extraction Method and device |
CN110275966B (en) * | 2019-07-01 | 2021-10-01 | 科大讯飞(苏州)科技有限公司 | Knowledge extraction method and device |
CN112579565A (en) * | 2020-11-30 | 2021-03-30 | 贵州力创科技发展有限公司 | Data model management method and system of data analysis engine |
CN112817569A (en) * | 2021-02-06 | 2021-05-18 | 成都飞机工业(集团)有限责任公司 | Analysis-oriented data rapid mapping method, equipment and storage medium |
CN112817569B (en) * | 2021-02-06 | 2023-10-17 | 成都飞机工业(集团)有限责任公司 | Analysis-oriented data rapid mapping method, equipment and storage medium |
CN113111440A (en) * | 2021-04-26 | 2021-07-13 | 河北交通职业技术学院 | Logic relationship-based cluster unmanned aerial vehicle task model construction method |
CN113111440B (en) * | 2021-04-26 | 2023-02-03 | 河北交通职业技术学院 | Cluster unmanned aerial vehicle task model construction method based on logical relationship |
CN114168624A (en) * | 2021-12-08 | 2022-03-11 | 掌阅科技股份有限公司 | Data analysis method, computing device and storage medium |
CN114168624B (en) * | 2021-12-08 | 2022-09-20 | 掌阅科技股份有限公司 | Data analysis method, computing device and storage medium |
CN115134421A (en) * | 2022-05-10 | 2022-09-30 | 北京市遥感信息研究所 | Multi-source heterogeneous data cross-system cooperative management system and method |
CN115134421B (en) * | 2022-05-10 | 2024-02-20 | 北京市遥感信息研究所 | Multi-source heterogeneous data cross-system collaborative management system and method |
CN116738909A (en) * | 2023-06-25 | 2023-09-12 | 成都电科星拓科技有限公司 | Memory integration method of integrated circuit |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106570081A (en) | Semantic net based large scale offline data analysis framework | |
Ehrenfeld | Would industrial ecology exist without sustainability in the background? | |
Zhang | Intelligent Internet of things service based on artificial intelligence technology | |
Zheng et al. | Construction of the ontology-based agricultural knowledge management system | |
CN101477572B (en) | Method and system of dynamic data base based on TDS transition data storage technology | |
Patil et al. | A survey on graph database management techniques for huge unstructured data | |
CN102929898B (en) | The semantic query engine of structured database | |
CN112182241A (en) | Automatic construction method of knowledge graph in field of air traffic control | |
CN115237937A (en) | Distributed collaborative query processing system based on interplanetary file system | |
Li et al. | Adaptation rule learning for case‐based reasoning | |
Omollo et al. | Data modeling techniques used for big data in enterprise networks | |
Visser et al. | Terminology integration for the management of distributed information resources | |
CN101382950B (en) | Body correlation method based on SWRL-Bridge-Peer model | |
Schueller et al. | Stream fusion using reactive programming, LINQ and magic updates | |
Shi et al. | Integration framework with semantic aspect of heterogeneous system based on ontology and ESB | |
Liu et al. | OPSDS: a semantic data integration and service system based on domain ontology | |
Tejaswi et al. | Semantic inference method using ontologies | |
Zhang et al. | Research and Application of Agriculture Knowledge Graph | |
Monticolo et al. | An agent approach to manage heterogeneous and distributed knowledge | |
EL BOUHISSI et al. | 16 Toward Data Integration | |
Chen et al. | Knowledge Encapsulation and Application Based on Domain Knowledge Graph | |
Geng et al. | A Method for Information Management Based on RDF Model and Ontology Technology | |
Avdeenko et al. | Combination of case-based reasoning and data mining through integration with the domain ontology | |
Mayadewi et al. | Scheme mapping for relational database transformation to ontology: A survey | |
Shao et al. | Ontology-based modeling and semantic query for mobile trajectory data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170419 |