CN104834730A - Data analysis system and method - Google Patents
- Publication number
- CN104834730A (application CN201510249589.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- data analysis
- grammar
- metadata
- index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
- G06F16/24522—Translation of natural language queries to structured queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data analysis system and method. The system comprises a scheduler, a data analysis engine, a data warehouse, an analysis database, a query module and a configuration module. The scheduler is used for allocating task information described through a database query language. The data analysis engine is used for converting the task information into distributed grammar so that indexes can be established for data; the data warehouse is used for storing the data with the established indexes; the analysis database and the data warehouse synchronously have the data with the established indexes; the query module is used for receiving metadata associated with at least part of query for the data; the configuration module is used for converting the metadata into first grammar capable of being recognized by the data analysis engine. The data analysis engine is configured to convert the metadata described through the first grammar into second grammar capable of being recognized by the analysis database, and the analysis database is configured to execute the query on the basis of the metadata described through the second grammar.
Description
Technical field
The present invention relates to data processing and, more specifically, to a data analysis system and method.
Background
With the development of information technology, enterprise information systems generate large amounts of data. How to extract useful information from this mass of data and analyze it to support business decisions has become an important problem for business decision-makers. The problem to be solved is how to provide visual, flexible analysis and querying on top of an enterprise data warehouse.
Traditionally, all data analysis requests must be submitted to the data department, which runs Hadoop map/reduce programs and delivers results to the business department in one hour at best and several days at worst. After obtaining the data, the business department must analyze it with office software or other third-party software before finally arriving at an analysis result. As requirements keep changing, the business department often has to repeat this work many times; timeliness is very poor, and business needs are difficult to meet.
This traditional data analysis scheme has a long and unpredictable turnaround time and lacks efficient, systematic management. It responds slowly to changing requirements. In addition, there is no visual data analysis system, so the user experience is poor.
Therefore, an improved data analysis scheme is needed.
Summary of the invention
The object of the present invention is to provide a data analysis system and method that, on the basis of an enterprise data architecture (for example, Hadoop), offer users an efficient (for example, second-level) enterprise data analysis scheme that supports flexible drag-and-drop and drill-down.
According to a first aspect of the invention, a data analysis system is provided, comprising: a scheduler for distributing task information described in a database query language; a data analysis engine for converting the task information into a distributed grammar, so as to build an index on data; a data warehouse for storing the indexed data; an analysis database that holds the indexed data in synchronization with the data warehouse; a query module for receiving metadata associated with a query on at least part of the data; and a configuration module for converting the metadata into a first grammar that the data analysis engine can recognize. The data analysis engine is configured to convert the metadata described in the first grammar into a second grammar that the analysis database can recognize, and the analysis database is configured to execute the query based on the metadata described in the second grammar.
In one embodiment, the database query language is the HQL query language.
In one embodiment, the distributed grammar is based on the Map/Reduce model, and the index is a Lucene index.
In one embodiment, the first grammar is based on the HQL query language, and the second grammar is based on the Solr application server.
In one embodiment, the query module comprises a user interface for receiving the metadata from a user.
In one embodiment, the data analysis engine is further configured to receive a query result from the analysis database and send the query result to the configuration module; the configuration module is further configured to send the query result to the query module; and the query module is further configured to present the query result to the user.
According to a second aspect of the invention, a data analysis method is provided, comprising: distributing, to a data analysis engine, task information described in a database query language; converting, by the data analysis engine, the task information into a distributed grammar, so as to build an index on data; storing the indexed data in a data warehouse; synchronizing the indexed data stored in the data warehouse to an analysis database; receiving metadata associated with a query on at least part of the data; converting the metadata into a first grammar that the data analysis engine can recognize; converting, by the data analysis engine, the metadata described in the first grammar into a second grammar that the analysis database can recognize; and executing, by the analysis database, the query based on the metadata described in the second grammar.
The embodiments of the first aspect described above also apply to the second aspect.
According to embodiments of the present invention, an efficient enterprise data analysis scheme supporting flexible drag-and-drop and drill-down can be provided to users on the basis of an enterprise data architecture.
Brief description of the drawings
The above and other objects, features and advantages of the present invention will become clearer from the following description of preferred embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram of a data analysis system according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data analysis method according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be noted that the present invention is not limited to the specific embodiments described below. In addition, for simplicity, detailed descriptions of well-known techniques not directly related to the present invention are omitted, so as not to obscure the understanding of the present invention.
Fig. 1 is a block diagram of the data analysis system 100 according to an embodiment of the present invention. As shown in the figure, the data analysis system 100 comprises a scheduler 110, a data analysis engine 120, a data warehouse 130, an analysis database 140, a query module 150 and a configuration module 160.
The scheduler 110 is used for distributing task information described in a database query language (for example, the HQL query language). In one example, the scheduler 110 is based on Hadoop, can monitor tasks, and schedules tasks in real time. If the task period and time satisfy a predetermined scheduling condition, the scheduler 110 sends the task information described in the HQL query language to the data analysis engine 120. The task information can indicate that particular data is to be extracted, and the extraction scope can be described in HQL, which lowers the barrier to use.
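The scheduling condition described for the scheduler 110 can be illustrated with a minimal sketch. The patent does not define the condition or any data structures, so the `Task` class, the rule "the task period has elapsed since the last run", and the `send_to_engine` callback are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical task record; the patent does not specify its fields."""
    name: str
    hql: str              # task information, described in HQL
    period: timedelta     # how often the task should run
    last_run: datetime    # when the task last ran


def is_due(task: Task, now: datetime) -> bool:
    """Assumed scheduling condition: the period has elapsed since the last run."""
    return now - task.last_run >= task.period


def dispatch_due_tasks(tasks: List[Task], now: datetime,
                       send_to_engine: Callable[[str], None]) -> List[str]:
    """Send the HQL of every due task to the engine; return the dispatched names."""
    dispatched = []
    for task in tasks:
        if is_due(task, now):
            send_to_engine(task.hql)   # stands in for the data analysis engine
            task.last_run = now
            dispatched.append(task.name)
    return dispatched
```

Passing `list.append` of an empty list as `send_to_engine` is enough to observe which HQL statements would reach the engine at a given time.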
The data analysis engine 120 is used for converting the task information into a distributed grammar, so as to build an index on the data. Specifically, the distributed grammar here can be based on the Map/Reduce model, and the index can be a Lucene index. In one example, the data analysis engine 120 manages Hadoop-based data storage optimization, extraction, and external services, and serves as the external interface of the data warehouse 130 and the analysis database 140.
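How an index can be built under the Map/Reduce model may be easier to see in a small in-memory sketch. This is not Hadoop or Lucene code: it only mimics the map and reduce phases in plain Python to build an inverted index, and the function names and whitespace tokenization are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(doc_id: int, text: str) -> Iterable[Tuple[str, int]]:
    """Map: emit one (term, doc_id) pair per distinct term in the record."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Reduce: group the pairs by term into sorted posting lists."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_index(records: Dict[int, str]) -> Dict[str, List[int]]:
    """Run the map phase over all records, then reduce into an inverted index."""
    pairs = [pair
             for doc_id, text in records.items()
             for pair in map_phase(doc_id, text)]
    return reduce_phase(pairs)
```

In a real cluster the map calls would run on different nodes and the framework would shuffle the pairs by term before the reduce phase; here both phases run in one process.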
The data warehouse 130 is used for storing the indexed data. In one example, the data warehouse 130 can be an enterprise data warehouse that stores the enterprise's raw data as well as the indexed data.
The analysis database 140 holds the indexed data in synchronization with the data warehouse 130. In one example, the analysis database 140 is responsible for scheduling and serving the data.
The query module 150 is used for receiving metadata associated with a query on at least part of the indexed data. The query module 150 comprises a user interface for receiving the metadata from a user. Here, the metadata can include information related to the query, such as the query terms, and can be described in a form or language suited to the user interface. In one example, the query module 150 is an on-line analytical processing (OLAP) system for mass data. It provides a what-you-see-is-what-you-get data analysis interface, supports functions such as flexible drag-and-drop and drill-down, and supports dynamic multidimensional analysis by end users, including calculations across dimensions and across members at different levels. This both meets conventional OLAP requirements and, by relying on the data analysis engine 120, solves the performance problem of mass data: for data on the order of hundreds of millions of records, the response time is on the order of seconds.
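Drill-down over multidimensional data can be illustrated with a minimal sketch. The row layout, the field names (`region`, `city`, `amount`) and the `aggregate` helper are assumptions made for illustration; the patent does not specify how the aggregates are computed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate(rows: List[dict], dims: List[str],
              measure: str) -> Dict[Tuple, float]:
    """Sum `measure` for each combination of values of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


# Hypothetical sample data.
rows = [
    {"region": "east", "city": "A", "amount": 10.0},
    {"region": "east", "city": "B", "amount": 7.0},
    {"region": "west", "city": "C", "amount": 5.0},
]

by_region = aggregate(rows, ["region"], "amount")        # top-level view
by_city = aggregate(rows, ["region", "city"], "amount")  # drill down one level
```

Drilling down amounts to repeating the same aggregation with one more, finer-grained dimension appended to `dims`.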
The configuration module 160 is used for converting the metadata into a first grammar that the data analysis engine 120 can recognize. Here, the first grammar can be based on the HQL query language.
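A conversion of metadata into an HQL-style first grammar might look like the sketch below. The metadata layout (a dict with `table`, `dimensions`, `measures` and `filters`) is purely hypothetical, since the patent leaves the metadata format open, and the generated statement uses only generic SELECT / WHERE / GROUP BY syntax.

```python
from typing import Dict, List


def metadata_to_hql(meta: Dict) -> str:
    """Translate a hypothetical metadata dict into an HQL-style statement.

    Values are interpolated without escaping, which is acceptable only
    for a sketch; real code would have to quote or parameterize them.
    """
    dims: List[str] = meta["dimensions"]
    measures = [f"{fn.upper()}({col}) AS {fn}_{col}"
                for col, fn in meta["measures"]]
    hql = f"SELECT {', '.join(dims + measures)} FROM {meta['table']}"
    filters = meta.get("filters", {})
    if filters:
        conds = [f"{col} = '{val}'" for col, val in filters.items()]
        hql += " WHERE " + " AND ".join(conds)
    if dims:
        hql += " GROUP BY " + ", ".join(dims)
    return hql
```

The design choice here is to keep the user-facing metadata declarative (what to group by, what to sum, what to filter on) so that the translation to a query language stays mechanical.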
Then, the data analysis engine 120 converts the metadata described in the first grammar into a second grammar that the analysis database 140 can recognize. Here, the second grammar can be based on the Solr application server. The analysis database 140 then executes the query based on the metadata described in the second grammar.
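The shape of a Solr-based second grammar can be sketched as a set of request parameters. To avoid writing an HQL parser, the sketch starts from a hypothetical structured metadata dict rather than from an HQL string, which is a simplification of the conversion described above. The parameter names `q`, `fq`, `rows`, `facet`, `facet.field`, `stats` and `stats.field` are standard Solr request parameters; how the patented engine actually maps a query onto them is not stated and is assumed here.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def metadata_to_solr_params(meta: Dict) -> List[Tuple[str, str]]:
    """Build Solr request parameters from a hypothetical metadata dict."""
    params = [("q", "*:*"), ("rows", "0")]        # match all, return no documents
    for col, val in meta.get("filters", {}).items():
        params.append(("fq", f'{col}:"{val}"'))   # one filter query per condition
    if meta["dimensions"]:
        params.append(("facet", "true"))
        for dim in meta["dimensions"]:
            params.append(("facet.field", dim))   # bucket by each dimension
    if meta["measures"]:
        params.append(("stats", "true"))
        for col, _fn in meta["measures"]:
            params.append(("stats.field", col))   # numeric statistics on the measure
    return params


def to_query_string(params: List[Tuple[str, str]]) -> str:
    """URL-encode the parameters for use in an HTTP request to a Solr handler."""
    return urlencode(params)
```

The resulting query string would be appended to the URL of a Solr select handler; the host, core name and handler path depend on the deployment and are deliberately left out.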
For example, the data analysis engine 120 can implement distributed computing. Specifically, the data analysis engine 120 converts the metadata into a Solr request. The data is spread across the Solr nodes. The data analysis engine 120 dispatches the request to the Solr nodes, each node takes on part of the computing task, and the results computed by the nodes are finally aggregated.
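The scatter-and-gather pattern described here, in which each node computes a partial result and the partial results are then merged, can be sketched as follows. The shards are plain in-memory lists standing in for Solr nodes, and threads stand in for network calls; both are assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def node_sum(rows: List[dict], key: str, value: str) -> Counter:
    """Work done on one node: partial sums of `value` grouped by `key`."""
    partial = Counter()
    for row in rows:
        partial[row[key]] += row[value]
    return partial


def distributed_sum(shards: List[List[dict]], key: str,
                    value: str) -> Dict[str, float]:
    """Scatter the computation across shards, then gather and merge the partials."""
    total = Counter()
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda rows: node_sum(rows, key, value), shards)
        for partial in partials:
            total.update(partial)   # Counter.update adds values per key
    return dict(total)
```

Sums merge cleanly because addition is associative; an average, by contrast, would need each node to return both a sum and a count.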
After the analysis database 140 executes the query, the data analysis engine 120 receives the query result from the analysis database 140 and sends the query result to the configuration module 160. The configuration module 160 sends the query result to the query module 150. In one example, for different business scenarios, the configuration module 160 uses the data services exposed by the data analysis engine 120 to initialize the content to be presented, including permissions, style information and the like. Finally, the query module 150 presents the query result to the user.
The data analysis system 100 according to the embodiment of the present invention solves the problem that conventional data analysis tools cannot effectively support mass data. It hides all technical details so that data analysts can use the data simply and conveniently. In addition, the data analysis system 100 automatically creates scheduled tasks, automatically updates data, and automatically optimizes query speed, making it easy to analyze massive volumes of data. It implements a Hadoop-based big data analysis system and makes enterprise data integration simpler.
Corresponding to the data analysis system 100 described above, a data analysis method 200 is also provided. The method 200 can be performed by the data analysis system 100 and comprises the following steps.
In step S210, task information described in a database query language is distributed to the data analysis engine. Here, the database query language can be the HQL query language.
In step S220, the data analysis engine converts the task information into a distributed grammar, so as to build an index on the data. Here, the distributed grammar can be based on the Map/Reduce model, and the index can be a Lucene index.
In step S230, the indexed data is stored in the data warehouse.
In step S240, the indexed data stored in the data warehouse is synchronized to the analysis database.
In step S250, metadata associated with a query on at least part of the data is received. Here, the metadata can be received from a user via a user interface.
In step S260, the metadata is converted into a first grammar that the data analysis engine can recognize. Here, the first grammar can be based on the HQL query language.
In step S270, the data analysis engine converts the metadata described in the first grammar into a second grammar that the analysis database can recognize. Here, the second grammar can be based on the Solr application server.
In step S280, the analysis database executes the query based on the metadata described in the second grammar.
The method 200 can further comprise: the data analysis engine receiving the query result from the analysis database, so that the query result can be presented to the user.
Although the present invention has been shown above in conjunction with its preferred embodiments, those skilled in the art will appreciate that various modifications, substitutions and changes can be made to the present invention without departing from its spirit and scope. Therefore, the present invention should not be limited by the embodiments described above, but should be defined by the appended claims and their equivalents.
Claims (12)
1. A data analysis system, comprising:
a scheduler for distributing task information described in a database query language;
a data analysis engine for converting the task information into a distributed grammar, so as to build an index on data;
a data warehouse for storing the indexed data;
an analysis database that holds the indexed data in synchronization with the data warehouse;
a query module for receiving metadata associated with a query on at least part of the data; and
a configuration module for converting the metadata into a first grammar that the data analysis engine can recognize,
wherein the data analysis engine is configured to convert the metadata described in the first grammar into a second grammar that the analysis database can recognize, and
the analysis database is configured to execute the query based on the metadata described in the second grammar.
2. The data analysis system according to claim 1, wherein the database query language is the HQL query language.
3. The data analysis system according to claim 1, wherein the distributed grammar is based on the Map/Reduce model, and the index is a Lucene index.
4. The data analysis system according to claim 1, wherein the first grammar is based on the HQL query language, and the second grammar is based on the Solr application server.
5. The data analysis system according to claim 1, wherein the query module comprises a user interface for receiving the metadata from a user.
6. The data analysis system according to claim 1, wherein
the data analysis engine is further configured to receive a query result from the analysis database and send the query result to the configuration module,
the configuration module is further configured to send the query result to the query module, and
the query module is further configured to present the query result to the user.
7. A data analysis method, comprising:
distributing, to a data analysis engine, task information described in a database query language;
converting, by the data analysis engine, the task information into a distributed grammar, so as to build an index on data;
storing the indexed data in a data warehouse;
synchronizing the indexed data stored in the data warehouse to an analysis database;
receiving metadata associated with a query on at least part of the data;
converting the metadata into a first grammar that the data analysis engine can recognize;
converting, by the data analysis engine, the metadata described in the first grammar into a second grammar that the analysis database can recognize; and
executing, by the analysis database, the query based on the metadata described in the second grammar.
8. The data analysis method according to claim 7, wherein the database query language is the HQL query language.
9. The data analysis method according to claim 7, wherein the distributed grammar is based on the Map/Reduce model, and the index is a Lucene index.
10. The data analysis method according to claim 7, wherein the first grammar is based on the HQL query language, and the second grammar is based on the Solr application server.
11. The data analysis method according to claim 7, wherein the metadata is received from a user via a user interface.
12. The data analysis method according to claim 7, further comprising:
receiving, by the data analysis engine, a query result from the analysis database, so as to present the query result to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510249589.3A CN104834730B (en) | 2015-05-15 | 2015-05-15 | data analysis system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104834730A (en) | 2015-08-12 |
CN104834730B (en) | 2018-06-01 |
Family
ID=53812616
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510249589.3A Active CN104834730B (en) | 2015-05-15 | 2015-05-15 | data analysis system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104834730B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102426609A (en) * | 2011-12-28 | 2012-04-25 | 厦门市美亚柏科信息股份有限公司 | Index generation method and index generation device based on MapReduce programming architecture |
CN102682036A (en) * | 2011-03-18 | 2012-09-19 | 新奥特(北京)视频技术有限公司 | Non-editing based method and system for searching media assets |
CN104102710A (en) * | 2014-07-15 | 2014-10-15 | 浪潮(北京)电子信息产业有限公司 | Massive data query method |
CN104516982A (en) * | 2015-01-06 | 2015-04-15 | 南通大学 | Method and system for extracting Web information based on Nutch |
US20150120695A1 (en) * | 2013-10-31 | 2015-04-30 | Tata Consultancy Services Limited | Indexing of file in a hadoop cluster |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106547807A (en) * | 2015-09-23 | 2017-03-29 | 财团法人工业技术研究院 | Data analysis method and device |
CN106547807B (en) * | 2015-09-23 | 2021-01-22 | 财团法人工业技术研究院 | Data analysis method and device |
US11086881B2 (en) | 2015-09-23 | 2021-08-10 | Industrial Technology Research Institute | Method and device for analyzing data |
CN106022483A (en) * | 2016-05-11 | 2016-10-12 | 星环信息科技(上海)有限公司 | Method and equipment for conversion between machine learning models |
CN106022483B (en) * | 2016-05-11 | 2019-06-14 | 星环信息科技(上海)有限公司 | The method and apparatus converted between machine learning model |
CN108427689A (en) * | 2017-02-15 | 2018-08-21 | 北京国双科技有限公司 | Information acquisition method and device |
CN107330607A (en) * | 2017-06-27 | 2017-11-07 | 太仓市华安企业管理有限公司 | A kind of enterprise data analysis system |
CN109684352A (en) * | 2018-12-29 | 2019-04-26 | 江苏满运软件科技有限公司 | Data analysis system, method, storage medium and electronic equipment |
CN109684352B (en) * | 2018-12-29 | 2020-12-01 | 江苏满运软件科技有限公司 | Data analysis system, data analysis method, storage medium, and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN104834730B (en) | 2018-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yang et al. | A system architecture for manufacturing process analysis based on big data and process mining techniques | |
EP2577507B1 (en) | Data mart automation | |
US20170139952A1 (en) | System and method transforming source data into output data in big data environments | |
CN109522312B (en) | Data processing method, device, server and storage medium | |
CN108073625B (en) | System and method for metadata information management | |
US20160098662A1 (en) | Apparatus and Method for Scheduling Distributed Workflow Tasks | |
CN104834730A (en) | Data analysis system and method | |
CN109656963B (en) | Metadata acquisition method, apparatus, device and computer readable storage medium | |
CN103430144A (en) | Data source analytics | |
Saltz et al. | Exploring the process of doing data science via an ethnographic study of a media advertising company | |
CN106126601A (en) | A kind of social security distributed preprocess method of big data and system | |
CN115335821B (en) | Offloading statistics collection | |
CN105786941B (en) | Information mining method and device | |
Ereshko et al. | Digital platforms clustering model | |
CN105550351B (en) | The extemporaneous inquiry system of passenger's run-length data and method | |
CN111046059B (en) | Low-efficiency SQL statement analysis method and system based on distributed database cluster | |
US20160378830A1 (en) | Data processing system and data processing method | |
CN111159429B (en) | Knowledge graph-based data analysis method and device, equipment and storage medium | |
CN113157978A (en) | Data label establishing method and device | |
CN105630997A (en) | Data parallel processing method, device and equipment | |
CN113220530B (en) | Data quality monitoring method and platform | |
Thaler et al. | The IWi process model corpus | |
CN109446263A (en) | A kind of data relationship correlating method and device | |
CN114707835A (en) | Data processing method and device, electronic equipment and computer readable medium | |
US20120072227A1 (en) | Automatically generating high quality soa design from business process maps based on specified quality goals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||