CN104834730A - Data analysis system and method - Google Patents

Data analysis system and method

Info

Publication number
CN104834730A
Authority
CN
China
Prior art keywords
data
data analysis
grammar
metadata
index
Legal status
Granted
Application number
CN201510249589.3A
Other languages
Chinese (zh)
Other versions
CN104834730B (en)
Inventor
Sun Ming (孙明)
Su Jianzhuo (苏建倬)
Zhu Chen (朱晨)
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
2015-05-15
Filing date
2015-05-15
Publication date
2015-08-12
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201510249589.3A
Publication of CN104834730A
Application granted
Publication of CN104834730B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data analysis system and method. The system comprises a scheduler, a data analysis engine, a data warehouse, an analysis database, a query module and a configuration module. The scheduler distributes task information described in a database query language. The data analysis engine converts the task information into a distributed grammar so that an index can be built over data. The data warehouse stores the data with the built index, and the analysis database holds the indexed data in synchronization with the data warehouse. The query module receives metadata associated with a query on at least part of the data, and the configuration module converts the metadata into a first grammar that the data analysis engine can recognize. The data analysis engine is configured to convert the metadata described in the first grammar into a second grammar that the analysis database can recognize, and the analysis database is configured to execute the query based on the metadata described in the second grammar.

Description

Data analysis system and method
Technical field
The present invention relates to data processing, and more specifically to a data analysis system and method.
Background art
With the development of information technology, enterprise information systems generate large amounts of data. Extracting useful information from this mass of data and analyzing it to support business decisions has become an important problem for business decision-makers. How to achieve visual, flexible analysis and querying on top of an enterprise data warehouse is the problem addressed here.
Traditionally, every data analysis requirement had to be submitted to the data department, which ran Hadoop map/reduce programs and could deliver the results to the business department only after anywhere from an hour at best to several days at worst. Having obtained the data, the business department still had to analyze it with office software or other third-party tools before arriving at an analysis result. Because requirements change constantly, the business department often has to repeat this work many times; turnaround is poor, and business needs are hard to meet.
This traditional approach has a long and unpredictable cycle time and lacks efficient, systematic management. It responds slowly to changing requirements. Moreover, without a visual data analysis system, the user experience is poor.
Therefore, an improved data analysis scheme is needed.
Summary of the invention
An object of the present invention is to provide a data analysis system and method that, on top of an enterprise data architecture (for example, Hadoop), give users an efficient (for example, second-level response time) enterprise data analysis solution that supports flexible drag-and-drop and drill-down.
According to a first aspect of the invention, a data analysis system is provided, comprising: a scheduler for distributing task information described in a database query language; a data analysis engine for converting the task information into a distributed grammar in order to build an index over data; a data warehouse for storing the data with the built index; an analysis database that holds the data with the built index in synchronization with the data warehouse; a query module for receiving metadata associated with a query on at least part of the data; and a configuration module for converting the metadata into a first grammar that the data analysis engine can recognize. The data analysis engine is configured to convert the metadata described in the first grammar into a second grammar that the analysis database can recognize, and the analysis database is configured to execute the query based on the metadata described in the second grammar.
In one embodiment, the database query language is HQL.
In one embodiment, the distributed grammar is based on the Map/Reduce model, and the index is a Lucene index.
In one embodiment, the first grammar is based on HQL, and the second grammar is based on the Solr application server.
In one embodiment, the query module comprises a user interface for receiving the metadata from a user.
In one embodiment, the data analysis engine is further configured to receive a query result from the analysis database and send it to the configuration module; the configuration module is further configured to send the query result to the query module; and the query module is further configured to present the query result to the user.
According to a second aspect of the invention, a data analysis method is provided, comprising: distributing, to a data analysis engine, task information described in a database query language; converting, by the data analysis engine, the task information into a distributed grammar in order to build an index over data; storing the data with the built index in a data warehouse; synchronizing the indexed data stored in the data warehouse to an analysis database; receiving metadata associated with a query on at least part of the data; converting the metadata into a first grammar that the data analysis engine can recognize; converting, by the data analysis engine, the metadata described in the first grammar into a second grammar that the analysis database can recognize; and executing, by the analysis database, the query based on the metadata described in the second grammar.
The embodiments of the first aspect described above also apply to the second aspect.
According to the embodiments of the present invention, an efficient enterprise data analysis solution supporting flexible drag-and-drop and drill-down can be provided to users on top of an enterprise data architecture.
Brief description of the drawings
The above and other objects, features and advantages of the present invention will become clearer from the following description of preferred embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram of a data analysis system according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data analysis method according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be noted that the present invention is not limited to the specific embodiments described below. In addition, for brevity, detailed descriptions of well-known techniques not directly related to the present invention are omitted so as not to obscure the understanding of the invention.
Fig. 1 is a block diagram of a data analysis system 100 according to an embodiment of the present invention. As shown, the data analysis system 100 comprises a scheduler 110, a data analysis engine 120, a data warehouse 130, an analysis database 140, a query module 150 and a configuration module 160.
The scheduler 110 distributes task information described in a database query language (for example, HQL). In one example, the scheduler 110 is based on Hadoop and monitors and schedules tasks in real time. If a task's period and time satisfy a predetermined scheduling condition, the scheduler 110 sends the task information, described in HQL, to the data analysis engine 120. The task information may specify particular data to extract, and because the extraction scope can be described in HQL, the barrier to use is lowered for users.
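The scheduling condition above can be pictured with a minimal in-process sketch. It assumes the condition is simply "the task has never run, or a full period has elapsed since its last run", and it stands in for the data analysis engine with a callback. `Task`, `Scheduler`, `tick` and `dispatch` are hypothetical names; a Hadoop-based scheduler would run this check continuously against a persistent task store rather than in a single loop.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Task:
    """Hypothetical task record: an HQL extraction statement plus its period."""
    name: str
    hql: str
    period: timedelta
    last_run: Optional[datetime] = None


@dataclass
class Scheduler:
    """Sends due tasks to `dispatch`, which stands in for the data analysis engine."""
    dispatch: Callable[[str, str], None]
    tasks: List[Task] = field(default_factory=list)

    def is_due(self, task: Task, now: datetime) -> bool:
        # Assumed scheduling condition: never run yet, or a full period has elapsed.
        return task.last_run is None or now - task.last_run >= task.period

    def tick(self, now: datetime) -> List[str]:
        """Check every task once; dispatch and timestamp the due ones."""
        dispatched = []
        for task in self.tasks:
            if self.is_due(task, now):
                self.dispatch(task.name, task.hql)
                task.last_run = now
                dispatched.append(task.name)
        return dispatched


if __name__ == "__main__":
    engine_inbox = []
    scheduler = Scheduler(dispatch=lambda name, hql: engine_inbox.append((name, hql)))
    scheduler.tasks.append(
        Task("daily_orders", "SELECT order_id, amount FROM orders WHERE dt = '2015-05-15'", timedelta(days=1))
    )
    start = datetime(2015, 5, 15)
    print(scheduler.tick(start))                       # ['daily_orders']
    print(scheduler.tick(start + timedelta(hours=1)))  # []
```

Keeping the condition in its own `is_due` method makes it easy to swap in a richer rule (for example, a time-of-day window) without touching the dispatch loop.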
The data analysis engine 120 converts the task information into a distributed grammar in order to build an index over the data. Specifically, the distributed grammar here may be based on the Map/Reduce model, and the index may be a Lucene index. In one example, the data analysis engine 120 manages Hadoop-based data storage optimization, extraction and external services, and serves as the external interface of the data warehouse 130 and the analysis database 140.
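To illustrate how a Map/Reduce job can build an index, the sketch below constructs a simple inverted index (each term mapped to a sorted list of document IDs) with explicit map, shuffle and reduce phases in plain Python. It is a simplified stand-in under stated assumptions: a real Lucene index is written through Lucene's own `IndexWriter` and stores much more (term dictionaries, positions, stored fields), and the shuffle here happens in memory rather than across a Hadoop cluster. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[int, str]  # (document id, text field)


def map_phase(record: Record) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (term, doc_id) once for every distinct term in the record."""
    doc_id, text = record
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key, as the MapReduce framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_phase(term: str, doc_ids: List[int]) -> Tuple[str, List[int]]:
    """Reducer: turn one term's document IDs into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(records: Iterable[Record]) -> Dict[str, List[int]]:
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(t, ids) for t, ids in shuffle(mapped).items())


if __name__ == "__main__":
    index = build_inverted_index([(1, "red phone"), (2, "blue phone"), (3, "red shirt")])
    print(index["phone"])  # [1, 2]
```

Because each mapper only looks at its own record and each reducer only at one key, both phases parallelize naturally, which is the property that makes the Map/Reduce model a fit for index construction over large data.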
The data warehouse 130 stores the data with the built index. In one example, the data warehouse 130 may be an enterprise data warehouse that holds the enterprise's raw data as well as the indexed data.
The analysis database 140 holds the indexed data in synchronization with the data warehouse 130. In one example, the analysis database 140 is responsible for scheduling and serving the data.
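The document does not say how this synchronization is performed. One common pattern, shown here purely as an assumption, is incremental copying driven by a version watermark: each indexed row carries a monotonically increasing version, and each sync pass copies only rows newer than the previous watermark. `IndexedStore` and `sync` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IndexedStore:
    """Hypothetical store of indexed rows keyed by id, each tagged with a version."""
    rows: Dict[str, Tuple[int, dict]] = field(default_factory=dict)

    def upsert(self, key: str, version: int, row: dict) -> None:
        # Keep only the newest version of each row.
        current = self.rows.get(key)
        if current is None or version > current[0]:
            self.rows[key] = (version, row)

    def changed_since(self, watermark: int) -> List[Tuple[str, int, dict]]:
        return [(k, v, r) for k, (v, r) in self.rows.items() if v > watermark]


def sync(warehouse: IndexedStore, analysis_db: IndexedStore, watermark: int) -> int:
    """Copy rows newer than `watermark` into the analysis store; return the new watermark."""
    new_mark = watermark
    for key, version, row in warehouse.changed_since(watermark):
        analysis_db.upsert(key, version, row)
        new_mark = max(new_mark, version)
    return new_mark
```

The caller persists the returned watermark between passes, so a repeated pass with no new data copies nothing and returns the same value.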
The query module 150 receives metadata associated with a query on at least part of the indexed data. The query module 150 includes a user interface for receiving the metadata from a user. Here, the metadata may include information associated with the query, such as query terms, and may be expressed in a form or language suited to the user interface. In one example, the query module 150 is an online analytical processing (OLAP) system for massive data that provides a what-you-see-is-what-you-get data analysis interface, supports functions such as flexible drag-and-drop and drill-down, and lets end users perform dynamic multidimensional analysis, including calculations across dimensions and across members at different levels. It thus meets conventional OLAP requirements while relying on the data analysis engine 120 to solve the performance problem of massive data, achieving second-level response times on data at the scale of hundreds of millions of records.
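At its core, the multidimensional analysis described above groups a measure by a set of dimensions, and drilling down adds a finer dimension to that set. The sketch below assumes the query metadata is a dict with hypothetical `dimensions` and `measure` keys and sums the measure in memory; in the system described here the aggregation would instead be pushed down to the analysis database.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def aggregate(rows: Sequence[dict], dimensions: Sequence[str], measure: str) -> Dict[Tuple, float]:
    """Sum `measure` grouped by the given dimension columns (one OLAP cell per key)."""
    cells: Dict[Tuple, float] = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cells[key] += row[measure]
    return dict(cells)


def drill_down(metadata: dict, finer_dimension: str) -> dict:
    """Return new query metadata with one more (finer) dimension appended."""
    return {**metadata, "dimensions": [*metadata["dimensions"], finer_dimension]}


if __name__ == "__main__":
    rows = [
        {"region": "North", "city": "Beijing", "amount": 100.0},
        {"region": "North", "city": "Tianjin", "amount": 50.0},
        {"region": "East", "city": "Shanghai", "amount": 70.0},
    ]
    meta = {"dimensions": ["region"], "measure": "amount"}
    print(aggregate(rows, meta["dimensions"], meta["measure"]))
    finer = drill_down(meta, "city")
    print(aggregate(rows, finer["dimensions"], finer["measure"]))
```

`drill_down` returns new metadata instead of mutating the old one, so the UI can keep the coarser view around for rolling back up.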
The configuration module 160 converts the metadata into a first grammar that the data analysis engine 120 can recognize. Here, the first grammar may be based on HQL.
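As an illustration of the configuration module's conversion, the sketch below turns a hypothetical metadata dict (`table`, `dimensions`, `measure`, optional equality `filters`) into an aggregate query string. It assumes "HQL" here refers to Hive's SQL dialect, which quotes identifiers with backticks. The metadata format and the generated statement are assumptions, not the patent's specification.

```python
from typing import Dict, List


def quote_literal(value) -> str:
    """Render a Python value as an HQL literal; strings are single-quoted and escaped."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def metadata_to_hql(metadata: Dict) -> str:
    """Translate hypothetical UI metadata into a grouped SUM query in HQL syntax."""
    dims: List[str] = metadata["dimensions"]
    if not dims:
        raise ValueError("at least one dimension is required")
    measure: str = metadata["measure"]
    cols = ", ".join(f"`{d}`" for d in dims)
    hql = f"SELECT {cols}, SUM(`{measure}`) AS `{measure}_sum` FROM `{metadata['table']}`"
    filters: Dict = metadata.get("filters", {})
    if filters:
        conditions = " AND ".join(f"`{k}` = {quote_literal(v)}" for k, v in filters.items())
        hql += f" WHERE {conditions}"
    return hql + f" GROUP BY {cols}"


if __name__ == "__main__":
    print(metadata_to_hql({
        "table": "orders",
        "dimensions": ["region"],
        "measure": "amount",
        "filters": {"dt": "2015-05-15"},
    }))
```

Escaping literals and quoting identifiers at this boundary matters because the metadata originates from end users in the query module.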
The data analysis engine 120 then converts the metadata described in the first grammar into a second grammar that the analysis database 140 can recognize. Here, the second grammar may be based on the Solr application server. The analysis database 140 then executes the query based on the metadata described in the second grammar.
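The document states only that the second grammar is based on Solr. The sketch below shows one possible mapping, which is an assumption and not specified by the patent. It takes a parsed aggregate query (hypothetical keys `group_by`, `measure`, `filters`) and turns it into Solr select parameters: equality filters become `fq` clauses, and the grouping and sum become nested `terms` facets in Solr's JSON Facet API with a `sum()` aggregation. It sets `rows=0` because only the facet buckets are needed, not the matching documents.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def solr_phrase(value) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_facet(group_by: List[str], measure: str) -> Dict:
    """Nest one terms facet per grouping field; the innermost level sums the measure."""
    facet: Dict = {f"{measure}_sum": f"sum({measure})"}
    for field_name in reversed(group_by):
        facet = {
            f"by_{field_name}": {
                "type": "terms",
                "field": field_name,
                "limit": -1,  # return every bucket rather than only the default top 10
                "facet": facet,
            }
        }
    return facet


def to_solr_params(parsed: Dict) -> List[Tuple[str, str]]:
    """Map a parsed aggregate query onto (name, value) pairs for Solr's /select handler."""
    params: List[Tuple[str, str]] = [("q", "*:*"), ("rows", "0")]
    for field_name, value in parsed.get("filters", {}).items():
        params.append(("fq", f"{field_name}:{solr_phrase(value)}"))
    params.append(("json.facet", json.dumps(build_facet(parsed["group_by"], parsed["measure"]))))
    return params


if __name__ == "__main__":
    parsed = {"group_by": ["region"], "measure": "amount", "filters": {"dt": "2015-05-15"}}
    # Hypothetical host and core name; the encoded pairs form the query string of a /select URL.
    print("http://localhost:8983/solr/orders/select?" + urlencode(to_solr_params(parsed)))
```

Returning a list of pairs instead of a dict leaves room for several `fq` parameters, which Solr intersects as independent filter queries.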
For example, the data analysis engine 120 can perform distributed computation. Specifically, the data analysis engine 120 converts the metadata into a Solr request. The data is spread across the Solr nodes. The data analysis engine 120 distributes the request to the Solr nodes, each of which takes on part of the computation, and the results computed by the individual nodes are finally aggregated.
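The scatter-gather pattern in this paragraph, where every node computes a partial result over its own shard and the partials are then merged, is sketched below with threads standing in for Solr nodes. In a real deployment SolrCloud fans a request out to shards and merges the responses itself; this example only illustrates why partial sums can be combined by plain addition. The function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence


def node_partial_sum(shard: Sequence[dict], dimension: str, measure: str) -> Counter:
    """Work done on one node: sum the measure per dimension value over its own shard."""
    partial: Counter = Counter()
    for row in shard:
        partial[row[dimension]] += row[measure]
    return partial


def distributed_sum(shards: List[Sequence[dict]], dimension: str, measure: str) -> Dict[str, float]:
    """Scatter the same sub-query to every shard in parallel, then merge the partial sums."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=len(shards) or 1) as pool:
        partials = pool.map(lambda s: node_partial_sum(s, dimension, measure), shards)
        for partial in partials:
            total.update(partial)  # Counter.update adds values key by key
    return dict(total)


if __name__ == "__main__":
    shards = [
        [{"region": "North", "amount": 100.0}, {"region": "East", "amount": 70.0}],
        [{"region": "North", "amount": 50.0}],
    ]
    print(distributed_sum(shards, "region", "amount"))
```

Sums, counts, minima and maxima merge this simply; measures such as averages or distinct counts need each node to return extra state (for example, a sum and a count) so the merge stays correct.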
After the analysis database 140 executes the query, the data analysis engine 120 receives the query result from the analysis database 140 and sends it to the configuration module 160. The configuration module 160 sends the query result to the query module 150. In one example, for different business scenarios, the configuration module 160 uses the data services exposed by the data analysis engine 120 to initialize the settings for the content to be presented, including permissions, style information and the like. Finally, the query module 150 presents the query result to the user.
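The per-scenario initialization of permissions and style can be sketched as a post-processing step on the query result. This is an assumption for illustration only. `PresentationConfig` maps a hypothetical role name to the columns that role may see and holds a single number format. `prepare_for_display` applies both settings before the result reaches the query module.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PresentationConfig:
    """Hypothetical per-scenario settings: visible columns per role and a number format."""
    visible_columns: Dict[str, Set[str]] = field(default_factory=dict)
    number_format: str = "{:,.2f}"


def prepare_for_display(rows: List[dict], role: str, config: PresentationConfig) -> List[dict]:
    """Drop columns the role may not see and format numeric (non-boolean) values as strings."""
    allowed = config.visible_columns.get(role, set())
    shaped = []
    for row in rows:
        shaped.append({
            col: (config.number_format.format(val)
                  if isinstance(val, (int, float)) and not isinstance(val, bool)
                  else val)
            for col, val in row.items()
            if col in allowed
        })
    return shaped


if __name__ == "__main__":
    cfg = PresentationConfig(visible_columns={"analyst": {"region", "amount"}})
    result = [{"region": "North", "amount": 1500.0, "cost": 900.0}]
    print(prepare_for_display(result, "analyst", cfg))
```

An unknown role sees no columns at all. Denying by default is the safer failure mode for a permission filter.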
The data analysis system 100 according to the embodiment of the present invention solves the problem that conventional data analysis tools cannot effectively support massive data, and hides all technical details so that data analysts can use the data simply and conveniently. In addition, the data analysis system 100 automatically creates scheduled tasks, automatically updates data and automatically optimizes query speed, making it easy to analyze massive amounts of data. A Hadoop-based big data analysis system is thereby realized, which makes enterprise data integration simpler.
Corresponding to the data analysis system 100 described above, a data analysis method 200 is also provided. The method 200 may be performed by the data analysis system 100 and comprises the following steps.
In step S210, task information described in a database query language is distributed to the data analysis engine. Here, the database query language may be HQL.
In step S220, the data analysis engine converts the task information into a distributed grammar in order to build an index over the data. Here, the distributed grammar may be based on the Map/Reduce model, and the index may be a Lucene index.
In step S230, the data with the built index is stored in the data warehouse.
In step S240, the indexed data stored in the data warehouse is synchronized to the analysis database.
In step S250, metadata associated with a query on at least part of the data is received. Here, the metadata may be received from a user via a user interface.
In step S260, the metadata is converted into a first grammar that the data analysis engine can recognize. Here, the first grammar may be based on HQL.
In step S270, the data analysis engine converts the metadata described in the first grammar into a second grammar that the analysis database can recognize. Here, the second grammar may be based on the Solr application server.
In step S280, the analysis database executes the query based on the metadata described in the second grammar.
The method 200 may further comprise: the data analysis engine receiving the query result from the analysis database so that the query result can be presented to the user.
Although the present invention has been shown above in connection with its preferred embodiments, those skilled in the art will appreciate that various modifications, substitutions and changes may be made without departing from the spirit and scope of the present invention. Therefore, the present invention should be limited not by the embodiments described above but by the appended claims and their equivalents.

Claims (12)

1. A data analysis system, comprising:
a scheduler for distributing task information described in a database query language;
a data analysis engine for converting the task information into a distributed grammar in order to build an index over data;
a data warehouse for storing the data with the built index;
an analysis database that holds the data with the built index in synchronization with the data warehouse;
a query module for receiving metadata associated with a query on at least part of the data; and
a configuration module for converting the metadata into a first grammar that the data analysis engine can recognize,
wherein the data analysis engine is configured to convert the metadata described in the first grammar into a second grammar that the analysis database can recognize, and
the analysis database is configured to execute the query based on the metadata described in the second grammar.
2. The data analysis system according to claim 1, wherein the database query language is HQL.
3. The data analysis system according to claim 1, wherein the distributed grammar is based on the Map/Reduce model, and the index is a Lucene index.
4. The data analysis system according to claim 1, wherein the first grammar is based on HQL, and the second grammar is based on the Solr application server.
5. The data analysis system according to claim 1, wherein the query module comprises a user interface for receiving the metadata from a user.
6. The data analysis system according to claim 1, wherein
the data analysis engine is further configured to receive a query result from the analysis database and send the query result to the configuration module,
the configuration module is further configured to send the query result to the query module, and
the query module is further configured to present the query result to the user.
7. A data analysis method, comprising:
distributing, to a data analysis engine, task information described in a database query language;
converting, by the data analysis engine, the task information into a distributed grammar in order to build an index over data;
storing the data with the built index in a data warehouse;
synchronizing the indexed data stored in the data warehouse to an analysis database;
receiving metadata associated with a query on at least part of the data;
converting the metadata into a first grammar that the data analysis engine can recognize;
converting, by the data analysis engine, the metadata described in the first grammar into a second grammar that the analysis database can recognize; and
executing, by the analysis database, the query based on the metadata described in the second grammar.
8. The data analysis method according to claim 7, wherein the database query language is HQL.
9. The data analysis method according to claim 7, wherein the distributed grammar is based on the Map/Reduce model, and the index is a Lucene index.
10. The data analysis method according to claim 7, wherein the first grammar is based on HQL, and the second grammar is based on the Solr application server.
11. The data analysis method according to claim 7, wherein the metadata is received from a user via a user interface.
12. The data analysis method according to claim 7, further comprising:
receiving, by the data analysis engine, a query result from the analysis database so that the query result can be presented to the user.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510249589.3A CN104834730B (en) 2015-05-15 2015-05-15 data analysis system and method

Publications (2)

Publication Number Publication Date
CN104834730A 2015-08-12
CN104834730B 2018-06-01

Family

ID=53812616

Country Status (1)

Country Link
CN (1) CN104834730B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102426609A (en) * 2011-12-28 2012-04-25 厦门市美亚柏科信息股份有限公司 Index generation method and index generation device based on MapReduce programming architecture
CN102682036A (en) * 2011-03-18 2012-09-19 新奥特(北京)视频技术有限公司 Non-editing based method and system for searching media assets
CN104102710A (en) * 2014-07-15 2014-10-15 浪潮(北京)电子信息产业有限公司 Massive data query method
CN104516982A (en) * 2015-01-06 2015-04-15 南通大学 Method and system for extracting Web information based on Nutch
US20150120695A1 (en) * 2013-10-31 2015-04-30 Tata Consultancy Services Limited Indexing of file in a hadoop cluster

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106547807A (en) * 2015-09-23 2017-03-29 财团法人工业技术研究院 Data analysis method and device
CN106547807B (en) * 2015-09-23 2021-01-22 财团法人工业技术研究院 Data analysis method and device
US11086881B2 (en) 2015-09-23 2021-08-10 Industrial Technology Research Institute Method and device for analyzing data
CN106022483A (en) * 2016-05-11 2016-10-12 星环信息科技(上海)有限公司 Method and equipment for conversion between machine learning models
CN106022483B (en) * 2016-05-11 2019-06-14 星环信息科技(上海)有限公司 The method and apparatus converted between machine learning model
CN108427689A (en) * 2017-02-15 2018-08-21 北京国双科技有限公司 Information acquisition method and device
CN107330607A (en) * 2017-06-27 2017-11-07 太仓市华安企业管理有限公司 A kind of enterprise data analysis system
CN109684352A (en) * 2018-12-29 2019-04-26 江苏满运软件科技有限公司 Data analysis system, method, storage medium and electronic equipment
CN109684352B (en) * 2018-12-29 2020-12-01 江苏满运软件科技有限公司 Data analysis system, data analysis method, storage medium, and electronic device

Also Published As

Publication number Publication date
CN104834730B (en) 2018-06-01

Similar Documents

Publication Publication Date Title
Yang et al. A system architecture for manufacturing process analysis based on big data and process mining techniques
EP2577507B1 (en) Data mart automation
US20170139952A1 (en) System and method transforming source data into output data in big data environments
CN109522312B (en) Data processing method, device, server and storage medium
CN108073625B (en) System and method for metadata information management
US20160098662A1 (en) Apparatus and Method for Scheduling Distributed Workflow Tasks
CN104834730A (en) Data analysis system and method
CN109656963B (en) Metadata acquisition method, apparatus, device and computer readable storage medium
CN103430144A (en) Data source analytics
Saltz et al. Exploring the process of doing data science via an ethnographic study of a media advertising company
CN106126601A (en) A kind of social security distributed preprocess method of big data and system
CN115335821B (en) Offloading statistics collection
CN105786941B (en) Information mining method and device
Ereshko et al. Digital platforms clustering model
CN105550351B (en) The extemporaneous inquiry system of passenger's run-length data and method
CN111046059B (en) Low-efficiency SQL statement analysis method and system based on distributed database cluster
US20160378830A1 (en) Data processing system and data processing method
CN111159429B (en) Knowledge graph-based data analysis method and device, equipment and storage medium
CN113157978A (en) Data label establishing method and device
CN105630997A (en) Data parallel processing method, device and equipment
CN113220530B (en) Data quality monitoring method and platform
Thaler et al. The IWi process model corpus
CN109446263A (en) A kind of data relationship correlating method and device
CN114707835A (en) Data processing method and device, electronic equipment and computer readable medium
US20120072227A1 (en) Automatically generating high quality soa design from business process maps based on specified quality goals

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant