CN104834730B - data analysis system and method - Google Patents



Publication number
CN104834730B
CN104834730B CN201510249589.3A CN201510249589A CN104834730B CN 104834730 B CN104834730 B CN 104834730B CN 201510249589 A CN201510249589 A CN 201510249589A CN 104834730 B CN104834730 B CN 104834730B
Authority
CN
China
Prior art keywords
data
data analysis
syntax
metadata
analysis engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510249589.3A
Other languages
Chinese (zh)
Other versions
CN104834730A (en)
Inventor
孙明
苏建倬
朱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-05-15
Filing date
2015-05-15
Publication date
2018-06-01
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201510249589.3A
Publication of CN104834730A
Application granted
Publication of CN104834730B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24522 Translation of natural language queries to structured queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data analysis system comprising: a scheduler for distributing task information described in a database query language; a data analysis engine for converting the task information into a distributed syntax in order to build an index on data; a data warehouse for storing the data with the established index; an analytical database that synchronizes the indexed data with the data warehouse; a query module for receiving metadata associated with a query on at least part of the data; and a configuration module for converting the metadata into a first syntax that the data analysis engine can recognize. The data analysis engine is configured to convert the metadata described in the first syntax into a second syntax that the analytical database can recognize, and the analytical database is configured to execute the query based on the metadata described in the second syntax.

Description

Data analysis system and method
Technical field
The present invention relates to data processing and, more particularly, to a data analysis system and method.
Background art
With the development of information technology, enterprise information systems generate large amounts of data. How to extract and analyze information that is useful for business decisions from this mass of data has become an important problem facing business decision makers. Supporting visualization and flexible analytical queries on top of the enterprise data warehouse is the problem addressed here.
Traditionally, every data analysis request had to be submitted to the data department, which ran Hadoop Map/Reduce programs and could return results to the business department only after about an hour at best and several days at worst. After obtaining the data, the business department still had to analyze it with office software or other third-party software before arriving at a final analysis result. Because requirements keep changing, the business department often has to repeat this work many times; turnaround is poor and business needs are hard to meet.
This traditional data analysis scheme has a long and unpredictable cycle and lacks efficient system management. It responds slowly to changes in requirements. In addition, without a visual data analysis system, the user experience is poor.
Therefore, an improved data analysis scheme is needed.
Summary of the invention
The object of the present invention is to provide a data analysis system and method that, on top of an enterprise data framework (such as Hadoop), offer users an efficient (for example, second-level) enterprise data analysis scheme that supports flexible drag-and-drop and drill-down.
According to a first aspect of the invention, a data analysis system is provided, comprising: a scheduler for distributing task information described in a database query language; a data analysis engine for converting the task information into a distributed syntax in order to build an index on data; a data warehouse for storing the data with the established index; an analytical database that synchronizes the indexed data with the data warehouse; a query module for receiving metadata associated with a query on at least part of the data; and a configuration module for converting the metadata into a first syntax that the data analysis engine can recognize. The data analysis engine is configured to convert the metadata described in the first syntax into a second syntax that the analytical database can recognize, and the analytical database is configured to execute the query based on the metadata described in the second syntax.
In one embodiment, the database query language is the HQL query language.
In one embodiment, the distributed syntax is based on the Map/Reduce model, and the index is a Lucene index.
In one embodiment, the first syntax is based on the HQL query language, and the second syntax is based on the Solr application server.
In one embodiment, the query module includes a user interface for receiving the metadata from a user.
In one embodiment, the data analysis engine is further configured to receive a query result from the analytical database and send it to the configuration module, the configuration module is further configured to send the query result to the query module, and the query module is further configured to present the query result to the user.
According to a second aspect of the invention, a data analysis method is provided, comprising: distributing task information described in a database query language to a data analysis engine; converting, by the data analysis engine, the task information into a distributed syntax in order to build an index on data; storing the data with the established index in a data warehouse; synchronizing the indexed data stored in the data warehouse to an analytical database; receiving metadata associated with a query on at least part of the data; converting the metadata into a first syntax that the data analysis engine can recognize; converting, by the data analysis engine, the metadata described in the first syntax into a second syntax that the analytical database can recognize; and executing, by the analytical database, the query based on the metadata described in the second syntax.
The embodiments of the first aspect described above also apply to the second aspect.
According to embodiments of the present invention, users can be provided, on the basis of an enterprise data framework, with an efficient enterprise data analysis scheme that supports flexible drag-and-drop and drill-down.
Brief description of the drawings
The above and other objects, features, and advantages of the present invention will become clearer from the following description of preferred embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram of a data analysis system according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data analysis method according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be noted that the present invention is not limited to the specific embodiments described below. In addition, for brevity, detailed descriptions of well-known techniques that are not directly related to the present invention are omitted so as not to obscure the understanding of the invention.
Fig. 1 is a block diagram of a data analysis system 100 according to an embodiment of the present invention. As shown, the data analysis system 100 includes a scheduler 110, a data analysis engine 120, a data warehouse 130, an analytical database 140, a query module 150, and a configuration module 160.
The scheduler 110 distributes task information described in a database query language (for example, the HQL query language). In one example, the scheduler 110 is based on Hadoop and can monitor tasks in real time and schedule them. If a task's period and time satisfy a predetermined scheduling condition, the scheduler 110 sends the task information described in HQL to the data analysis engine 120. The task information can indicate that specific data is to be extracted, and the extraction range can be described in HQL, which lowers the barrier to use.
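The scheduling condition described above can be illustrated with a minimal sketch. The patent does not specify how the scheduler is implemented, so the `Task` fields, the `dispatch_due_tasks` function, and the sample HQL string below are all hypothetical; the `send` callback stands in for whatever hands the task to the data analysis engine 120.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Task:
    """A periodic task whose extraction range is written as an HQL string."""
    name: str
    hql: str              # hypothetical HQL text describing what to extract
    period: timedelta     # how often the task should run
    last_run: datetime    # when the task last ran


def dispatch_due_tasks(tasks: List[Task], now: datetime,
                       send: Callable[[str], None]) -> List[str]:
    """Send the HQL of every task whose period has elapsed; return their names."""
    dispatched = []
    for task in tasks:
        # Assumed scheduling condition: one full period has passed since last run.
        if now - task.last_run >= task.period:
            send(task.hql)          # hand the task information to the engine
            task.last_run = now
            dispatched.append(task.name)
    return dispatched
```

In this sketch a task is due as soon as one period has passed since its last run; a real scheduler would likely also handle retries, calendars, and dependencies between tasks.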
The data analysis engine 120 converts the task information into a distributed syntax in order to build an index on the data. Specifically, the distributed syntax may be based on the Map/Reduce model, and the index may be a Lucene index. In one example, the data analysis engine 120 manages Hadoop-based data storage optimization, extraction, and external services, and serves as the external interface of the data warehouse 130 and the analytical database 140.
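To make the idea of building an index with a Map/Reduce-style job concrete, here is a small single-process sketch. It is not Hadoop or Lucene code: a plain dictionary of posting lists stands in for the Lucene index, and the function names `map_phase`, `reduce_phase`, and `build_index` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(doc_id: int, text: str) -> Iterable[Tuple[str, int]]:
    """Map step: emit one (term, doc_id) pair per distinct term in a record."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Reduce step: group the pairs by term into sorted posting lists."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_index(records: Dict[int, str]) -> Dict[str, List[int]]:
    """Run the map step over all records, then reduce into an inverted index."""
    pairs = [pair for doc_id, text in records.items()
             for pair in map_phase(doc_id, text)]
    return reduce_phase(pairs)
```

For example, `build_index({1: "red shoes", 2: "Red hat"})` maps the term `red` to documents 1 and 2. In a real deployment the map and reduce steps would run on separate cluster nodes and write index segments rather than an in-memory dictionary.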
The data warehouse 130 stores the data with the established index. In one example, the data warehouse 130 may be an enterprise data warehouse that holds the enterprise's raw data and also stores the indexed data.
The analytical database 140 synchronizes the indexed data with the data warehouse 130. In one example, the analytical database 140 is responsible for scheduling and serving the data.
The query module 150 receives metadata associated with a query on at least part of the indexed data. The query module 150 includes a user interface for receiving the metadata from a user. Here, the metadata may include information related to the query, such as the query terms, and may be described in a format or language suited to the user interface. In one example, the query module 150 is based on an online analytical processing (OLAP) system for massive data. It provides a WYSIWYG data analysis interface, supports flexible drag-and-drop and drill-down, and lets end users perform dynamic multidimensional analysis, including calculations across dimensions and across members at different levels. It both meets common OLAP requirements and, by relying on the data analysis engine 120, addresses the performance problem of massive data, reaching second-level response times for data on the order of hundreds of millions of records.
The configuration module 160 converts the metadata into a first syntax that the data analysis engine 120 can recognize. Here, the first syntax may be based on the HQL query language.
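As an illustration of this conversion, the sketch below turns a small metadata dictionary, of the kind a drag-and-drop interface might produce, into an HQL-style string. The patent does not define the metadata layout, so the keys `table`, `dimensions`, `measures`, and `filters` and the function name `metadata_to_hql` are assumptions made only for this example.

```python
from typing import Dict


def metadata_to_hql(metadata: Dict) -> str:
    """Build an HQL-style aggregate query from hypothetical UI metadata."""
    dims = list(metadata.get("dimensions", []))
    # Each measure maps a column name to an aggregate function, e.g. SUM.
    measures = [f"{agg}({col})" for col, agg in metadata.get("measures", {}).items()]
    hql = f"SELECT {', '.join(dims + measures)} FROM {metadata['table']}"
    filters = metadata.get("filters", {})
    if filters:
        # Equality filters only; values are not escaped in this sketch.
        conds = " AND ".join(f"{col} = '{val}'" for col, val in filters.items())
        hql += f" WHERE {conds}"
    if dims:
        hql += " GROUP BY " + ", ".join(dims)
    return hql
```

The sketch handles only equality filters and does no escaping, so it should not be fed untrusted input as written; it is meant to show the shape of the translation, not a complete query builder.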
The data analysis engine 120 then converts the metadata described in the first syntax into a second syntax that the analytical database 140 can recognize. Here, the second syntax may be based on the Solr application server. The analytical database 140 then executes the query based on the metadata described in the second syntax.
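One way to picture the translation from the first syntax to the second is the sketch below, which extracts the equality filters of a restricted HQL string and expresses them as Solr request parameters. The parameter names `q`, `fq`, `rows`, and `wt` are standard Solr query parameters; everything else, including the regular expressions, the restricted HQL form, and the function name `hql_filters_to_solr_params`, is an assumption and not the mapping used in the patent.

```python
import re
from typing import Dict

# Capture the text between WHERE and an optional GROUP BY clause.
_WHERE = re.compile(r"\bWHERE\b(.*?)(?:\bGROUP\s+BY\b|$)", re.IGNORECASE | re.DOTALL)
# Capture simple conditions of the form  column = 'value'.
_COND = re.compile(r"(\w+)\s*=\s*'([^']*)'")


def hql_filters_to_solr_params(hql: str) -> Dict[str, object]:
    """Translate the equality filters of a restricted HQL string into Solr parameters."""
    match = _WHERE.search(hql)
    conds = _COND.findall(match.group(1)) if match else []
    return {
        "q": "*:*",                                        # match everything, then filter
        "fq": [f'{col}:"{val}"' for col, val in conds],    # one filter query per condition
        "rows": 0,                                         # only aggregates are wanted
        "wt": "json",
    }
```

A full translator would also map the selected dimensions and measures to Solr faceting or statistics features; that part is omitted here because the source does not describe it.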
For example, the data analysis engine 120 can implement distributed computing. Specifically, the data analysis engine 120 converts the metadata into a Solr request. The data is spread across the Solr nodes. The data analysis engine 120 dispatches the request to the Solr nodes, each node takes on part of the computation, and the results computed by the individual nodes are finally aggregated.
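The scatter-and-gather pattern described above can be sketched as follows. Real Solr nodes are replaced by in-memory lists of rows, and a thread pool stands in for the network fan-out; the names `node_partial_sum` and `distributed_sum` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def node_partial_sum(shard: List[Dict], dimension: str, measure: str) -> Counter:
    """Work done on one node: sum the measure per dimension value over its shard."""
    partial = Counter()
    for row in shard:
        partial[row[dimension]] += row[measure]
    return partial


def distributed_sum(shards: List[List[Dict]], dimension: str,
                    measure: str) -> Dict[str, float]:
    """Fan the computation out to every shard, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda s: node_partial_sum(s, dimension, measure), shards)
        total = Counter()
        for partial in partials:
            total.update(partial)   # Counter.update adds counts key by key
    return dict(total)
```

Summation merges cleanly because partial sums can simply be added; aggregates such as averages or distinct counts would need each node to return extra state (for example, a sum and a count) before merging.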
After the analytical database 140 executes the query, the data analysis engine 120 receives the query result from the analytical database 140 and sends it to the configuration module 160. The configuration module 160 sends the query result to the query module 150. In one example, for different business scenarios, the configuration module 160 uses the data services exposed by the data analysis engine 120 to initialize the content to be presented, including permissions, style information, and so on. Finally, the query module 150 presents the query result to the user.
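The role of the configuration module on the return path can be sketched as a small post-processing step. The patent only says that permissions and style information are set up; the column allow-list, the `style` dictionary, and the function name `prepare_for_display` below are assumptions chosen for illustration.

```python
from typing import Dict, List


def prepare_for_display(rows: List[Dict], allowed_columns: List[str],
                        style: Dict[str, str]) -> Dict:
    """Keep only the columns the user may see and attach style settings for the UI."""
    visible = [{col: row[col] for col in allowed_columns if col in row}
               for row in rows]
    return {"style": style, "rows": visible}
```

With rows containing `region`, `amount`, and `cost`, calling the function with `allowed_columns=["region", "amount"]` would hide the `cost` column before the result reaches the query module.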
The data analysis system 100 according to embodiments of the present invention solves the problem that conventional data analysis tools cannot effectively support massive data, and hides all technical details so that data analysts can work with the data simply and conveniently. In addition, the data analysis system 100 automatically creates scheduled tasks, updates data, and optimizes query speed, making massive data easy to analyze. It realizes a Hadoop-based big data analysis system and simplifies enterprise data integration.
Corresponding to the data analysis system 100 described above, a data analysis method 200 is also provided. The method 200 can be performed by the data analysis system 100 and includes the following steps.
In step S210, task information described in a database query language is distributed to the data analysis engine. Here, the database query language may be the HQL query language.
In step S220, the data analysis engine converts the task information into a distributed syntax in order to build an index on the data. Here, the distributed syntax may be based on the Map/Reduce model, and the index may be a Lucene index.
In step S230, the data with the established index is stored in the data warehouse.
In step S240, the indexed data stored in the data warehouse is synchronized to the analytical database.
In step S250, metadata associated with a query on at least part of the data is received. Here, the metadata may be received from a user via a user interface.
In step S260, the metadata is converted into a first syntax that the data analysis engine can recognize. Here, the first syntax may be based on the HQL query language.
In step S270, the data analysis engine converts the metadata described in the first syntax into a second syntax that the analytical database can recognize. Here, the second syntax may be based on the Solr application server.
In step S280, the analytical database executes the query based on the metadata described in the second syntax.
The method 200 may further include: the data analysis engine receiving a query result from the analytical database so that the query result can be presented to the user.
Although the present invention has been shown above in connection with its preferred embodiments, those skilled in the art will appreciate that various modifications, substitutions, and changes can be made to the invention without departing from its spirit and scope. Therefore, the present invention should not be limited by the above embodiments, but by the appended claims and their equivalents.

Claims (12)

1. A data analysis system, comprising:
a scheduler for distributing task information described in a database query language;
a data analysis engine for converting the task information into a distributed syntax in order to build an index on data;
a data warehouse for storing the data with the established index;
an analytical database that synchronizes the data with the established index with the data warehouse;
a query module for receiving metadata associated with a query on at least part of the data; and
a configuration module for converting the metadata into a first syntax that the data analysis engine can recognize,
wherein the data analysis engine is configured to convert the metadata described in the first syntax into a second syntax that the analytical database can recognize, and
the analytical database is configured to execute the query based on the metadata described in the second syntax.
2. The data analysis system according to claim 1, wherein the database query language is the HQL query language.
3. The data analysis system according to claim 1, wherein the distributed syntax is based on the Map/Reduce model, and the index is a Lucene index.
4. The data analysis system according to claim 1, wherein the first syntax is based on the HQL query language, and the second syntax is based on the Solr application server.
5. The data analysis system according to claim 1, wherein the query module includes a user interface for receiving the metadata from a user.
6. The data analysis system according to claim 1, wherein
the data analysis engine is further configured to receive a query result from the analytical database and send the query result to the configuration module,
the configuration module is further configured to send the query result to the query module, and
the query module is further configured to present the query result to the user.
7. A data analysis method, comprising:
distributing task information described in a database query language to a data analysis engine;
converting, by the data analysis engine, the task information into a distributed syntax in order to build an index on data;
storing the data with the established index in a data warehouse;
synchronizing the data with the established index stored in the data warehouse to an analytical database;
receiving metadata associated with a query on at least part of the data;
converting the metadata into a first syntax that the data analysis engine can recognize;
converting, by the data analysis engine, the metadata described in the first syntax into a second syntax that the analytical database can recognize; and
executing, by the analytical database, the query based on the metadata described in the second syntax.
8. The data analysis method according to claim 7, wherein the database query language is the HQL query language.
9. The data analysis method according to claim 7, wherein the distributed syntax is based on the Map/Reduce model, and the index is a Lucene index.
10. The data analysis method according to claim 7, wherein the first syntax is based on the HQL query language, and the second syntax is based on the Solr application server.
11. The data analysis method according to claim 7, wherein the metadata is received from a user via a user interface.
12. The data analysis method according to claim 7, further comprising:
receiving, by the data analysis engine, a query result from the analytical database so that the query result can be presented to the user.
CN201510249589.3A 2015-05-15 2015-05-15 data analysis system and method Active CN104834730B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510249589.3A CN104834730B (en) 2015-05-15 2015-05-15 data analysis system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510249589.3A CN104834730B (en) 2015-05-15 2015-05-15 data analysis system and method

Publications (2)

Publication Number Publication Date
CN104834730A CN104834730A (en) 2015-08-12
CN104834730B CN104834730B (en) 2018-06-01

Family

ID=53812616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510249589.3A Active CN104834730B (en) 2015-05-15 2015-05-15 data analysis system and method

Country Status (1)

Country Link
CN (1) CN104834730B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106547807A (en) * 2015-09-23 2017-03-29 财团法人工业技术研究院 Data analysis method and device

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022483B (en) * 2016-05-11 2019-06-14 星环信息科技(上海)有限公司 The method and apparatus converted between machine learning model
CN108427689A (en) * 2017-02-15 2018-08-21 北京国双科技有限公司 Information acquisition method and device
CN107330607A (en) * 2017-06-27 2017-11-07 太仓市华安企业管理有限公司 A kind of enterprise data analysis system
CN109684352B (en) * 2018-12-29 2020-12-01 江苏满运软件科技有限公司 Data analysis system, data analysis method, storage medium, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102426609A (en) * 2011-12-28 2012-04-25 厦门市美亚柏科信息股份有限公司 Index generation method and index generation device based on MapReduce programming architecture
CN102682036A (en) * 2011-03-18 2012-09-19 新奥特(北京)视频技术有限公司 Non-editing based method and system for searching media assets
CN104102710A (en) * 2014-07-15 2014-10-15 浪潮(北京)电子信息产业有限公司 Massive data query method
CN104516982A (en) * 2015-01-06 2015-04-15 南通大学 Method and system for extracting Web information based on Nutch

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IN2013MU03472A (en) * 2013-10-31 2015-07-24 Tata Consultancy Services Ltd

Also Published As

Publication number Publication date
CN104834730A (en) 2015-08-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant