CN107562946A - Method for creating an index table in a big data system - Google Patents
- Publication number
- CN107562946A (application CN201710879944.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- file
- index
- row
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a method for creating an index table in a big data system. The invention comprises: (1) metadata storage based on a data dictionary: dictionary encoding is used to accelerate computation, and data is converted into a user-readable form only when results are returned to the user; (2) multidimensional data clustering: data is reorganized along multiple dimensions at storage time, so that it is "more cohesive in multidimensional space"; (3) an indexed columnar file structure: multiple levels of indexes are designed for several classes of scenarios and incorporate some search features, including a multidimensional index across files, a multidimensional index within a file, a min/max index for each column, and an inverted index within each column; (4) column groups: the storage is columnar overall. The invention makes it easier to quickly obtain useful information from massive historical and real-time data.
Description
Technical field:
The present invention relates to a method for creating an index table in a big data system and belongs to the field of Internet technology.
Background art:
With the explosive growth of Internet data, quickly obtaining useful information from massive historical and real-time data has become increasingly challenging. Search is one of the most efficient ways to obtain information and is therefore a basic, standard feature of all kinds of websites and applications. Developers who want to add search to their own products usually build a search service on top of an open-source search system (such as ElasticSearch, Solr, or Sphinx). However, beyond purchasing hosts or hosted servers, the path from becoming familiar with the system, through setting up the service and customizing its functions, to bringing the service online usually takes a long time.
Summary of the invention:
In view of the above problem, the purpose of the present invention is to provide a method for creating an index table in a big data system that makes it easier to quickly obtain useful information from massive historical and real-time data.
The above purpose is achieved by the following technical scheme:
A method for creating an index table in a big data system, the method comprising:
(1) Metadata storage based on a data dictionary: dictionary encoding is used to accelerate computation. It allows the processing/query engine to operate directly on the encoded data without converting it; data is converted into a user-readable form only when results are returned to the user;
(2) Multidimensional data clustering: data is reorganized along multiple dimensions at storage time, so that it is "more cohesive in multidimensional space", which yields a better compression ratio in storage and more efficient data filtering in computation;
(3) Indexed columnar file structure: multiple levels of indexes are designed for several classes of scenarios and incorporate some search features, including a multidimensional index across files, a multidimensional index within a file, a min/max index for each column, and an inverted index within each column; indexes are stored together with the data files: part of the index is the data itself, and the other part is stored in the file's metadata structure;
(4) Column groups: the storage is columnar overall; the user can store fields that are seldom used as filter conditions but must be returned in the result set as a column group. After encoding, these fields are stored in row format to improve query performance.
Beneficial effect:
The invention makes it easier to quickly obtain useful information from massive historical and real-time data.
Embodiment:
Embodiment 1:
The method for creating an index table in a big data system according to this embodiment comprises:
(1) Metadata storage based on a data dictionary: the emphasis is on optimizing data organization, with the ultimate aim of improving I/O performance and computing performance through that organization. Global dictionary encoding is used to accelerate computation; it allows the processing/query engine to operate directly on the encoded data without converting it. Data is converted into a user-readable form only when results are returned to the user.
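The dictionary-encoding idea in item (1) can be illustrated with a minimal sketch. The names `GlobalDictionary` and `filter_equals` and the sample city column are hypothetical and do not come from the patent; the sketch only shows one plausible way a filter could run on integer codes, with decoding deferred until results are returned.

```python
from typing import Dict, List


class GlobalDictionary:
    """Hypothetical helper: maps each distinct value to a small integer code."""

    def __init__(self) -> None:
        self._code_of: Dict[str, int] = {}
        self._value_of: List[str] = []

    def encode(self, value: str) -> int:
        # Assign the next free code the first time a value is seen.
        code = self._code_of.get(value)
        if code is None:
            code = len(self._value_of)
            self._code_of[value] = code
            self._value_of.append(value)
        return code

    def lookup(self, value: str) -> int:
        # Return the existing code, or -1 if the value was never stored.
        return self._code_of.get(value, -1)

    def decode(self, code: int) -> str:
        return self._value_of[code]


def filter_equals(encoded_column: List[int], dictionary: GlobalDictionary,
                  value: str) -> List[int]:
    """Return row positions whose code matches, comparing integers only."""
    target = dictionary.lookup(value)
    if target < 0:
        return []  # value not in the dictionary, so no row can match
    return [row for row, code in enumerate(encoded_column) if code == target]


# Assumed sample data, encoded once at load time.
cities = ["Nanjing", "Beijing", "Nanjing", "Shanghai", "Beijing", "Nanjing"]
dictionary = GlobalDictionary()
encoded_cities = [dictionary.encode(city) for city in cities]

# The filter runs on codes; strings are restored only for the final result.
matching_rows = filter_equals(encoded_cities, dictionary, "Nanjing")
readable = [dictionary.decode(encoded_cities[row]) for row in matching_rows]
```

In this sketch the query literal is translated to a code once, so the per-row work is an integer comparison rather than a string comparison.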
(2) Multidimensional data clustering: data is reorganized along multiple dimensions at storage time, so that it is "more cohesive in multidimensional space", which yields a better compression ratio in storage and more efficient data filtering in computation.
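As a rough illustration of item (2), the sketch below reorders rows by a tuple of dimension values and counts runs of equal values in one column before and after. The patent does not say how the reorganization is performed; the lexicographic sort, the helper names `cluster` and `run_count`, and the sample rows are all assumptions made for this example.

```python
from itertools import groupby
from typing import Dict, Iterable, List


def cluster(rows: List[Dict], dims: List[str]) -> List[Dict]:
    """Reorder rows lexicographically by the chosen dimensions (assumed strategy)."""
    return sorted(rows, key=lambda row: tuple(row[d] for d in dims))


def run_count(values: Iterable) -> int:
    """Number of runs of consecutive equal values; fewer runs compress better."""
    return sum(1 for _ in groupby(values))


# Assumed sample rows arriving in arbitrary order.
rows = [
    {"region": "east", "day": 2, "sales": 10},
    {"region": "west", "day": 1, "sales": 7},
    {"region": "east", "day": 1, "sales": 3},
    {"region": "west", "day": 2, "sales": 5},
    {"region": "east", "day": 2, "sales": 8},
    {"region": "west", "day": 1, "sales": 4},
]

clustered = cluster(rows, ["region", "day"])
runs_before = run_count(row["region"] for row in rows)
runs_after = run_count(row["region"] for row in clustered)
```

After clustering, equal dimension values sit next to each other, so run-length style encodings have fewer runs to store and a filter on `region` touches one contiguous range of rows.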
(3) Indexed columnar file structure: multiple levels of indexes are designed for several classes of scenarios and incorporate some search features, including a multidimensional index across files, a multidimensional index within a file, a min/max index for each column, an inverted index within each column, and so on. In addition, to suit the storage characteristics of HDFS, indexes are stored together with the data files: part of the index is the data itself, and the other part is stored in the file's metadata structure, so that the indexes benefit from the localized access that HDFS provides.
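Two of the index levels named in item (3), the per-column min/max index and the inverted index within a column, can be sketched as follows. The `Block` class, its fields, and `find_equal` are hypothetical names; the on-disk layout and the multidimensional indexes are not modeled, since the patent gives no detail about them.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    """Hypothetical column block carrying its own index metadata."""
    values: List[int]  # must be non-empty
    min_value: int = field(init=False)
    max_value: int = field(init=False)
    inverted: Dict[int, List[int]] = field(init=False)

    def __post_init__(self) -> None:
        # Min/max index: the value range covered by this block.
        self.min_value = min(self.values)
        self.max_value = max(self.values)
        # Inverted index: value -> row positions inside this block.
        inverted: Dict[int, List[int]] = defaultdict(list)
        for position, value in enumerate(self.values):
            inverted[value].append(position)
        self.inverted = dict(inverted)


def find_equal(blocks: List[Block], target: int) -> Tuple[List[Tuple[int, int]], int]:
    """Return (block_id, position) hits and the number of blocks actually opened."""
    hits: List[Tuple[int, int]] = []
    opened = 0
    for block_id, block in enumerate(blocks):
        # Skip blocks whose range cannot contain the target.
        if target < block.min_value or target > block.max_value:
            continue
        opened += 1
        for position in block.inverted.get(target, []):
            hits.append((block_id, position))
    return hits, opened


# Assumed sample data: three blocks of one integer column.
blocks = [Block([1, 3, 3, 5]), Block([10, 12, 12, 15]), Block([20, 21, 25, 25])]
hits, opened = find_equal(blocks, 12)
```

Here the min/max metadata rules out two of the three blocks without reading their values, and the inverted index locates the matching rows inside the remaining block without a scan.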
(4) Column groups: the storage is columnar overall. Compared with row storage, however, a columnar structure incurs a high cost for reassembling rows when handling detail-data queries. To improve the performance of detail-data queries, a column-group storage mode is therefore supported: the user can store fields that are seldom used as filter conditions but must be returned in the result set as a column group. After encoding, these fields are stored in row format to improve query performance.
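The column-group idea in item (4) might look like the sketch below: filter columns stay columnar, while the fields that are only returned are kept together as one tuple per row. The class name `ColumnGroupTable`, its methods, and the sample data are hypothetical, and the encoding step mentioned in the text is omitted for brevity.

```python
from typing import Dict, List, Sequence, Tuple


class ColumnGroupTable:
    """Hypothetical table: columnar filter columns plus one row-wise column group."""

    def __init__(self, rows: List[Dict], filter_columns: Sequence[str],
                 group_columns: Sequence[str]) -> None:
        # Columns used in predicates are stored one list per column.
        self.filter_columns: Dict[str, List] = {
            name: [row[name] for row in rows] for name in filter_columns
        }
        # Fields that are only returned are stored together, one tuple per row.
        self.group_columns: Tuple[str, ...] = tuple(group_columns)
        self.group_rows: List[Tuple] = [
            tuple(row[name] for name in group_columns) for row in rows
        ]

    def select(self, column: str, value) -> List[Dict]:
        # Filter on the columnar data, then fetch each result with one lookup.
        positions = [
            i for i, v in enumerate(self.filter_columns[column]) if v == value
        ]
        return [dict(zip(self.group_columns, self.group_rows[i])) for i in positions]


# Assumed sample data.
rows = [
    {"status": "active", "name": "Li", "address": "Nanjing"},
    {"status": "closed", "name": "Wang", "address": "Beijing"},
    {"status": "active", "name": "Zhao", "address": "Shanghai"},
]
table = ColumnGroupTable(rows, filter_columns=["status"],
                         group_columns=["name", "address"])
result = table.select("status", "active")
```

With a purely columnar layout, each returned row would need one read per output column; grouping `name` and `address` lets the sketch restore them with a single lookup per row.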
The technical means disclosed in the scheme of the present invention are not limited to those disclosed above, but also include technical schemes formed by equivalent substitution of the above technical features. Matters not described in the present invention belong to the common knowledge of those skilled in the art.
Claims (1)
1. A method for creating an index table in a big data system, characterized in that the method comprises:
(1) Metadata storage based on a data dictionary: dictionary encoding is used to accelerate computation. It allows the processing/query engine to operate directly on the encoded data without converting it; data is converted into a user-readable form only when results are returned to the user;
(2) Multidimensional data clustering: data is reorganized along multiple dimensions at storage time, so that it is "more cohesive in multidimensional space", which yields a better compression ratio in storage and more efficient data filtering in computation;
(3) Indexed columnar file structure: multiple levels of indexes are designed for several classes of scenarios and incorporate some search features, including a multidimensional index across files, a multidimensional index within a file, a min/max index for each column, and an inverted index within each column; indexes are stored together with the data files: part of the index is the data itself, and the other part is stored in the file's metadata structure;
(4) Column groups: the storage is columnar overall; the user can store fields that are seldom used as filter conditions but must be returned in the result set as a column group. After encoding, these fields are stored in row format to improve query performance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710879944.4A CN107562946A (en) | 2017-09-26 | 2017-09-26 | Method for creating an index table in a big data system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710879944.4A CN107562946A (en) | 2017-09-26 | 2017-09-26 | Method for creating an index table in a big data system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107562946A (en) | 2018-01-09 |
Family
ID=60981744
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710879944.4A Pending CN107562946A (en) | 2017-09-26 | 2017-09-26 | Method for creating an index table in a big data system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107562946A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101866358A (en) * | 2010-06-12 | 2010-10-20 | 中国科学院计算技术研究所 | Multidimensional interval querying method and system thereof |
US8495007B2 (en) * | 2008-08-28 | 2013-07-23 | Red Hat, Inc. | Systems and methods for hierarchical aggregation of multi-dimensional data sources |
CN103218404A (en) * | 2013-03-20 | 2013-07-24 | 华中科技大学 | Multi-dimensional metadata management method and system based on association characteristics |
CN103366015A (en) * | 2013-07-31 | 2013-10-23 | 东南大学 | OLAP (on-line analytical processing) data storage and query method based on Hadoop |
CN104268158A (en) * | 2014-09-03 | 2015-01-07 | 深圳大学 | Structural data distributed index and retrieval method |
CN104715039A (en) * | 2015-03-23 | 2015-06-17 | 星环信息科技(上海)有限公司 | Column-based storage and research method and equipment based on hard disk and internal storage |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104536959B (en) | A kind of optimization method of Hadoop accessing small high-volume files | |
CN103366015B (en) | A kind of OLAP data based on Hadoop stores and querying method | |
Martínez-Prieto et al. | Exchange and consumption of huge RDF data | |
JP6964384B2 (en) | Methods, programs, and systems for the automatic discovery of relationships between fields in a mixed heterogeneous data source environment. | |
US10013440B1 (en) | Incremental out-of-place updates for index structures | |
CN105117502A (en) | Search method based on big data | |
CN103714096A (en) | Lucene-based inverted index system construction method and device, and Lucene-based inverted index system data processing method and device | |
US20150095341A1 (en) | System and a method for hierarchical data column storage and efficient query processing | |
CN104778182A (en) | Data import method and system based on HBase (Hadoop Database) | |
CN103207864A (en) | Online novel content similarity comparison method | |
CN103678550A (en) | Mass data real-time query method based on dynamic index structure | |
CN108319608A (en) | The method, apparatus and system of access log storage inquiry | |
Sarlis et al. | Datix: A system for scalable network analytics | |
CN113779349A (en) | Data retrieval system, apparatus, electronic device, and readable storage medium | |
CN104765767A (en) | Knowledge storage algorithm for intelligent learning | |
Haque et al. | Distributed RDF triple store using hbase and hive | |
CN110781210A (en) | Data processing platform for multi-dimensional aggregation real-time query of large-scale data | |
Ravindra et al. | Efficient processing of RDF graph pattern matching on MapReduce platforms | |
CN107562946A (en) | Method for creating an index table in a big data system | |
Huang et al. | Pisa: An index for aggregating big time series data | |
Bao et al. | Query optimization of massive social network data based on hbase | |
CN115114293A (en) | Database index creating method, related device, equipment and storage medium | |
CN107844546A (en) | A kind of file system metadata management system and method | |
CN103891244B (en) | A kind of method and device carrying out data storage and search | |
Habbal et al. | BIND: An indexing strategy for big data processing |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180109 |