CN109063201A - Impala online interactive query method based on a mixed storage scheme


Info

Publication number: CN109063201A
Authority: CN (China)
Prior art keywords: hdfs, impala, hbase, data, query
Legal status (an assumption, not a legal conclusion): Granted, Active
Application number: CN201811058357.XA
Other languages: Chinese (zh)
Other versions: CN109063201B (en)
Inventors: 李开, 邹复好, 訚实松, 刘鹏坤, 孙斌
Current assignee (as listed; may be inaccurate): Wuhan Charm Pupil Technology Co Ltd
Original assignee: Wuhan Charm Pupil Technology Co Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides an Impala online interactive query method based on a mixed storage scheme, comprising: creating an HBase table named hadoop, and creating a table on HDFS with Impala; creating an external table in Hive to associate it with the HBase table, and checking whether the external table is present in Impala; if the external table is present in Impala, creating a script that imports each day's data into HDFS; and, when a user issues a query request, querying HDFS and HBase separately and presenting the combined query results to the user. The method makes full use of the characteristics of HBase and HDFS to store incremental data in a mixed fashion, which improves the speed of interactive Impala queries.

Description

Impala online interactive query method based on a mixed storage scheme
Technical field
Embodiments of the present invention relate to the field of big data processing, and in particular to an Impala online interactive query method based on a mixed storage scheme.
Background
In recent years, as computer storage capacity has grown and information technology has developed, data volumes have increased exponentially. The big data trend has accelerated scientific and technological progress, big data technology has emerged, and business models have changed disruptively.
Big data refers not only to massive data sets but also to the technologies for storing and processing them. Big data pervades every aspect of the economy and society, and extracting valuable information from massive data is a pressing problem. Big data processing differs from traditional processing in that it relies mainly on the parallel computing power of many machines. Over the years, various big data processing platforms have appeared, such as Hadoop, Spark and Storm, each generally targeting a particular class of problem. Big data processing problems are commonly divided into three classes: real-time stream processing, offline batch processing, and large-scale interactive query. Impala is a member of the Hadoop ecosystem that mainly addresses the third class: using SQL-like statements, it can interactively query data stored in the Hadoop database HBase and in the distributed file system HDFS.
The problem we encountered in practice is that a large number of messages are written to the database every day, and users need to query all of the data interactively, with responses returned as quickly as possible and within a time the user can tolerate. The traditional approach stores the data in either HBase or HDFS and queries that store with Impala. With HBase-based storage, once the data volume grows to the order of millions of records, Impala query time on HBase rises sharply to tens of seconds, which does not meet the requirement. With HDFS-based storage, queries are much faster than on HBase, and a time-based partitioned storage strategy can be used: each message is written to its own file, so that for a selected query period the scan size depends only on that period, which markedly shortens query time. However, with one file per message, the NameNode keeps metadata for every file in memory, so a large number of files consumes a large amount of NameNode memory and seriously harms the scalability and performance of Hadoop.
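To make the small-files concern concrete, the following sketch estimates NameNode heap use for the two layouts. The figure of roughly 150 bytes per namespace object (file or block) is a commonly quoted rule of thumb rather than a number from this document, and the message volume and retention period are hypothetical.

```python
# Rough NameNode heap estimate. BYTES_PER_OBJECT is a rule of thumb (assumption),
# and each file is assumed to occupy a single block.
BYTES_PER_OBJECT = 150


def namenode_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate heap used by the file entries plus their block entries."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


messages_per_day = 1_000_000  # hypothetical daily message volume
days = 365                    # hypothetical retention period

one_file_per_message = namenode_bytes(messages_per_day * days)
one_file_per_day = namenode_bytes(days)

print(f"one file per message: {one_file_per_message / 1e9:.1f} GB of heap")
print(f"one file per day:     {one_file_per_day / 1e3:.1f} KB of heap")
```

Under these assumptions the per-message layout needs on the order of a hundred gigabytes of NameNode heap per year of data, while the per-day layout needs a negligible amount. A real daily file may span several blocks, which raises the second figure somewhat without changing the comparison.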
Therefore, how to combine the characteristics of HDFS and HBase to solve the above problem is a current research focus.
Summary of the invention
To solve the above problem, embodiments of the present invention provide an Impala online interactive query method based on a mixed storage scheme that overcomes the problem or at least partially solves it, comprising:
creating an HBase table named hadoop, and creating a table on HDFS with Impala;
creating an external table in Hive to associate it with the HBase table, and checking whether the external table is present in Impala;
if the external table is present in Impala, creating a script that imports each day's data into HDFS;
when a user issues a query request, querying HDFS and HBase separately, and presenting the combined query results to the user.
Wherein creating the table on HDFS with Impala comprises:
creating the query fields corresponding to the query conditions, and partitioning the data by day.
Wherein querying HDFS and HBase separately when a user issues a query request, and presenting the combined query results to the user, comprises:
detecting whether the query conditions include a time condition; if they do and all data required by the query has been copied to the HDFS table, querying only HDFS;
if they include a time condition and none of the data required by the query has been copied to HDFS, querying only HBase;
if they include no time condition, or the data covered by the time condition is stored partly in HDFS and partly in HBase, querying HBase and HDFS jointly.
Wherein, after querying HDFS and HBase separately and presenting the combined query results to the user, the method further comprises:
counting the amount of data matching the user's query request, by retrieving the number of matching records from HDFS alone, from HBase alone, or from HDFS and HBase jointly.
Wherein, after counting the amount of data matching the user's query request and retrieving the number of matching records from HDFS alone, from HBase alone, or from HDFS and HBase jointly, the method further comprises:
setting the step size and start position of the query in Impala, to support page turning and page changing.
Wherein the method further comprises:
if the external table is not present in Impala, refreshing the metadata with INVALIDATE METADATA.
Wherein the method further comprises:
scheduling the script as a timed task.
The Impala online interactive query method based on a mixed storage scheme provided by the embodiments of the present invention makes full use of the characteristics of HBase and HDFS to store incremental data in a mixed fashion, which improves the speed of interactive Impala queries.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of an Impala online interactive query method based on a mixed storage scheme provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of the implementation steps of the mixed storage scheme proposed in an embodiment of the present invention;
Fig. 3 is a flow diagram of the mixed query proposed in an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
At present, with HBase-based storage, once the data volume grows to the order of millions of records, Impala query time on HBase rises sharply to tens of seconds, which does not meet the requirement. With HDFS-based storage, queries are much faster than on HBase, and a time-based partitioned storage strategy can be used: each message is written to its own file, so that for a selected query period the scan size depends only on that period, which markedly shortens query time. However, with one file per message, the NameNode keeps metadata for every file in memory, so a large number of files consumes a large amount of NameNode memory and seriously harms the scalability and performance of Hadoop.
To address the above problems in the prior art, Fig. 1 is a flow diagram of an Impala online interactive query method based on a mixed storage scheme provided by an embodiment of the present invention, Fig. 2 is a flow diagram of the implementation steps of the mixed storage scheme proposed in an embodiment, and Fig. 3 is a flow diagram of the mixed query proposed in an embodiment. Referring to Fig. 1, Fig. 2 and Fig. 3, the method provided by the embodiments comprises:
creating an HBase table named hadoop, and creating a table on HDFS with Impala;
creating an external table in Hive to associate it with the HBase table, and checking whether the external table is present in Impala;
if the external table is present in Impala, creating a script that imports each day's data into HDFS;
when a user issues a query request, querying HDFS and HBase separately, and presenting the combined query results to the user.
It should be noted that the overall idea of the embodiments is to use a mixed HBase and HDFS storage scheme to improve the response speed of Impala online interactive queries. Data is stored in a combination of HDFS and HBase: HBase only temporarily holds the current day's data, and on the following day a script automatically imports the previous day's data into HDFS as one large file. To query all of the data, HBase and HDFS are queried separately and the results are merged.
First, a table is created in HBase according to the predesigned table format, with all columns placed in the same column family. A TTL of 90000 seconds (25 hours) is set on the HBase table, so data older than 25 hours is deleted. Because that data has already been imported into HDFS by then, deleting it from HBase is safe; the extra hour is kept so that the day's data can be exported completely. After the HBase table is created, it must be associated with Impala. Once the HBase table is mapped in Impala, a table is created on HDFS: an Impala table with the same fields, partitioned by date with a partitioned by clause and stored in Parquet format with a stored as parquet clause. After the Impala tables for both HBase and HDFS are set up, a scheduled task must be configured to copy data from HBase to HDFS. On Linux, system-level scheduled tasks are defined in the /etc/crontab file. For an interactive query, Impala first queries HDFS and then queries the current day's data in HBase; the two result sets are concatenated to form the query result. Although Impala queries HBase relatively slowly, HBase holds only the current day's newest data, so its data volume is much smaller and the overall query is fast. Because each day of historical data is stored in HDFS as one Parquet file rather than one file per record, the HDFS small-files problem is avoided and overall Hadoop performance improves.
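The reasoning behind the 25-hour TTL can be checked with a short sketch. It assumes the copy task runs at midnight following the write, as the daily import described here suggests; the function names are hypothetical and only the TTL value comes from the text.

```python
from datetime import datetime, timedelta

TTL = timedelta(seconds=90000)  # TTL value from the text: 25 hours


def expires_at(written: datetime) -> datetime:
    """Earliest time at which HBase may drop a cell written at `written`."""
    return written + TTL


def import_time(written: datetime) -> datetime:
    """Midnight following the write, when the copy task is assumed to run."""
    next_day = written.date() + timedelta(days=1)
    return datetime.combine(next_day, datetime.min.time())


def survives_until_import(written: datetime) -> bool:
    """True if the cell is still present in HBase when the copy task starts."""
    return expires_at(written) > import_time(written)


# Worst case: a record written at the very start of the day is 24 hours old
# at import time, which leaves one hour of slack before the TTL removes it.
oldest = datetime(2018, 9, 11, 0, 0, 0)
print(expires_at(oldest) - import_time(oldest))
```

The slack is smallest for records written just after midnight, so the copy task has about one hour to finish before the oldest records of the day become eligible for deletion.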
For ease of understanding, the key terms used in the embodiments are explained below.
HBase: a column-oriented distributed database built on top of the Hadoop file system.
HDFS: the Hadoop Distributed File System.
Impala: a query system whose development is led by Cloudera. It provides SQL semantics and can query petabyte-scale data stored in Hadoop's HDFS and HBase.
Parquet: a columnar file storage format.
Hive: a data warehouse tool based on Hadoop that maps structured data files to database tables and provides simple SQL query functionality.
The method provided by the embodiments runs on Ubuntu 14.04.
Specifically, the scheme provided by the embodiments first creates the HBase table, which is named hadoop:
create 'hadoop', {NAME => 'info', TTL => 90000}
A table is then created on HDFS with Impala. In our application scenario the fields include name, fromip, fromport, toip, toport, time, ciphertext and plaintext; their meanings are shown in Table 1:
Table 1: Meaning of each field in the HBase table
Column name    Meaning
name           Type
fromip         Source IP
fromport       Source port
toip           Destination IP
toport         Destination port
time           Time
ciphertext     Ciphertext
plaintext      Plaintext
Then an external table is created in Hive to associate it with the HBase table. The table is named info_today, indicating that it holds today's data. The command is as follows:
Next, check in Impala whether the table info_today is present. If it is, create a script that imports the current day's data into HDFS early every morning. The script file is named copy.sh and is made executable with sudo chmod +x copy.sh. Its content is as follows:
#!/bin/sh
impala-shell -q "insert into table sho.hadoop_hdfs partition(year, month, day)
select name, fromip, fromport, toip, toport, length, time,
ciphertext, plaintext, id, year(time), month(time), day(time)
from sho.info_today
where time < to_date(now()) and time > to_date(adddate(now(), -1))" &&
impala-shell -q "compute stats sho.hadoop_hdfs"
If the external table is not present in Impala, refresh the metadata with INVALIDATE METADATA.
This completes the mixed storage scheme. New data is first written to HBase, and early the next morning it is imported into HDFS. In other words, HDFS stores all data from before today, with each day's data as one file, and HBase stores today's newest data.
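The time window selected by the import script's where clause can be modeled as follows. This is a sketch of the window arithmetic only, with hypothetical function names; it does not call Impala.

```python
from datetime import date, datetime, timedelta
from typing import Tuple


def import_window(today: date) -> Tuple[datetime, datetime]:
    """Return (start, end) covering the previous calendar day."""
    end = datetime.combine(today, datetime.min.time())  # today at 00:00
    start = end - timedelta(days=1)                      # yesterday at 00:00
    return start, end


def belongs_to_import(ts: datetime, today: date) -> bool:
    """Mirror the script's strict comparisons: start < ts < end."""
    start, end = import_window(today)
    return start < ts < end


run_day = date(2018, 9, 12)
print(import_window(run_day))
print(belongs_to_import(datetime(2018, 9, 11, 13, 0), run_day))
```

Because both bounds are strict, a record stamped exactly at midnight falls outside the window of every run. Whether that matters depends on the timestamp resolution of the incoming messages.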
Mixed storage is used so that mixed queries can be run with higher query speed. In our application scenario, users need to run combined queries over value ranges of fields such as fromip, toip, time and name. When a user queries, the backend queries HDFS and HBase separately, then combines the results and presents them to the user.
The Impala online interactive query method based on a mixed storage scheme provided by the embodiments of the present invention makes full use of the characteristics of HBase and HDFS to store incremental data in a mixed fashion, which improves the speed of interactive Impala queries.
On the basis of the above embodiment, creating the table on HDFS with Impala comprises:
creating the query fields corresponding to the query conditions, and partitioning the data by day.
As can be seen from Table 1, the embodiment places the query fields corresponding to the query conditions in a new column family, called cf. Some of these fields are set as optional conditions, so that arbitrary combined queries can be made later.
It should be noted that the embodiment partitions the data by day. Concretely, a "partitioned by" clause is added after the create table statement, and the storage format is set to the compressed Parquet format by appending "stored as parquet" to the statement. The command is as follows:
create table sho.hadoop_hdfs (
  name string,
  fromip string,
  fromport int,
  toip string,
  toport int,
  length int,
  time timestamp,
  ciphertext string,
  plaintext string,
  id string
)
partitioned by (year int, month int, day int) stored as parquet;
This reduces the space occupied by the data and speeds up network transfer during Impala queries.
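As an illustration of daily partitioning, the sketch below derives the partition key values for a record and the directory they map to, using the key=value directory layout that Hive and Impala use for partitioned tables. The table directory in the example is a hypothetical path, not one given in this document.

```python
from datetime import datetime
from typing import Dict


def partition_spec(ts: datetime) -> Dict[str, int]:
    """Partition key values (year, month, day) for a record timestamp."""
    return {"year": ts.year, "month": ts.month, "day": ts.day}


def partition_path(table_dir: str, ts: datetime) -> str:
    """Directory holding the partition, one key=value level per partition column."""
    levels = [f"{key}={value}" for key, value in partition_spec(ts).items()]
    return "/".join([table_dir.rstrip("/")] + levels)


# Hypothetical table location; the real path depends on the cluster setup.
print(partition_path("/user/hive/warehouse/sho.db/hadoop_hdfs",
                     datetime(2018, 9, 11, 8, 30)))
```

A query whose time condition is expressed on year, month and day only needs to read the matching directories, which is why the scan size depends on the selected period rather than on the total data volume.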
On the basis of the above embodiment, querying HDFS and HBase separately when a user issues a query request, and presenting the combined query results to the user, comprises:
detecting whether the query conditions include a time condition; if they do and all data required by the query has been copied to the HDFS table, querying only HDFS;
if they include a time condition and none of the data required by the query has been copied to HDFS, querying only HBase;
if they include no time condition, or the data covered by the time condition is stored partly in HDFS and partly in HBase, querying HBase and HDFS jointly.
The detection used in the embodiment checks the time first: it determines whether the query conditions include a time, and then sets each of the other unspecified conditions one by one before querying. A structure named query_info is constructed to store each query condition obtained from the page. When the query conditions include a time: if the query range lies before today and all required data has been copied to the HDFS table, only HDFS needs to be queried; if only today's data is queried, all data required by the query statement is in HBase and none of it is on HDFS, so only HBase needs to be queried; otherwise HBase and HDFS must each be queried.
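The routing decision described above can be sketched as a small function. The `QueryInfo` class is a hypothetical stand-in for the query_info structure and models only the time bounds; the sketch also assumes that everything before the start of today has already been copied to HDFS.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Set


@dataclass
class QueryInfo:
    """Hypothetical stand-in for query_info; only the time bounds are modeled."""
    start: Optional[datetime] = None
    end: Optional[datetime] = None


def choose_stores(q: QueryInfo, today_start: datetime) -> Set[str]:
    """Decide which stores to query, given the start of the current day."""
    if q.start is None and q.end is None:
        return {"hdfs", "hbase"}   # no time condition: query both
    if q.end is not None and q.end < today_start:
        return {"hdfs"}            # whole range was copied to HDFS already
    if q.start is not None and q.start >= today_start:
        return {"hbase"}           # only today's data is requested
    return {"hdfs", "hbase"}       # range spans both stores


today = datetime(2018, 9, 11)
print(choose_stores(QueryInfo(datetime(2018, 9, 1), datetime(2018, 9, 5)), today))
print(choose_stores(QueryInfo(start=datetime(2018, 9, 11, 8)), today))
```

An open-ended range is handled conservatively: a query with only an upper bound before today goes to HDFS, one with only a lower bound inside today goes to HBase, and anything else falls back to querying both stores.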
On the basis of the above embodiment, after querying HDFS and HBase separately when a user issues a query request and presenting the combined query results to the user, the method further comprises:
counting the amount of data matching the user's query request, by retrieving the number of matching records from HDFS alone, from HBase alone, or from HDFS and HBase jointly.
It can be understood that the embodiment also provides a statistics function during querying: it counts how many records match the query conditions in total, retrieving the number of matching records from HDFS alone, from HBase alone, or from HDFS and HBase jointly.
The queries described above are executed in parallel to improve query speed.
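A minimal sketch of running the two store queries in parallel, counting the matches per store and combining the rows. The two `fake_*` functions are hypothetical stand-ins for the real Impala queries, and sorting the combined rows newest first is an assumption; the text only says that the results are combined.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]


def query_parallel(backends: Dict[str, Callable[[], List[Row]]]) -> Dict[str, List[Row]]:
    """Run one query callable per store concurrently; return rows keyed by store."""
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {name: pool.submit(fn) for name, fn in backends.items()}
        return {name: fut.result() for name, fut in futures.items()}


def count_by_store(results: Dict[str, List[Row]]) -> Dict[str, int]:
    """Number of matching rows returned by each store."""
    return {name: len(rows) for name, rows in results.items()}


def combine(results: Dict[str, List[Row]]) -> List[Row]:
    """Concatenate the per-store rows, newest first (ordering is an assumption)."""
    rows = [row for part in results.values() for row in part]
    return sorted(rows, key=lambda r: r["time"], reverse=True)


# Hypothetical stand-ins for the Impala queries against each store.
def fake_hdfs() -> List[Row]:
    return [{"time": "2018-09-10 08:00:00", "name": "a"}]


def fake_hbase() -> List[Row]:
    return [{"time": "2018-09-11 09:30:00", "name": "b"}]


results = query_parallel({"hdfs": fake_hdfs, "hbase": fake_hbase})
print(count_by_store(results), sum(count_by_store(results).values()))
print(combine(results))
```

With this shape the total latency is close to that of the slower of the two queries rather than their sum, and the per-store counts add up to the total shown to the user.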
On the basis of the above embodiment, after counting the amount of data matching the user's query request and retrieving the number of matching records from HDFS alone, from HBase alone, or from HDFS and HBase jointly, the method further comprises:
setting the step size and start position of the query in Impala, to support page turning and page changing.
Likewise, once the total number of matching records is known, the embodiment supports page turning and page changing. Specifically, only part of the data is fetched from the database each time, and every page turn or page change is a new query. In Impala the step size and start position of the query are set with limit and offset, which are passed as query parameters; the rows returned for one step are displayed as one page. When the user jumps to page n, offset equals limit*n, meaning that the limit rows following the first limit*n rows are displayed.
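The offset arithmetic can be sketched as follows, with pages counted from 0 as the formula offset = limit*n implies. The function names and the ordering column are assumptions; an order by clause is included because Impala expects offset to be used together with order by.

```python
def page_clause(page: int, limit: int, order_by: str = "time desc") -> str:
    """Build the ORDER BY / LIMIT / OFFSET suffix for a zero-based page number."""
    if page < 0 or limit <= 0:
        raise ValueError("page must be >= 0 and limit must be > 0")
    return f"order by {order_by} limit {limit} offset {limit * page}"


def page_count(total_rows: int, limit: int) -> int:
    """Number of pages needed to show total_rows rows, limit rows per page."""
    return -(-total_rows // limit)  # ceiling division


print(page_clause(0, 20))
print(page_clause(3, 20))
print(page_count(101, 20))
```

The total row count obtained from the statistics step gives the number of pages, and each page request then issues a fresh query with the corresponding clause appended.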
On the basis of the above embodiment, the method further comprises:
scheduling the script as a timed task.
Specifically, the following entry runs the script every day at midnight:
00 0 * * * root /usr/bin/copy.sh
In summary, compared with the prior art, the Impala online interactive query method based on a mixed storage scheme provided by the embodiments of the present invention has the following advantages:
1. The scheme of the invention, based on mixed HBase and HDFS storage, improves Impala interactive query speed: it exploits the fact that Impala queries HDFS faster than HBase while avoiding the HDFS small-files problem.
2. The scheme of the invention, based on mixed HBase and HDFS storage, compresses the data in Parquet format, which reduces both the storage space and the network transfer time during queries.
3. HDFS stores the data in daily partitions, so when the query conditions specify a period only the data of the relevant period is queried, which improves query speed.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software together with a necessary general-purpose hardware platform, or alternatively by hardware. On this understanding, the above technical solutions, or the part of them that contributes over the prior art, can be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (a personal computer, a server, a network device or the like) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in those embodiments can still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. An Impala online interactive query method based on a mixed storage scheme, characterized by comprising:
creating an HBase table named hadoop, and creating a table on HDFS with Impala;
creating an external table in Hive to associate it with the HBase table, and checking whether the external table is present in Impala;
if the external table is present in Impala, creating a script that imports each day's data into HDFS;
when a user issues a query request, querying HDFS and HBase separately, and presenting the combined query results to the user.
2. The method according to claim 1, characterized in that creating the table on HDFS with Impala comprises:
creating the query fields corresponding to the query conditions, and partitioning the data by day.
3. The method according to claim 2, characterized in that querying HDFS and HBase separately when a user issues a query request, and presenting the combined query results to the user, comprises:
detecting whether the query conditions include a time condition; if they do and all data required by the query has been copied to the HDFS table, querying only HDFS;
if they include a time condition and none of the data required by the query has been copied to HDFS, querying only HBase;
if they include no time condition, or the data covered by the time condition is stored partly in HDFS and partly in HBase, querying HBase and HDFS jointly.
4. The method according to claim 1, characterized in that, after querying HDFS and HBase separately when a user issues a query request and presenting the combined query results to the user, the method further comprises:
counting the amount of data matching the user's query request, by retrieving the number of matching records from HDFS alone, from HBase alone, or from HDFS and HBase jointly.
5. The method according to claim 4, characterized in that, after counting the amount of data matching the user's query request and retrieving the number of matching records from HDFS alone, from HBase alone, or from HDFS and HBase jointly, the method further comprises:
setting the step size and start position of the query in Impala, to support page turning and page changing.
6. The method according to claim 1, characterized in that the method further comprises:
if the external table is not present in Impala, refreshing the metadata with INVALIDATE METADATA.
7. The method according to claim 1, characterized in that the method further comprises:
scheduling the script as a timed task.
CN201811058357.XA 2018-09-11 2018-09-11 Impala online interactive query method based on mixed storage scheme Active CN109063201B (en)


Publications (2)

Publication Number Publication Date
CN109063201A true CN109063201A (en) 2018-12-21
CN109063201B CN109063201B (en) 2022-02-11


