CN104714983A - Generating method and device for distributed indexes - Google Patents


Info

Publication number
CN104714983A
CN104714983A CN201310695615.6A CN201310695615A CN104714983A CN 104714983 A CN104714983 A CN 104714983A CN 201310695615 A CN201310695615 A CN 201310695615A CN 104714983 A CN104714983 A CN 104714983A
Authority
CN
China
Prior art keywords
index database
reduce operation
database corresponding
file system
index
Legal status
Granted
Application number
CN201310695615.6A
Other languages
Chinese (zh)
Other versions
CN104714983B (en)
Inventor
韩丙卫
Current Assignee
ZTE Corp
Original Assignee
ZTE Corp
Priority date
2013-12-17
Filing date
2013-12-17
Application filed by ZTE Corp
Priority to CN201310695615.6A (CN104714983B)
Priority to PCT/CN2014/078696 (WO2014180411A1)
Publication of CN104714983A
Application granted
Publication of CN104714983B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for generating distributed indexes. In the method, the number of map jobs in Hadoop is determined according to the data volume of the raw data; the data processed by the map jobs are distributed to multiple reduce jobs, and an index database corresponding to each reduce job is generated, where the number of reduce jobs and the correspondence between each reduce job and one or more map jobs are pre-configured; finally, the index databases corresponding to the reduce jobs are merged. The technical solution enables massive data to be indexed efficiently and quickly.

Description

Method and device for generating distributed indexes
Technical field
The present invention relates to the field of communications, and in particular to a method and device for generating distributed indexes.
Background
With the arrival of the cloud era, big data has attracted increasing attention. The term big data commonly describes the large volumes of unstructured and semi-structured data created by companies, data that would take too much time and money to load into a relational database for analysis. Big data analysis is often associated with cloud computing, because real-time analysis of large data sets requires a framework such as MapReduce to distribute the work across tens, hundreds, or even thousands of computers. In the Internet industry, big data usually refers to the user network-behavior data that Internet companies generate and accumulate in daily operation. This data is so large that it can no longer be measured in gigabytes (GB) or terabytes (TB).
How big is big data? In a single day, the Internet produces enough content to fill 168 million DVDs; more than 294 billion emails are sent; 2 million community posts are published; and 378,000 mobile phones are sold.
As of 2012, data volumes had grown from the TB level (1 TB = 1024 GB) to the PB (1 PB = 1024 TB), EB (1 EB = 1024 PB), and even ZB (1 ZB = 1024 EB) level. Research by International Data Corporation (IDC) shows that the world generated 0.49 ZB of data in 2008, 0.8 ZB in 2009, 1.2 ZB in 2010, and as much as 1.82 ZB in 2011, equivalent to more than 200 GB for every person on Earth. As of 2012, all printed material ever produced by humanity amounted to about 200 PB, and everything ever spoken by humans throughout history amounted to about 5 EB. IBM research shows that 90% of all the data acquired by human civilization was produced in the past two years. By 2020, the volume of data generated worldwide is expected to reach 44 times today's level.
In the big-data era, quickly and efficiently finding the data that users care about has become an increasingly important problem. Building efficient indexes quickly is a prerequisite for user search. However, the index-creation solutions commonly used in the related art are single-threaded: they hit performance bottlenecks on massive data, place high demands on the system, and scale poorly, so they cannot meet users' need to retrieve data quickly and efficiently from massive data sets.
Summary of the invention
The present invention provides a method and device for generating distributed indexes, to solve at least the problem in the related art that efficient indexes cannot be created quickly for massive data.
According to one aspect of the present invention, a method for generating distributed indexes is provided.
The method comprises: determining the number of map jobs in Hadoop according to the data volume of the raw data; distributing the data processed by each map job to multiple reduce jobs, and generating an index database corresponding to each reduce job, where the number of reduce jobs and the correspondence between each reduce job and one or more map jobs are pre-configured; and merging the index databases corresponding to the reduce jobs.
Preferably, generating the index database corresponding to each reduce job comprises: obtaining the type of the currently supported file system; determining, according to the file system type, the generation mode of the index database corresponding to each reduce job; and generating the index database corresponding to each reduce job according to that generation mode.
Preferably, generating the index database corresponding to each reduce job according to the generation mode comprises: when the file system type is the Hadoop Distributed File System (HDFS), generating the index database corresponding to each reduce job on the local disk, and then uploading all index databases generated on the local disk to HDFS; or, when the file system type is another distributed file system (DFS), other than HDFS, that supports sharing, generating the index database corresponding to each reduce job directly in that shared DFS.
Preferably, merging the index databases corresponding to the reduce jobs comprises: when the file system type is HDFS, downloading the index databases corresponding to the reduce jobs from HDFS to the local disk; merging them on the local disk; uploading the merged index database to HDFS, and deleting the per-reduce-job index databases from the local disk.
Preferably, merging the index databases corresponding to the reduce jobs comprises: when the file system type is the other shared DFS, merging the per-reduce-job index databases generated in that shared DFS; and deleting the per-reduce-job index databases generated in that shared DFS.
According to another aspect of the present invention, a device for generating distributed indexes is provided.
The device comprises: a determination module, configured to determine the number of map jobs in Hadoop according to the data volume of the raw data; a generation module, configured to distribute the data processed by each map job to multiple reduce jobs and generate an index database corresponding to each reduce job, where the number of reduce jobs and the correspondence between each reduce job and one or more map jobs are pre-configured; and a merging module, configured to merge the index databases corresponding to the reduce jobs.
Preferably, the generation module comprises: an acquiring unit, configured to obtain the type of the currently supported file system; a determining unit, configured to determine, according to the file system type, the generation mode of the index database corresponding to each reduce job; and a generation unit, configured to generate the index database corresponding to each reduce job according to the generation mode.
Preferably, the generation unit is configured to: when the file system type is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce job on the local disk, and then upload all index databases generated on the local disk to HDFS; or, when the file system type is another distributed file system (DFS), other than HDFS, that supports sharing, generate the index database corresponding to each reduce job directly in that shared DFS.
Preferably, the merging module comprises: a download unit, configured to download the index databases corresponding to the reduce jobs from HDFS to the local disk when the file system type is HDFS; a first merging unit, configured to merge these index databases on the local disk; and a first processing unit, configured to upload the merged index database to HDFS and delete the per-reduce-job index databases from the local disk.
Preferably, the merging module comprises: a second merging unit, configured to merge the per-reduce-job index databases generated in the other shared DFS when the file system type is that DFS; and a second processing unit, configured to delete the per-reduce-job index databases generated in that shared DFS.
With the embodiments of the present invention, the number of map jobs in Hadoop is determined according to the data volume of the raw data; the data processed by each map job are distributed to multiple reduce jobs, and an index database corresponding to each reduce job is generated, the number of reduce jobs and the correspondence between each reduce job and one or more map jobs being pre-configured; and the per-reduce-job index databases are merged. In other words, the raw data are processed with Hadoop map and reduce jobs to produce one index database per reduce job, and these databases are then merged. This solves the problem in the related art that efficient indexes cannot be created quickly for massive data, so that massive data can be indexed efficiently and quickly.
Brief description of the drawings
The accompanying drawings described herein are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a method for generating distributed indexes according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for generating distributed indexes according to a preferred embodiment of the present invention;
Fig. 3 is a structural block diagram of a device for generating distributed indexes according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a device for generating distributed indexes according to a preferred embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with one another.
Fig. 1 is a flowchart of the method for generating distributed indexes according to an embodiment of the present invention. As shown in Fig. 1, the method may comprise the following steps:
Step S102: determine the number of map jobs in Hadoop according to the data volume of the raw data;
Step S104: distribute the data processed by each map job to multiple reduce jobs, and generate an index database corresponding to each reduce job, where the number of reduce jobs and the correspondence between each reduce job and one or more map jobs are pre-configured;
Step S106: merge the index databases corresponding to the reduce jobs.
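Steps S102 to S106 can be pictured with a small in-memory simulation. This is a sketch under stated assumptions, not Hadoop code: map jobs, reduce jobs and index databases are modelled with Python lists and dictionaries, the rule "one map job per `split_size` records" is an assumed stand-in for sizing by data volume, CRC32 is an assumed stable hash for routing pairs to reduce jobs, and each index database is a simple inverted index (term to document ids). All names are hypothetical.

```python
import math
import zlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[int, str]      # (doc_id, text): assumed record shape
Index = Dict[str, Set[int]]   # term -> ids of documents containing it


def plan_map_jobs(record_count: int, split_size: int) -> int:
    """S102 (assumed rule): one map job per `split_size` records."""
    return max(1, math.ceil(record_count / split_size))


def run_pipeline(records: List[Record], split_size: int, reduce_num: int):
    # S102: derive the number of map jobs from the data volume
    m = plan_map_jobs(len(records), split_size)
    splits = [records[i * split_size:(i + 1) * split_size] for i in range(m)]

    # S104: each simulated map job emits (term, doc_id); every pair goes to the
    # reduce job picked by a stable hash, and each reduce job builds its own index
    reduce_indexes: List[Index] = [defaultdict(set) for _ in range(reduce_num)]
    for split in splits:
        for doc_id, text in split:
            for term in text.lower().split():
                r = zlib.crc32(term.encode("utf-8")) % reduce_num
                reduce_indexes[r][term].add(doc_id)

    # S106: merge the per-reduce-job index databases into one complete index
    merged: Index = {}
    for idx in reduce_indexes:
        for term, docs in idx.items():
            merged.setdefault(term, set()).update(docs)
    return m, reduce_indexes, merged
```

For example, `run_pipeline([(1, "big data index"), (2, "distributed index"), (3, "big cluster")], split_size=2, reduce_num=3)` plans 2 map jobs and yields a merged index in which `"index"` maps to `{1, 2}`. Because this sketch routes by term, each term lands in exactly one reduce job, so the merge is effectively a union of disjoint indexes.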
In the related art, efficient indexes cannot be created quickly for massive data. With the method shown in Fig. 1, the raw data are processed by Hadoop map and reduce jobs to generate one index database per reduce job, and these index databases are then merged. This solves that problem and allows massive data to be indexed efficiently and quickly.
Preferably, in step S104, generating the index database corresponding to each reduce job may comprise the following operations:
Step S1: obtain the type of the currently supported file system;
Step S2: determine, according to the file system type, the generation mode of the index database corresponding to each reduce job;
Step S3: generate the index database corresponding to each reduce job according to the generation mode.
In a preferred embodiment, the data volume of the raw data to be collected is determined first, and the data are divided into M parts (M is a positive integer), each part corresponding to one map job. The amount of data handled by each map job can, of course, be configured dynamically, and a map data-processing plug-in is provided for this purpose. In addition, the intermediate key-value pairs produced by each map job can be written periodically to the local disk, which is in turn divided into N partitions (N is a positive integer set by the user), each partition corresponding to one reduce job. The efficiency of distributed index creation is improved by configuring the maximum number of reduce jobs, and a reduce data-processing plug-in is provided according to the user-configured number of reduce jobs. In this preferred embodiment, index creation supports both the Hadoop Distributed File System (HDFS) and other distributed file systems (DFS) that support sharing. Therefore, the generation mode of each reduce job's index database can be determined according to the type of file system supported during index creation, and the index database corresponding to each reduce job is then generated in that mode.
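The sizing rules in this paragraph (M chunks of raw data with one map job each; N user-defined partitions, one per reduce job, bounded by a configured maximum) might be expressed as follows. This is a minimal sketch: `plan_jobs`, `bytes_per_map`, `requested_reduce`, and `max_reduce` are hypothetical names, and clamping N to the configured maximum is an assumed reading of "configuring the maximum number of reduce jobs".

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class JobPlan:
    map_jobs: int     # M: one map job per chunk of raw data
    reduce_jobs: int  # N: one local-disk partition (and reduce job) each


def plan_jobs(raw_bytes: int, bytes_per_map: int,
              requested_reduce: int, max_reduce: int) -> JobPlan:
    """Hypothetical sizing helper: M from the data volume, N user-set but capped."""
    if raw_bytes < 0 or bytes_per_map <= 0 or max_reduce <= 0:
        raise ValueError("need raw_bytes >= 0, bytes_per_map > 0, max_reduce > 0")
    # The per-map data volume is dynamically configurable, so it is a parameter
    m = max(1, math.ceil(raw_bytes / bytes_per_map))
    # N is user-defined; clamp it to the range [1, max_reduce]
    n = max(1, min(requested_reduce, max_reduce))
    return JobPlan(map_jobs=m, reduce_jobs=n)
```

With 1000 bytes of input, 128 bytes per map job, 8 requested reduce jobs and a maximum of 4, `plan_jobs(1000, 128, 8, 4)` returns `JobPlan(map_jobs=8, reduce_jobs=4)`.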
Preferably, in step S3, generating the index database corresponding to each reduce job according to the generation mode may comprise one of the following steps:
Step S31: when the file system type is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce job on the local disk, and then upload all index databases generated on the local disk to HDFS;
Step S32: when the file system type is another distributed file system (DFS), other than HDFS, that supports sharing, generate the index database corresponding to each reduce job directly in that shared DFS.
In a preferred embodiment, if the currently supported file system is HDFS, each reduce job generates a temporary index database in the local file system (that is, on the local disk); then, in the final cleanup phase of the reduce job, the temporary index database generated in the local file system can be uploaded to the HDFS file system. If the currently supported file system is another shared DFS, the temporary index database can be generated directly in that DFS.
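The choice between steps S31 and S32 can be reduced to a small decision function. The sketch below only models where a reduce job writes its temporary index and whether it must upload it afterwards; the `FsType` enum, the `tmp-index-NNNNN` naming scheme, and the directory roots are hypothetical. In an actual Hadoop job, the HDFS upload would typically happen in the reducer's `cleanup(Context)` method, for example via `FileSystem.copyFromLocalFile`.

```python
from enum import Enum
from posixpath import join
from typing import Optional, Tuple


class FsType(Enum):
    HDFS = "hdfs"
    SHARED_DFS = "shared_dfs"  # any other DFS that supports shared access


def index_write_plan(fs: FsType, reduce_id: int, local_root: str,
                     dfs_root: str) -> Tuple[str, Optional[str]]:
    """Return (path the reduce job writes its temporary index to,
    path it uploads to during cleanup, or None if no upload is needed)."""
    name = f"tmp-index-{reduce_id:05d}"  # hypothetical naming scheme
    if fs is FsType.HDFS:
        # S31: build on the local disk, upload in the reduce job's cleanup phase
        return join(local_root, name), join(dfs_root, name)
    # S32: write straight into the shared file system; nothing to upload
    return join(dfs_root, name), None
```

For reduce job 3 on HDFS, this yields a local path under `local_root` plus an upload target under `dfs_root`; on a shared DFS it yields only the DFS path.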
Preferably, in step S106, merging the index databases corresponding to the reduce jobs may comprise the following operations:
Step S4: when the file system type is HDFS, download the index databases corresponding to the reduce jobs from HDFS to the local disk;
Step S5: merge the index databases corresponding to the reduce jobs on the local disk;
Step S6: upload the merged index database to HDFS, and delete the per-reduce-job index databases from the local disk.
In a preferred embodiment, if the currently supported file system is HDFS, the Hadoop index master node first downloads all temporary index databases from the HDFS file system to the local file system; second, the index master node merges all temporary index databases in the local file system into a complete index database; third, the index master node uploads the complete index database to the HDFS file system; next, the index master node deletes each temporary index database from the local file system; finally, the Hadoop index slave nodes download the complete index database from the HDFS file system to their local file systems for use in retrieval.
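The HDFS merge sequence can be simulated on a single machine. In this sketch, plain directories stand in for HDFS, the master's local disk, and a slave's local disk, and each temporary index database is assumed to be a JSON file that maps terms to document ids. A real implementation would move files through the Hadoop `FileSystem` API (for example `copyToLocalFile` and `copyFromLocalFile`) rather than `shutil`. The function names and file names are hypothetical.

```python
import json
import shutil
from pathlib import Path
from typing import Dict, Iterable, List, Set


def merge_index_files(paths: Iterable[Path]) -> Dict[str, List[int]]:
    """Union the term -> doc-id maps stored in JSON temporary index files."""
    merged: Dict[str, Set[int]] = {}
    for p in paths:
        for term, docs in json.loads(p.read_text()).items():
            merged.setdefault(term, set()).update(docs)
    return {term: sorted(docs) for term, docs in sorted(merged.items())}


def master_merge_hdfs(hdfs_dir: Path, local_dir: Path,
                      full_name: str = "full-index.json") -> Path:
    local_dir.mkdir(parents=True, exist_ok=True)
    # 1. Download every temporary index from (simulated) HDFS to the local disk
    local_copies = [Path(shutil.copy(p, local_dir / p.name))
                    for p in sorted(hdfs_dir.glob("tmp-index-*.json"))]
    # 2. Merge them locally into one complete index
    full_local = local_dir / full_name
    full_local.write_text(json.dumps(merge_index_files(local_copies)))
    # 3. Upload the complete index back to HDFS
    full_hdfs = Path(shutil.copy(full_local, hdfs_dir / full_name))
    # 4. Delete the local temporary copies (the text only mentions local ones)
    for p in local_copies:
        p.unlink()
    return full_hdfs


def slave_fetch(full_hdfs: Path, slave_dir: Path) -> Path:
    # 5. An index slave downloads the complete index for retrieval
    slave_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(full_hdfs, slave_dir / full_hdfs.name))
```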
Preferably, in step S106, merging the index databases corresponding to the reduce jobs may comprise the following steps:
Step S7: when the file system type is the other shared DFS, merge the per-reduce-job index databases generated in that shared DFS;
Step S8: delete the per-reduce-job index databases generated in that shared DFS.
In a preferred embodiment, if the currently supported file system is another shared DFS, the Hadoop index master node first merges the temporary index databases in the DFS file system into a complete index database for use in retrieval, and then deletes each temporary index database from the DFS file system.
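Unlike the HDFS case, this path involves no download and no upload, because the master node can read and write the shared DFS directly. The sketch below assumes the shared DFS is mounted at an ordinary directory path and reuses the hypothetical JSON index-file layout (term to document ids). It merges in place and then performs step S8.

```python
import json
from pathlib import Path
from typing import Dict, Set


def master_merge_shared(dfs_dir: Path, full_name: str = "full-index.json") -> Path:
    """Merge temporary JSON index files in place on a shared DFS mount, then delete them."""
    temps = sorted(dfs_dir.glob("tmp-index-*.json"))
    merged: Dict[str, Set[int]] = {}
    for p in temps:  # S7: read each temporary index straight from the shared DFS
        for term, docs in json.loads(p.read_text()).items():
            merged.setdefault(term, set()).update(docs)
    full = dfs_dir / full_name
    full.write_text(json.dumps({t: sorted(d) for t, d in sorted(merged.items())}))
    for p in temps:  # S8: remove the per-reduce-job temporary indexes
        p.unlink()
    return full
```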
The above preferred implementation is further described below with reference to the preferred embodiment shown in Fig. 2.
Fig. 2 is a flowchart of the method for generating distributed indexes according to a preferred embodiment of the present invention. As shown in Fig. 2, the flow may comprise the following processing stages:
First stage: data collection, i.e. the Hadoop map phase. Data collection is the preparatory stage that precedes index creation and supplies the data for it. The Hadoop map phase is implemented in a distributed manner and can process data in parallel; the number of map jobs is determined dynamically by the volume of collected data. The Hadoop map jobs process the collected text or database files and generate the content of each field needed to create the index (that is, a set of key-value pairs (key, value)), which greatly improves data-processing performance. Because plug-in processing is supported during collection, different processing modes can be customized according to the data volume.
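One way to picture a pluggable map stage is a registry of record processors, where each processor turns one raw line into the fields needed for indexing, and the map job emits (key, value) pairs. The plug-in signature, the tab-separated `id`, `title`, `body` input format, and all names below are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Fields = Dict[str, str]
Processor = Callable[[str], Fields]  # hypothetical plug-in signature


def tsv_processor(line: str) -> Fields:
    """Example plug-in for 'id<TAB>title<TAB>body' lines (assumed input format)."""
    doc_id, title, body = line.rstrip("\n").split("\t", 2)
    return {"id": doc_id, "title": title, "body": body}


# Processing modes are selected by name, so new plug-ins can be registered
PLUGINS: Dict[str, Processor] = {"tsv": tsv_processor}


def map_split(lines: Iterable[str], plugin: str = "tsv") -> Iterator[Tuple[str, Fields]]:
    """One simulated map job: turn raw lines into (key, value) pairs keyed by doc id."""
    process = PLUGINS[plugin]
    for line in lines:
        if line.strip():  # skip blank lines
            fields = process(line)
            yield fields["id"], fields
```

A different source format only needs a new entry. For example, `PLUGINS["pipe"] = lambda line: dict(zip(("id", "body"), line.strip().split("|", 1)))` lets `map_split(["7|abc"], "pipe")` emit `("7", {"id": "7", "body": "abc"})`.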
Second stage: index creation, i.e. the Hadoop reduce phase, which creates the distributed index databases. The maximum number of reduce jobs that run in parallel, reduceNum, is determined by the configured number of reduce jobs. Each record generated in the data collection stage is assigned to a reduce job for indexing according to HashCode() % reduceNum, and each reduce job generates its own temporary index database file.
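A literal `HashCode() % reduceNum` needs one safeguard in Java. `String.hashCode()` can be negative, and Java's `%` keeps the sign of the dividend, so the result could be an invalid reduce index. Hadoop's default `HashPartitioner` avoids this by computing `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below reproduces that arithmetic in Python. It assumes plain Java `String` keys made of BMP characters, with one UTF-16 code unit per character, and the function names are hypothetical.

```python
def java_string_hashcode(s: str) -> int:
    """Java String.hashCode(): s[0]*31^(n-1) + ... + s[n-1], as a signed 32-bit int."""
    h = 0
    for ch in s:  # assumes BMP characters (one UTF-16 code unit each)
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def partition(key: str, reduce_num: int) -> int:
    """Reduce-job index for a key, masking the sign bit as HashPartitioner does."""
    return (java_string_hashcode(key) & 0x7FFFFFFF) % reduce_num
```

For example, `java_string_hashcode("hello")` is 99162322, matching Java's `"hello".hashCode()`, and `partition("hello", 4)` is 2. The mask keeps every result in `[0, reduce_num)` even when the hash is negative.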
It should be noted that index creation supports both the Hadoop Distributed File System (HDFS) and other distributed file systems (DFS) that support sharing.
Third stage: index merging. Using the temporary index databases generated by the reduce jobs in the index creation stage, the index master node invokes index merging to combine them into one complete index database. During merging, the temporary index databases are read one by one and incorporated into a single master index database; finally, each temporary index database is deleted, and the retrieval service is provided by the master index database.
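The "read one by one, fold into a master index, then serve retrieval" loop can be sketched with sorted posting lists. This is an illustrative technique rather than the patent's stated data layout: each index database is assumed to map a term to a sorted list of document ids, each temporary index is merged into the master with `heapq.merge` followed by de-duplication, and a conjunctive (AND) query stands in for the retrieval service. All names are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List

Index = Dict[str, List[int]]  # term -> sorted, de-duplicated document ids


def merge_into(master: Index, temp: Index) -> None:
    """Fold one temporary index into the master index, keeping posting lists sorted."""
    for term, postings in temp.items():
        out: List[int] = []
        for doc in heapq.merge(master.get(term, []), postings):
            if not out or out[-1] != doc:  # drop duplicates (inputs are sorted)
                out.append(doc)
        master[term] = out


def build_master(temps: Iterable[Index]) -> Index:
    master: Index = {}
    for temp in temps:  # read the temporary indexes one by one
        merge_into(master, temp)
    return master


def search_all(master: Index, *terms: str) -> List[int]:
    """Documents containing every query term, answered from the master index."""
    if not terms:
        return []
    result = set(master.get(terms[0], []))
    for term in terms[1:]:
        result &= set(master.get(term, []))
    return sorted(result)
```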
Fig. 3 is a structural block diagram of the device for generating distributed indexes according to an embodiment of the present invention. As shown in Fig. 3, the device may comprise: a determination module 10, configured to determine the number of map jobs in Hadoop according to the data volume of the raw data; a generation module 20, configured to distribute the data processed by each map job to multiple reduce jobs and generate an index database corresponding to each reduce job, where the number of reduce jobs and the correspondence between each reduce job and one or more map jobs are pre-configured; and a merging module 30, configured to merge the index databases corresponding to the reduce jobs.
With the device shown in Fig. 3, the problem in the related art that efficient indexes cannot be created quickly for massive data is solved, and massive data can be indexed efficiently and quickly.
Preferably, as shown in Fig. 4, the generation module 20 may comprise: an acquiring unit 200, configured to obtain the type of the currently supported file system; a determining unit 202, configured to determine, according to the file system type, the generation mode of the index database corresponding to each reduce job; and a generation unit 204, configured to generate the index database corresponding to each reduce job according to the generation mode.
Preferably, as shown in Fig. 4, the generation unit 204 is configured to: when the file system type is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce job on the local disk, and then upload all index databases generated on the local disk to HDFS; or, when the file system type is another distributed file system (DFS), other than HDFS, that supports sharing, generate the index database corresponding to each reduce job directly in that shared DFS.
Preferably, as shown in Fig. 4, the merging module 30 may comprise: a download unit 300, configured to download the index databases corresponding to the reduce jobs from HDFS to the local disk when the file system type is HDFS; a first merging unit 302, configured to merge these index databases on the local disk; and a first processing unit 304, configured to upload the merged index database to HDFS and delete the per-reduce-job index databases from the local disk.
Preferably, as shown in Fig. 4, the merging module 30 may comprise: a second merging unit 306, configured to merge the per-reduce-job index databases generated in the other shared DFS when the file system type is that DFS; and a second processing unit 308, configured to delete the per-reduce-job index databases generated in that shared DFS.
As can be seen from the above description, the embodiments achieve the following technical effects (note that these are effects achievable by some preferred embodiments): with the technical solution provided by the embodiments of the present invention, raw data can be processed using the Hadoop map-reduce programming model to generate one index database per reduce job, and these index databases are then merged into one complete index database for retrieval. This solves the problem in the related art that efficient indexes cannot be created quickly for massive data, and enables massive data to be indexed efficiently and quickly.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be performed in an order different from the one given here. Alternatively, they can each be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for generating distributed indexes, characterized by comprising:
Determining the number of map jobs in Hadoop according to the data volume of raw data;
Distributing the data processed by each map job to multiple reduce jobs, and generating an index database corresponding to each reduce job, wherein the number of the reduce jobs and the correspondence between each reduce job and one or more map jobs are pre-configured;
Merging the index databases corresponding to the reduce jobs.
2. The method according to claim 1, characterized in that generating the index database corresponding to each reduce job comprises:
Obtaining the type of the currently supported file system;
Determining, according to the type of the file system, the generation mode of the index database corresponding to each reduce job;
Generating the index database corresponding to each reduce job according to the generation mode.
3. The method according to claim 2, characterized in that generating the index database corresponding to each reduce job according to the generation mode comprises:
When the type of the file system is the Hadoop Distributed File System (HDFS), generating the index database corresponding to each reduce job on a local disk, and then uploading all index databases generated on the local disk to the HDFS; or,
When the type of the file system is another distributed file system (DFS), other than the HDFS, that supports sharing, generating the index database corresponding to each reduce job directly in the shared DFS.
4. The method according to claim 3, characterized in that merging the index databases corresponding to the reduce jobs comprises:
When the type of the file system is the HDFS, downloading the index databases corresponding to the reduce jobs from the HDFS to the local disk;
Merging the index databases corresponding to the reduce jobs on the local disk;
Uploading the merged index database to the HDFS, and deleting the index databases corresponding to the reduce jobs from the local disk.
5. The method according to claim 3, characterized in that merging the index databases corresponding to the reduce jobs comprises:
When the type of the file system is the other DFS that supports sharing, merging the index databases corresponding to the reduce jobs that were generated in the shared DFS;
Deleting the index databases corresponding to the reduce jobs that were generated in the shared DFS.
6. a generating apparatus for distributed index, is characterized in that, comprising:
Determination module, for determining the quantity of the mapping map operation in Hadoop according to the data volume of raw data;
Generation module, for the data after each map operation process are dispensed to multiple stipulations reduce operation, and generate the index database corresponding with each reduce operation, wherein, the quantity of described reduce operation and the corresponding relation between described each reduce operation and one or more map operation are pre-configured completing;
Merge module, for merging the index database corresponding with described each reduce operation.
7. device according to claim 6, is characterized in that, described generation module comprises:
Acquiring unit, for obtaining the type of the file system of current support;
Determining unit, for determining the generating mode of the index database corresponding with described each reduce operation according to the type of described file system;
Generation unit, for generating the index database corresponding with described each reduce operation according to described generating mode.
8. The device according to claim 7, characterized in that the generation unit is configured to, when the type of the file system is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce operation on a local disk and then upload all of the index databases generated on the local disk to the HDFS; or the generation unit is configured to, when the type of the file system is a distributed file system (DFS) other than HDFS that supports sharing, generate the index database corresponding to each reduce operation directly in that DFS.
9. The device according to claim 8, characterized in that the merging module comprises:
A download unit, configured to, when the type of the file system is the HDFS, download the index databases corresponding to each reduce operation from the HDFS to the local disk;
A first merging unit, configured to merge, on the local disk, the index databases corresponding to each reduce operation;
A first processing unit, configured to upload the merged index database to the HDFS and to delete the index databases corresponding to each reduce operation from the local disk.
10. The device according to claim 8, characterized in that the merging module comprises:
A second merging unit, configured to, when the type of the file system is the other DFS that supports sharing, merge the index databases corresponding to each reduce operation that were generated in that DFS;
A second processing unit, configured to delete the index databases corresponding to each reduce operation that were generated in that DFS.
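Claim 6 sizes the map phase by the volume of the raw data, fixes the number of reduce operations in advance, and pre-assigns one or more map operations to each reduce operation, so that every reduce operation builds its own partial index before the partial indexes are merged. The Python sketch below illustrates that flow under stated assumptions only: the 64 MB split size, the round-robin map-to-reduce assignment, the whitespace tokenizer, and all function names are hypothetical, because the claims do not specify them, and plain functions stand in for real Hadoop jobs.

```python
import math
from collections import defaultdict

# Assumption: one map operation per 64 MB split; the claims give no formula.
SPLIT_BYTES = 64 * 1024 * 1024

Doc = tuple[int, str]          # (document id, text)
Index = dict[str, list[int]]   # term -> sorted document ids


def number_of_maps(data_bytes: int, split_bytes: int = SPLIT_BYTES) -> int:
    """Derive the map count from the raw data volume (at least one map)."""
    return max(1, math.ceil(data_bytes / split_bytes))


def assign_maps_to_reduces(num_maps: int, num_reduces: int) -> dict[int, list[int]]:
    """Hypothetical preconfigured correspondence: map m feeds reduce m % num_reduces."""
    assignment: dict[int, list[int]] = {r: [] for r in range(num_reduces)}
    for m in range(num_maps):
        assignment[m % num_reduces].append(m)
    return assignment


def map_op(split: list[Doc]) -> list[tuple[str, int]]:
    """Emit (term, doc_id) pairs using a lowercase whitespace tokenizer."""
    return [(term, doc_id) for doc_id, text in split for term in text.lower().split()]


def reduce_op(pairs) -> Index:
    """Build the partial index for one reduce operation."""
    postings: dict[str, set[int]] = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def merge_indexes(indexes: list[Index]) -> Index:
    """Union the posting lists of all partial indexes."""
    merged: dict[str, set[int]] = defaultdict(set)
    for index in indexes:
        for term, ids in index.items():
            merged[term].update(ids)
    return {term: sorted(ids) for term, ids in sorted(merged.items())}


def build_index(splits: list[list[Doc]], num_reduces: int) -> Index:
    """One split per map operation; each reduce indexes the output of its assigned maps."""
    assignment = assign_maps_to_reduces(len(splits), num_reduces)
    map_outputs = [map_op(split) for split in splits]
    partial = [
        reduce_op(pair for m in assignment[r] for pair in map_outputs[m])
        for r in range(num_reduces)
    ]
    return merge_indexes(partial)
```

Because each reduce operation here receives whole map outputs rather than key-partitioned records, two partial indexes can contain the same term, which is why the final merge unions posting lists instead of simply concatenating the partial indexes.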
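Claims 7 to 10 make index generation and merging depend on the file system type. With HDFS, the per-reduce indexes are built on local disk and uploaded, later downloaded and merged locally, and the merged index is uploaded before the local per-reduce copies are deleted. With another DFS that supports sharing, the indexes are generated, merged and cleaned up in place. The sketch below models both paths with a hypothetical in-memory `Store` class in place of real file systems; the `part-r-NNNNN` and `merged` file names and the set-union merge of posting lists are assumptions, not details taken from the claims.

```python
from collections import defaultdict
from dataclasses import dataclass, field

Index = dict[str, list[int]]  # term -> sorted document ids


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a local disk or a distributed file system."""
    files: dict[str, Index] = field(default_factory=dict)


def merge(indexes: list[Index]) -> Index:
    """Union the posting lists of the per-reduce indexes."""
    merged: dict[str, set[int]] = defaultdict(set)
    for index in indexes:
        for term, ids in index.items():
            merged[term].update(ids)
    return {term: sorted(ids) for term, ids in sorted(merged.items())}


def generate_and_merge(fs_type: str, reduce_indexes: list[Index],
                       local: Store, dfs: Store) -> Index:
    # Hypothetical file names, one per reduce operation.
    names = [f"part-r-{r:05d}" for r in range(len(reduce_indexes))]
    if fs_type == "hdfs":
        # Generation (claim 8): build each index on local disk, then upload all of them.
        for name, index in zip(names, reduce_indexes):
            local.files[name] = index
        for name in names:
            dfs.files[name] = local.files[name]
        # Merging (claim 9): download, merge locally, upload the result,
        # then delete the per-reduce indexes from the local disk.
        for name in names:
            local.files[name] = dfs.files[name]
        merged = merge([local.files[name] for name in names])
        dfs.files["merged"] = merged
        for name in names:
            del local.files[name]
    else:
        # Another DFS that supports sharing (claims 8 and 10): generate and merge
        # in place, then delete the per-reduce indexes from that DFS.
        for name, index in zip(names, reduce_indexes):
            dfs.files[name] = index
        merged = merge([dfs.files[name] for name in names])
        dfs.files["merged"] = merged
        for name in names:
            del dfs.files[name]
    return merged
```

On the HDFS path the per-reduce indexes remain in the distributed store alongside the merged result, since the claims only require deleting the local copies; on the shared-DFS path only the merged index remains.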
CN201310695615.6A 2013-12-17 2013-12-17 The generation method and device of distributed index Active CN104714983B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310695615.6A CN104714983B (en) 2013-12-17 2013-12-17 The generation method and device of distributed index
PCT/CN2014/078696 WO2014180411A1 (en) 2013-12-17 2014-05-28 Distributed index generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310695615.6A CN104714983B (en) 2013-12-17 2013-12-17 The generation method and device of distributed index

Publications (2)

Publication Number Publication Date
CN104714983A (en) 2015-06-17
CN104714983B (en) 2019-02-19

Family

ID=51866791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310695615.6A Active CN104714983B (en) 2013-12-17 2013-12-17 The generation method and device of distributed index

Country Status (2)

Country Link
CN (1) CN104714983B (en)
WO (1) WO2014180411A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354251A (en) * 2015-10-19 2016-02-24 国家电网公司 Hadoop based power cloud data management indexing method in power system
CN105610899A (en) * 2015-12-10 2016-05-25 浪潮(北京)电子信息产业有限公司 Text file parallel uploading method and device

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN105430078B (en) * 2015-11-17 2019-03-15 浪潮(北京)电子信息产业有限公司 A kind of distributed storage method of mass data
US11216516B2 (en) 2018-06-08 2022-01-04 At&T Intellectual Property I, L.P. Method and system for scalable search using microservice and cloud based search with records indexes

Citations (5)

Publication number Priority date Publication date Assignee Title
US20100162230A1 (en) * 2008-12-24 2010-06-24 Yahoo! Inc. Distributed computing system for large-scale data handling
CN102426609A (en) * 2011-12-28 2012-04-25 厦门市美亚柏科信息股份有限公司 Index generation method and index generation device based on MapReduce programming architecture
CN102436491A (en) * 2011-11-08 2012-05-02 张三明 System and method used for searching huge amount of pictures and based on BigBase
CN102479217A (en) * 2010-11-23 2012-05-30 腾讯科技(深圳)有限公司 Method and device for realizing computation balance in distributed data warehouse
US20130254237A1 (en) * 2011-10-04 2013-09-26 International Business Machines Corporation Declarative specification of data integraton workflows for execution on parallel processing platforms

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN102467570B (en) * 2010-11-17 2014-03-12 日电(中国)有限公司 Connection query system and method for distributed data warehouse
US20130151535A1 (en) * 2011-12-09 2013-06-13 Canon Kabushiki Kaisha Distributed indexing of data
CN103246549B (en) * 2012-02-07 2016-12-14 阿里巴巴集团控股有限公司 A kind of method and system of data conversion storage
CN103440244A (en) * 2013-07-12 2013-12-11 广东电子工业研究院有限公司 Large-data storage and optimization method

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN105354251A (en) * 2015-10-19 2016-02-24 国家电网公司 Hadoop based power cloud data management indexing method in power system
CN105610899A (en) * 2015-12-10 2016-05-25 浪潮(北京)电子信息产业有限公司 Text file parallel uploading method and device
CN105610899B (en) * 2015-12-10 2019-09-24 浪潮(北京)电子信息产业有限公司 A kind of parallel method for uploading of text file and device

Also Published As

Publication number Publication date
WO2014180411A1 (en) 2014-11-13
CN104714983B (en) 2019-02-19

Similar Documents

Publication Publication Date Title
US9063976B1 (en) Dynamic tree determination for data processing
US9996593B1 (en) Parallel processing framework
US9372880B2 (en) Reclamation of empty pages in database tables
US8959519B2 (en) Processing hierarchical data in a map-reduce framework
US9268716B2 (en) Writing data from hadoop to off grid storage
CN109614402B (en) Multidimensional data query method and device
CN106030573A (en) Implementation of semi-structured data as a first-class database element
CN107748752B (en) Data processing method and device
US9529933B2 (en) Dynamic assignment of business logic based on schema mapping metadata
CN105468720A (en) Method for integrating distributed data processing systems, corresponding systems and data processing method
CN105900093B (en) A kind of update method of the tables of data of KeyValue databases and table data update apparatus
CN103473121A (en) Mass image parallel processing method based on cloud computing platform
CN105471989A (en) Data storage method
CN104112013A (en) HBase secondary indexing method and device
CN104111936A (en) Method and system for querying data
CN106970929A (en) Data lead-in method and device
CN107343021A (en) A kind of Log Administration System based on big data applied in state's net cloud
CN106055678A (en) Hadoop-based panoramic big data distributed storage method
CN103246549B (en) A kind of method and system of data conversion storage
Wang et al. Distributed storage and index of vector spatial data based on HBase
Konstantinou et al. Distributed indexing of web scale datasets for the cloud
JP2014078085A (en) Execution control program, execution control method and information processor
CN104714983A (en) Generating method and device for distributed indexes
US20150199408A1 (en) Systems and methods for a high speed query infrastructure
CN108628954B (en) Mass data self-service query method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant