CN103902614A - Data processing method, device and system - Google Patents

Info

Publication number
CN103902614A
Authority
CN
China
Prior art keywords
data
column
hstore
server
query request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210584674.1A
Other languages
Chinese (zh)
Other versions
CN103902614B (en)
Inventor
徐萌
何鸿凌
杜宇健
钱岭
孙少陵
金骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201210584674.1A priority Critical patent/CN103902614B/en
Publication of CN103902614A publication Critical patent/CN103902614A/en
Application granted granted Critical
Publication of CN103902614B publication Critical patent/CN103902614B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a data processing method, device and system. In the method, a shard server receives a data query request forwarded by a master server, the request carrying a key field indicating the row in which the requested data is located and a column field indicating the column in which the requested data is located; the shard server queries the corresponding column data in its own stored data according to the key field and the column field, and returns the queried column data to the master server in the form of an array. The method reduces the performance cost of data processing in a distributed column-store database system and improves data processing efficiency.

Description

A data processing method, device and system
Technical field
The present invention relates to the field of communication technology, and in particular to a data processing method, device and system.
Background
A distributed column-store database is well suited to fast queries and distributed deployment; it provides mass data storage while also effectively improving data query speed.
Existing distributed column-storage schemes focus mainly on how to implement data queries and pay little attention to data analysis requirements. In practice, however, besides queries, most of the work a database performs is analytical. Examples include computing the sum of a column under a given condition, or computing over several columns, such as the ratio of local call minutes to long-distance call minutes.
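The two analytical examples above can be illustrated with a minimal Python sketch. The record layout, the field names (`city`, `local_min`, `long_min`) and the helper names are assumptions made purely for illustration; they do not come from the patent.

```python
# Hypothetical call-detail rows; the field names are illustrative assumptions.
records = [
    {"user": "u1", "city": "A", "local_min": 120, "long_min": 30},
    {"user": "u2", "city": "B", "local_min": 80, "long_min": 20},
    {"user": "u3", "city": "A", "local_min": 100, "long_min": 50},
]


def conditional_sum(rows, column, predicate):
    """Sum one column over the rows that satisfy a condition."""
    return sum(row[column] for row in rows if predicate(row))


def column_ratio(rows, numerator, denominator):
    """Ratio of two column totals, or None when the denominator total is zero."""
    den = sum(row[denominator] for row in rows)
    if den == 0:
        return None
    return sum(row[numerator] for row in rows) / den
```

Both helpers touch only one or two columns of each row, which is the access pattern that column storage is meant to serve.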
At present, distributed systems can address the above problem with distributed computing. For example, a Hadoop-based system uses MapReduce as its computing framework; its Map input interface is dbinputformat, which reads data row by row. Specifically:
1) The inputformat divides the data into several shards according to key;
2) Each Map reads in one shard;
3) Inside the Map, the read/write interface provided by the distributed database is called to read one row record according to key.
When analysis is implemented inside the Map, the input arrives as row records, one row at a time. The specific fields to be processed must first be identified by field position and only then processed; some operations, such as summation, must additionally go through the reduce stage. Clearly, this row-by-row way of reading and processing does not exploit the advantages of column storage.
In the course of implementing the present invention, the inventors found that the prior art has at least the following problem:
In distributed column storage, each column family is kept in its own file. An interface that reads one row record at a time therefore has to read the corresponding fields from multiple files according to key and then merge them into one record before returning it. Meanwhile, in the Map stage, because the operation targets particular columns, the row record has to be decomposed by field again before further processing. This incurs two performance costs: one for merging and one for splitting.
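The merge-then-split overhead described above can be made concrete with a small Python sketch. Each column-family file is modelled as an in-memory dictionary, and all names (`family_files`, `read_row`, `map_sum_by_row`) are hypothetical; this is a simplified model, not the behaviour of any particular database.

```python
# Each column family lives in its own "file", modelled here as a dict:
# row key -> fields of that family. The contents are illustrative only.
family_files = {
    "profile": {"k1": {"age": 30}, "k2": {"age": 41}},
    "usage": {"k1": {"local_min": 120}, "k2": {"local_min": 80}},
}


def read_row(key):
    """Row-oriented read: visit every family file and merge into one record."""
    record = {}
    for fields_by_key in family_files.values():
        record.update(fields_by_key.get(key, {}))
    return record


def map_sum_by_row(keys, column):
    """Sum one column while reading whole rows, as a row-based Map would."""
    total = 0
    for key in keys:
        record = read_row(key)   # merge cost: every family file is touched
        total += record[column]  # split cost: one field is picked back out
    return total
```

Summing `local_min` this way reads the `profile` file for every key even though none of its fields are used.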
Summary of the invention
The embodiments of the present invention provide a data processing method, device and system, so as to reduce the performance cost of data processing in a distributed column-store database system and improve data processing efficiency.
To achieve the above object, an embodiment of the present invention provides a data processing method applied in a distributed column-store database system comprising a master server and a shard server, the method comprising:
the shard server receiving a data query request forwarded by the master server, the request carrying a key field indicating the row in which the data requested to be read is located and a column field indicating the column in which the data requested to be read is located;
the shard server querying the corresponding column data in its own stored data according to the key field and the column field, and returning the queried column data to the master server in the form of an array.
An embodiment of the present invention also provides a distributed column-store database system comprising a master server and a shard server, wherein:
the master server is configured to receive a data query request initiated by a client and forward the data query request to the shard server, and to receive the data in array form returned by the shard server;
the shard server is configured to receive the data query request forwarded by the master server, the request carrying a key field indicating the row in which the data requested to be read is located and a column field indicating the column in which the data requested to be read is located; to query the corresponding column data in its own stored data according to the key field and the column field; and to return the queried column data to the master server in the form of an array.
An embodiment of the present invention also provides a shard server applied in a distributed column-store database system that comprises a master server. The shard server comprises one data shard module Hregion, at least one column module Hstore, and at least one column storage file HstoreFile, wherein:
the Hregion is configured to receive the data query request forwarded by the master server, the request carrying a key field indicating the row in which the data requested to be read is located and a column field indicating the column in which the data requested to be read is located; to determine the corresponding Hstore according to the column field and forward the data query request to that Hstore; and to receive the data file returned by the Hstore, generate a data array from the data file, and return the data array to the master server;
the Hstore is configured, on receiving the data query request forwarded by the Hregion, to determine the corresponding HstoreFile according to the key field and forward the data query request to that HstoreFile; and to receive the data file returned by the HstoreFile and return the data file to the Hregion;
the HstoreFile is configured, on receiving the data query request forwarded by the Hstore, to return the whole data file to the Hstore.
In the above embodiments of the present invention, after receiving the data query request forwarded by the master server, the shard server queries the corresponding column data in its own stored data according to the key field and the column field and returns the queried column data to the master server in the form of an array. This reduces the performance cost of data processing in a distributed column-store database system and improves data processing efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of the architecture of an existing distributed column-store database system;
Fig. 2 is a schematic flowchart of data reading in an existing distributed database;
Fig. 3 is a schematic flowchart of data processing by an existing Map task;
Fig. 4 is a schematic flowchart of a data processing method provided by an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a data processing method provided by an embodiment of the present invention;
Fig. 6 is a schematic flowchart of a data processing method provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a distributed column-store database system provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a shard server provided by an embodiment of the present invention.
Detailed description
To aid understanding of the technical solution provided by the embodiments of the present invention, the architecture of an existing distributed column-store database system and the conventional data processing method based on it are briefly described below.
Referring to Fig. 1, an existing distributed column-store database system comprises a master server (Master) and a shard server (Tablet Server). The shard server comprises one data shard module (Hregion), at least one column module (Hstore), and at least one column storage file (HstoreFile), wherein:
One Hregion can store one or more data shards. A data shard contains all the data of one or more rows of the original data table, and the number of shards can be determined by the number of devices that process data in parallel;
Within a shard server, the data stored in an Hregion is stored in different Hstores by column or column family (one Hstore stores the data of one column or one column family), and the data stored in an Hstore is stored by row in HstoreFiles. In a distributed column-store database, several columns that are frequently accessed together are defined as a column family.
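The storage hierarchy just described can be sketched as three nested Python data classes. The class names follow the text, but the attribute names and the choice to model an HstoreFile as a mapping from row key to cell value are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class HstoreFile:
    """One storage file: cells of a single column for a subset of row keys."""
    cells: Dict[str, Any]  # row key -> cell value (assumed layout)


@dataclass
class Hstore:
    """All storage files belonging to one column (or one column family)."""
    column: str
    files: List[HstoreFile] = field(default_factory=list)


@dataclass
class Hregion:
    """One data shard: one Hstore per column or column family."""
    stores: Dict[str, Hstore] = field(default_factory=dict)


# A shard with two columns; the rows of "age" are split across two files.
region = Hregion(stores={
    "age": Hstore("age", [HstoreFile({"k1": 30}), HstoreFile({"k2": 41})]),
    "local_min": Hstore("local_min", [HstoreFile({"k1": 120, "k2": 80})]),
})
```

In this model a column is located through `region.stores`, and a row within that column is located by finding the file whose `cells` contain the key.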
Based on the above distributed column-store database system, the prior-art data processing flow can be as shown in Fig. 2 and Fig. 3. It mainly involves two flows: the first is the flow in which the distributed database reads data; the second is the flow in which a Map task processes data.
Referring to Fig. 2, the prior-art flow in which the distributed database reads data can comprise the following steps:
Step 201: The master server receives the data query request sent by the client and, according to the key field carried in it, which indicates the row in which the data requested to be read is located, forwards the data query request to the corresponding Hregion.
Step 202: The Hregion receives the data query request and traverses the Hstores to query the data of the corresponding key field in each column.
Step 203: The Hstore determines the corresponding HstoreFile according to the key field.
Step 204: The HstoreFile determines, from the index corresponding to the key field, the offset of the data requested to be read, and returns this offset to the Hstore.
Step 205: The Hstore reads the corresponding data according to the offset and returns the data read to the Hregion.
Step 206: The Hregion splices together the results returned by all the Hstores.
Step 207: The Hregion returns the spliced result to the master server.
After obtaining the result, the master server outputs it to the Map task.
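Steps 202 to 207 of this prior-art read path can be sketched in Python as follows. The classes are simplified in-memory stand-ins, and the method names (`offset_of`, `read`, `read_row`) and the index layout are assumptions rather than any real database API.

```python
class HstoreFile:
    def __init__(self, cells):
        # cells: list of (row key, value); the index maps key -> offset in data.
        self.data = [value for _, value in cells]
        self.index = {key: i for i, (key, _) in enumerate(cells)}

    def offset_of(self, key):
        """Step 204: look up the offset of the requested data in the index."""
        return self.index.get(key)


class Hstore:
    def __init__(self, column, files):
        self.column = column
        self.files = files

    def read(self, key):
        for hfile in self.files:            # step 203: find the file for this key
            offset = hfile.offset_of(key)   # step 204: obtain the offset
            if offset is not None:
                return hfile.data[offset]   # step 205: read at that offset
        return None


class Hregion:
    def __init__(self, stores):
        self.stores = stores

    def read_row(self, key):
        """Steps 202 and 206: traverse every Hstore and splice the results."""
        return {store.column: store.read(key) for store in self.stores}


region = Hregion([
    Hstore("age", [HstoreFile([("k1", 30)]), HstoreFile([("k2", 41)])]),
    Hstore("local_min", [HstoreFile([("k1", 120), ("k2", 80)])]),
])
```

Note that `read_row` visits every Hstore regardless of which columns the caller eventually needs, which is the inefficiency the patent targets.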
Referring to Fig. 3, the prior-art flow in which a Map task processes data can comprise the following steps:
Step 301: The Map reads in one record (that is, one row of data; the Map reads data row by row).
Step 302: The corresponding field value is split out of the record read in, according to metadata information.
Because the Map reads data row by row, whereas the data to be analyzed and processed is the data of one or several columns of the data table, the Map must, after reading the data, split the corresponding field value (such as age) out of it according to metadata information.
Step 303: The field value obtained is processed accordingly (for example, summed).
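Steps 301 to 303 can be sketched as a small row-oriented Map function in Python. The comma-delimited record format, the `metadata` mapping of field names to positions, and the function name are assumptions for illustration.

```python
# Assumed metadata: field name -> position within a comma-delimited row record.
metadata = {"name": 0, "age": 1, "local_min": 2}


def map_rows(lines, column):
    """Row-oriented Map: read each record, split out one field, then sum it."""
    total = 0
    for line in lines:                          # step 301: read one row record
        fields = line.split(",")                # step 302: split by metadata
        value = int(fields[metadata[column]])
        total += value                          # step 303: process (sum)
    return total
```

Every record is split in full even though only a single field per row contributes to the result.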
In the above scheme, data is still read by row. Because each column or column family in a distributed column-store database system is stored in its own file, an interface that reads data by row has to read the corresponding fields from multiple files according to key and then merge them into one record before returning it, so data reading efficiency is low. Furthermore, in the Map task processing stage, because the operation targets particular columns, the row record read in has to be decomposed by field before further processing, which increases the performance cost of data processing.
To address the above problems, the embodiments of the present invention provide a technical solution for data processing in a distributed column-store database system. In this solution, the data query request that the client sends to the master server of the distributed column-store database system carries not only a key field indicating the row in which the data requested to be read is located, but also a column field indicating the column in which the data requested to be read is located. After receiving the data query request, the master server forwards it to the corresponding shard server according to the key field. After receiving the data query request forwarded by the master server, the shard server queries the corresponding column data in its own stored data according to the key field and the column field, and returns the queried column data to the master server in the form of an array. This reduces the performance cost of data processing in a distributed column-store database system and improves data processing efficiency.
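The request and routing part of this solution can be sketched in Python as follows. The text only says that the master server selects a shard server according to the key field; the range-based routing with split keys shown here, along with the names `QueryRequest`, `Master` and `route`, is an assumption chosen for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QueryRequest:
    key: str                  # key field: the row of the requested data
    columns: Tuple[str, ...]  # column field: the column(s) of the requested data


class Master:
    """Routes a request to a shard server by key range (assumed policy)."""

    def __init__(self, split_keys, shard_servers):
        # shard_servers[i] serves the keys between split_keys[i-1] and split_keys[i].
        if len(shard_servers) != len(split_keys) + 1:
            raise ValueError("need exactly one more shard server than split keys")
        self.split_keys = list(split_keys)
        self.shard_servers = list(shard_servers)

    def route(self, request):
        """Pick the shard server whose key range contains the request key."""
        return self.shard_servers[bisect_right(self.split_keys, request.key)]


master = Master(split_keys=["m"], shard_servers=["shard-0", "shard-1"])
```

The column field plays no part in routing here; it is simply carried along in the request for the shard server to use.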
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the embodiments described below are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the embodiments of the present invention.
As shown in Fig. 4, which is a schematic flowchart of a data processing method provided by an embodiment of the present invention, the method can comprise the following steps:
Step 401: The shard server receives the data query request forwarded by the master server, the request carrying a key field indicating the row in which the data requested to be read is located and a column field indicating the column in which the data requested to be read is located.
Specifically, in a distributed column-store database system, when a user needs to query data, the user can enter the corresponding query parameters at the client so as to initiate a data query request to the master server of the system.
To make full use of the advantages of a distributed column-store database system, in the embodiments of the present invention the data query request that the client sends to the master server carries, in addition to the conventional key field indicating the row in which the data requested to be read is located, a column field indicating the column in which the data requested to be read is located.
After receiving the data query request sent by the client, the master server determines, according to the key field carried in it, the shard server on which the requested data resides, and forwards the data query request to that shard server.
Step 402: The shard server queries the corresponding column data in its own stored data according to the key field and the column field, and returns the queried column data to the master server in the form of an array.
Specifically, in the embodiments of the present invention, this can be implemented through the following steps:
Step 4021: The Hregion determines the corresponding Hstore according to the column field carried in the data query request and forwards the data query request to that Hstore.
Specifically, the data shards stored in the Hregion are stored in Hstores by column or column family. After receiving the data query request, the Hregion determines from the column field carried in it the column in which the requested data is located, then determines the Hstore that stores the data of that column, and forwards the data query request to that Hstore.
Step 4022: The Hstore determines the corresponding HstoreFile according to the key field carried in the data query request and forwards the data query request to that HstoreFile.
Specifically, the data stored in the Hstore is stored by row in HstoreFiles. After receiving the data query request, the Hstore determines from the key field carried in it the row in which the requested data is located, then determines the HstoreFile that stores the data of that row, and forwards the data query request to that HstoreFile.
Step 4023: After receiving the data query request, the HstoreFile returns the whole data file to the Hstore.
Specifically, in the prior art, after receiving the data query request the HstoreFile has to determine the offset of the requested data from the index corresponding to the key field and return this offset to the Hstore, which then reads the full row data of the corresponding row according to the offset.
To improve data processing efficiency, in the embodiments of the present invention the HstoreFile, after receiving the data query request, returns the whole data file directly to the Hstore, so that the Hstore obtains the corresponding column data directly and does not need to read full row data by offset.
Step 4024: The Hstore returns the received data file to the Hregion.
Step 4025: The Hregion generates a data array from the received data file and returns the data array to the master server.
In this way, column data is read from the distributed column-store database system in a manner that takes full advantage of column storage, reducing the performance cost of reading data and improving data processing efficiency.
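Steps 4021 to 4025 can be sketched in Python as follows. This is a simplified in-memory model: the method names (`read_all`, `query`) and the representation of the returned data array as one list of values per requested column are assumptions, since the text does not fix a concrete layout.

```python
class HstoreFile:
    def __init__(self, cells):
        self.cells = dict(cells)  # row key -> value of this column

    def read_all(self):
        """Step 4023: return the whole data file, with no offset lookup."""
        return list(self.cells.values())


class Hstore:
    def __init__(self, files):
        self.files = files

    def query(self, key):
        for hfile in self.files:           # step 4022: pick the file by key field
            if key in hfile.cells:
                return hfile.read_all()    # step 4024: pass the file contents up
        return []


class Hregion:
    def __init__(self, stores):
        self.stores = stores  # column name -> Hstore

    def query(self, key, columns):
        """Steps 4021 and 4025: pick Hstores by column field, build the array."""
        return [self.stores[column].query(key) for column in columns]


region = Hregion({
    "age": Hstore([HstoreFile({"k1": 30, "k2": 41}), HstoreFile({"k3": 25})]),
    "local_min": Hstore([HstoreFile({"k1": 120, "k2": 80}), HstoreFile({"k3": 60})]),
})
```

Only the Hstores named in the column field are visited, so a query for `local_min` never touches the `age` files.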
After receiving the data returned by the shard server, the master server needs to output the data to the Map task for further processing.
As shown in Fig. 6, the data processing method provided by the embodiments of the present invention can further comprise the following steps:
Step 601: The Map reads in one ColRecord.
Specifically, the embodiments of the present invention define the following structure:
ColRecord(coldata[1], coldata[2], …, coldata[n])
where n is the number of columns of column data found by the shard server, coldata[i] is one column of the column data found by the shard server, and i is a positive integer not greater than n.
After receiving the data array output by the master server, the Map reads the data according to the above structure.
Step 602: The Map obtains each column of data from the ColRecord.
Step 603: The Map processes the obtained column data column by column.
Because the data that the master server outputs to the Map task is no longer full row data but a data array, the Map can, after receiving the data array output by the master server, read the data according to the ColRecord structure and directly obtain each column of data to be processed. Each column can then be analyzed and processed column by column, without decomposing row records by field, which further reduces the performance cost of data processing and improves data processing efficiency.
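Steps 601 to 603 can be sketched as a column-oriented Map function in Python. The record is modelled as a plain list of column arrays, and the function name, the column names and the choice of summation as the processing step are assumptions for illustration.

```python
def map_columns(col_record, names):
    """Column-oriented Map: take each column array and aggregate it directly."""
    columns = dict(zip(names, col_record))   # step 602: one entry per column
    # Step 603: process column by column; no row record is split apart.
    return {name: sum(values) for name, values in columns.items()}


# Step 601: the data array as output by the master server (illustrative values).
col_record = [[30, 41], [120, 80]]
totals = map_columns(col_record, ["age", "local_min"])
```

Compared with a row-based Map, there is no per-record split step: each aggregate is computed over a ready-made column array.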
As can be seen from the above description, in the technical solution provided by the embodiments of the present invention, the data query request that the client sends to the master server of the distributed column-store database system carries not only a key field indicating the row in which the data requested to be read is located, but also a column field indicating the column in which the data requested to be read is located. After receiving the data query request, the master server forwards it to the corresponding shard server according to the key field. After receiving the data query request forwarded by the master server, the shard server queries the corresponding column data in its own stored data according to the key field and the column field, and returns the queried column data to the master server in the form of an array. This reduces the performance cost of data processing in a distributed column-store database system and improves data processing efficiency.
Based on the same technical concept as the above method embodiment, an embodiment of the present invention provides a distributed column-store database system.
As shown in Fig. 7, which is a schematic structural diagram of a distributed column-store database system provided by an embodiment of the present invention, the system can comprise a master server 71 and a shard server 72, wherein:
the master server 71 can be configured to receive a data query request initiated by a client and forward the data query request to the shard server 72, and to receive the data in array form returned by the shard server 72;
the shard server 72 is configured to receive the data query request forwarded by the master server 71, the request carrying a key field indicating the row in which the data requested to be read is located and a column field indicating the column in which the data requested to be read is located; to query the corresponding column data in its own stored data according to the key field and the column field; and to return the queried column data to the master server 71 in the form of an array.
The shard server 72 comprises one data shard module Hregion, at least one column module Hstore, and at least one column storage file HstoreFile, wherein:
the Hregion is configured to receive the data query request forwarded by the master server, the request carrying a key field indicating the row in which the data requested to be read is located and a column field indicating the column in which the data requested to be read is located; to determine the corresponding Hstore according to the column field and forward the data query request to that Hstore; and to receive the data file returned by the Hstore, generate a data array from the data file, and return the data array to the master server;
the Hstore is configured, on receiving the data query request forwarded by the Hregion, to determine the corresponding HstoreFile according to the key field and forward the data query request to that HstoreFile; and to receive the data file returned by the HstoreFile and return the data file to the Hregion;
the HstoreFile is configured, on receiving the data query request forwarded by the Hstore, to return the whole data file to the Hstore.
The master server 71 can also be configured to output the data array to a Map, so that the Map reads data from the data array and analyzes and processes the obtained column data column by column.
Specifically, the master server is configured to output the data array to the Map, so that the Map reads the data array according to the ColRecord structure;
the ColRecord structure is specifically:
ColRecord(coldata[1], coldata[2], …, coldata[n])
where n is the number of columns of column data found by the shard server, coldata[i] is one column of the column data found by the shard server, and i is a positive integer not greater than n.
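The ColRecord structure can be sketched as a small Python data class. The text defines only the shape ColRecord(coldata[1], …, coldata[n]); the `n` property and the 1-based `column` accessor are illustrative assumptions added to mirror the indexing used in the definition.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class ColRecord:
    """A record made of n column arrays rather than of row fields."""
    coldata: List[Sequence[Any]]  # coldata[i - 1] holds the i-th queried column

    @property
    def n(self):
        """Number of columns returned by the shard server."""
        return len(self.coldata)

    def column(self, i):
        """Return coldata[i] using the 1-based index from the definition."""
        if not 1 <= i <= self.n:
            raise IndexError("i must be a positive integer not greater than n")
        return self.coldata[i - 1]


record = ColRecord([[30, 41], [120, 80]])
```

Here `record.column(1)` yields the whole first column in one access, whereas a row-based record would have to be split once per row to recover the same values.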
In the distributed column-store database system provided by the embodiments of the present invention, one master server can correspond to one or more shard servers.
Based on the same technical concept as the above method embodiment, an embodiment of the present invention also provides a shard server, which can be applied to the above method embodiment.
As shown in Fig. 8, which is a schematic structural diagram of a shard server provided by an embodiment of the present invention, the shard server can comprise one data shard module Hregion 81, at least one column module Hstore 82, and at least one column storage file HstoreFile 83, wherein:
the Hregion 81 is configured to receive the data query request forwarded by the master server, the request carrying a key field indicating the row in which the data requested to be read is located and a column field indicating the column in which the data requested to be read is located; to determine the corresponding Hstore 82 according to the column field and forward the data query request to that Hstore 82; and to receive the data file returned by the Hstore 82, generate a data array from the data file, and return the data array to the master server;
the Hstore 82 is configured, on receiving the data query request forwarded by the Hregion 81, to determine the corresponding HstoreFile 83 according to the key field and forward the data query request to that HstoreFile 83; and to receive the data file returned by the HstoreFile 83 and return the data file to the Hregion 81;
the HstoreFile 83 is configured, on receiving the data query request forwarded by the Hstore 82, to return the whole data file to the Hstore 82.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of the present invention can be implemented by hardware, or by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the embodiments of the present invention can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard drive) and comprises a number of instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the method described in each implementation scenario of the embodiments of the present invention.
Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of preferred implementation scenarios, and that the modules or flows in the drawings are not necessarily required for implementing the embodiments of the present invention.
Those skilled in the art will appreciate that the modules of a device in an implementation scenario can be distributed in the device as described for that scenario, or can be changed accordingly and located in one or more devices different from that scenario. The modules of the above implementation scenarios can be merged into one module or further split into multiple submodules.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the implementation scenarios.
What is disclosed above is only several specific implementation scenarios of the embodiments of the present invention. The embodiments of the present invention are not limited to these, however, and any variation that a person skilled in the art can conceive of shall fall within the scope of protection of the embodiments of the present invention.

Claims (9)

1. A data processing method, applied in a distributed column-store database system comprising a master server and a shard server, characterized in that the method comprises:
the shard server receiving a data query request forwarded by the master server, the request carrying a key field indicating the row in which the data requested to be read is located and a column field indicating the column in which the data requested to be read is located;
the shard server querying the corresponding column data in its own stored data according to the key field and the column field, and returning the queried column data to the master server in the form of an array.
2. The method of claim 1, characterized in that the shard server comprises one data shard module Hregion, at least one column module Hstore, and at least one column storage file HstoreFile;
the shard server querying the corresponding column data in its own stored data according to the key field and the column field, and returning the queried column data to the master server in the form of an array, is specifically:
the Hregion determines the corresponding Hstore according to the column field and forwards the data query request to that Hstore;
the Hstore determines the corresponding HstoreFile according to the key field and forwards the data query request to that HstoreFile;
the HstoreFile, after receiving the data query request, returns the whole data file to the Hstore;
the Hstore returns the received data file to the Hregion;
the Hregion generates a data array from the received data file and returns the data array to the master server.
3. method as claimed in claim 2, is characterized in that, the method also comprises:
Described data array is exported to Map by described master server, so that described Map is according to this data array reading out data, and carries out analyzing and processing according to the column data obtaining by row.
4. The method of claim 3, characterized in that the Map reading data from the data array specifically comprises:
The Map reads the data of the data array according to a ColRecord structure;
The ColRecord structure is specifically:
ColRecord(coldata[1],coldata[2],……coldata[n])
Wherein n is the number of columns of the column data found by the burst server, coldata[i] is one column of the column data found by the burst server, and i is a positive integer not greater than n.
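A minimal sketch of the ColRecord structure and of a Map-side consumer follows. The 1-based accessor mirrors the claim's coldata[1] to coldata[n] notation; the function `map_column_sums` is a hypothetical stand-in for the analysis step, which the claims do not detail.

```python
class ColRecord:
    """Holds n columns of data, addressed as coldata[1] .. coldata[n]."""

    def __init__(self, *coldata):
        self.coldata = list(coldata)

    @property
    def n(self):
        # Number of columns found by the burst server.
        return len(self.coldata)

    def column(self, i):
        # i is a positive integer not greater than n, as in the claim.
        if not 1 <= i <= self.n:
            raise IndexError(f"column index {i} outside 1..{self.n}")
        return self.coldata[i - 1]


def map_column_sums(record):
    """Hypothetical Map task: one aggregate per column, read column by column."""
    return [sum(record.column(i)) for i in range(1, record.n + 1)]
```

For example, `map_column_sums(ColRecord([1, 2, 3], [10, 20]))` processes each column as a unit, which is the access pattern the array return format is meant to support.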
5. A distributed column storage database system, comprising a master server and a burst server, characterized in that:
The master server is configured to receive a data query request initiated by a client and forward the data query request to the burst server; and to receive the data returned by the burst server in the form of an array;
The burst server is configured to receive the data query request forwarded by the master server, the request carrying a key field indicating the row in which the requested data is located and a list field indicating the column in which the requested data is located; to query the corresponding column data in its own stored data according to the key field and the list field; and to return the queried column data to the master server in the form of an array.
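The master server's role in claim 5 can be sketched as a thin forwarding layer. The claim does not say how the master chooses a burst server; this sketch assumes each burst server owns a key range identified by its start key and uses a binary search over the sorted start keys. `StubBurstServer` and `MasterServer.handle` are hypothetical names.

```python
import bisect


class StubBurstServer:
    """Stand-in burst server: returns one single-value array per column."""

    def __init__(self, rows):
        self.rows = rows  # {row_key: {column_name: value}}

    def query(self, key, columns):
        row = self.rows.get(key, {})
        return [[row.get(column)] for column in columns]


class MasterServer:
    """Forwards a client query to the burst server owning the key range."""

    def __init__(self, servers_by_start_key):
        # Assumption: {start_key: burst_server}, ranges run to the next start key.
        self.servers = servers_by_start_key
        self.start_keys = sorted(servers_by_start_key)

    def handle(self, key, columns):
        # Find the last start key that is <= the requested key.
        index = bisect.bisect_right(self.start_keys, key) - 1
        if index < 0:
            raise KeyError(key)
        server = self.servers[self.start_keys[index]]
        # The array returned by the burst server is passed back unchanged.
        return server.query(key, columns)
```

The master does no reshaping of the result here, consistent with the claim that it simply receives the data in array form.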
6. The distributed column storage database system of claim 5, characterized in that the burst server comprises one data slice module Hregion, at least one row module Hstore, and at least one row storage file HstoreFile; wherein:
The Hregion is configured to receive the data query request forwarded by the master server, the request carrying a key field indicating the row in which the requested data is located and a list field indicating the column in which the requested data is located; to determine the corresponding Hstore according to the list field and forward the data query request to that Hstore; and to receive the data file returned by the Hstore, generate a data array from the data file, and return the data array to the master server;
The Hstore is configured to, upon receiving the data query request forwarded by the Hregion, determine the corresponding HstoreFile according to the key field and forward the data query request to that HstoreFile; and to receive the data file returned by the HstoreFile and return the data file to the Hregion;
The HstoreFile is configured to, upon receiving the data query request forwarded by the Hstore, return the whole data file to the Hstore.
7. The system of claim 6, characterized in that:
The master server is further configured to output the data array to a Map, so that the Map reads data from the data array and performs analysis and processing according to the column data obtained by row.
8. The system of claim 7, characterized in that:
The master server is specifically configured to output the data array to the Map, so that the Map reads the data of the data array according to a ColRecord structure;
The ColRecord structure is specifically:
ColRecord(coldata[1],coldata[2],……coldata[n])
Wherein n is the number of columns of the column data found by the burst server, coldata[i] is one column of the column data found by the burst server, and i is a positive integer not greater than n.
9. A burst server, applied in a distributed column storage database system comprising a master server, characterized in that the burst server comprises: one data slice module Hregion, at least one row module Hstore, and at least one row storage file HstoreFile; wherein:
The Hregion is configured to receive the data query request forwarded by the master server, the request carrying a key field indicating the row in which the requested data is located and a list field indicating the column in which the requested data is located; to determine the corresponding Hstore according to the list field and forward the data query request to that Hstore; and to receive the data file returned by the Hstore, generate a data array from the data file, and return the data array to the master server;
The Hstore is configured to, upon receiving the data query request forwarded by the Hregion, determine the corresponding HstoreFile according to the key field and forward the data query request to that HstoreFile; and to receive the data file returned by the HstoreFile and return the data file to the Hregion;
The HstoreFile is configured to, upon receiving the data query request forwarded by the Hstore, return the whole data file to the Hstore.
CN201210584674.1A 2012-12-28 2012-12-28 Data processing method, device and system Active CN103902614B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210584674.1A CN103902614B (en) 2012-12-28 2012-12-28 A kind of data processing method, equipment and system

Publications (2)

Publication Number Publication Date
CN103902614A true CN103902614A (en) 2014-07-02
CN103902614B CN103902614B (en) 2018-05-04

Family

ID=50993942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210584674.1A Active CN103902614B (en) 2012-12-28 2012-12-28 Data processing method, device and system

Country Status (1)

Country Link
CN (1) CN103902614B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727465A (en) * 2008-11-03 2010-06-09 中国移动通信集团公司 Methods for establishing and inquiring index of distributed column storage database, device and system thereof
CN101828182A (en) * 2007-09-21 2010-09-08 哈索-普拉特纳-研究所软件系统有限责任公司 ETL-less zero redundancy system and method for reporting OLTP data
CN102521367A (en) * 2011-12-16 2012-06-27 清华大学 Distributed type processing method based on massive data
CN102156714B (en) * 2011-03-22 2012-11-14 清华大学 Method for realizing self-adaptive vertical divided relational database and system thereof
WO2012164469A1 (en) * 2011-05-31 2012-12-06 International Business Machines Corporation A method for determining rules by providing data records in columnar data structures

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
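冰冻奶茶: "An in-depth study of HBase read operations" (关于hbase的read操作的深入研究), 《China's most professional IT technology community (中国最专业的IT技术社区), HTTP://WWW.ITPUB.NET/THREAD-1606989-1-1.HTML》 *
李作主: "Using arrays to update data across multiple tables" (巧用数组实现多表数据的更新), 《Science & Technology Information (科技信息)》 *
鲍亮,陈荣: 《Cloud Computing Made Simple (深入浅出云计算)》, 31 October 2012, Tsinghua University Press (清华大学出版社) *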

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016155510A1 (en) * 2015-03-28 2016-10-06 Huawei Technologies Co., Ltd. Apparatus and method for creating user defined variable size tags on records in rdbms
CN105404638A (en) * 2015-09-28 2016-03-16 高新兴科技集团股份有限公司 Method for solving correlated query of distributed cross-database fragment table
CN106802891A (en) * 2015-11-26 2017-06-06 中国电信股份有限公司 The querying method of the non-burst field of distributed data base, system and equipment
CN111090618A (en) * 2019-10-29 2020-05-01 厦门网宿有限公司 Data reading method, system and equipment
CN111090618B (en) * 2019-10-29 2023-08-18 厦门网宿有限公司 Data reading method, system and equipment

Also Published As

Publication number Publication date
CN103902614B (en) 2018-05-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant