WO2013078583A1 - Method and apparatus for optimizing data access, and method and apparatus for optimizing data storage - Google Patents

Method and apparatus for optimizing data access, and method and apparatus for optimizing data storage

Info

Publication number
WO2013078583A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
information
range
input
key value
Prior art date
Application number
PCT/CN2011/083021
Other languages
English (en)
Chinese (zh)
Inventor
智伟
赵智峰
周帅锋
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to CN201180002537.6A priority Critical patent/CN102725753B/zh
Priority to PCT/CN2011/083021 priority patent/WO2013078583A1/fr
Publication of WO2013078583A1 publication Critical patent/WO2013078583A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Definitions

  • The present invention relates to the field of data processing technologies, and in particular to a method and apparatus for optimizing data access and a method and apparatus for optimizing data storage.
  • HBASE: Hadoop Database.
  • MapReduce has been widely used in large-scale data analysis.
  • HBASE can be used as a MapReduce data source and data destination, so that MapReduce can process data stored in HBASE or save its output data in HBASE.
  • The table and the range of data accessed by MapReduce are defined by a table name and a range query object.
  • The range query object defines the data query range by setting a start key value and an end key value.
  • The main controller creates Map tasks based on the block information and the number of blocks, and each Map task processes the data in one block.
  • The MapReduce function library in the user program first divides the input file into M blocks. According to the table in which data is to be saved, it accesses the HBASE metadata table to obtain the Region information corresponding to the table, and the number of Reduce tasks, R, is specified by the Region information.
  • The main controller obtains the input split information and creates a Map task for each block, and creates Reduce tasks based on the configured number of Reduces. There are a total of M Map tasks and R Reduce tasks to be dispatched. The main controller dispatches the tasks to the slave processors. (3) Other processes are the same as the basic MapReduce process.
  • The data range accessed by MapReduce is defined by the data query range in a single range query object.
  • To cover all the required data, the data range specified by the range query object can only be expanded. As a result, too many partitions are covered, and the partitions contain a large amount of invalid data.
  • In addition to reading the required records in a partition, the MapReduce program must also read a large number of invalid records, compare them, and discard them, which causes many invalid operations and seriously reduces data processing speed.
  • An embodiment of the present invention provides a method and apparatus for optimizing data access to reduce the reading of invalid data and improve data processing efficiency.
  • An embodiment of the present invention provides a method and an apparatus for optimizing data storage, so as to reduce execution of invalid MapReduce tasks and improve data processing efficiency.
  • the technical solution adopted by the embodiment of the present invention is:
  • a method of optimizing data access including:
  • the main controller receives a request for the user to access the data table in the HBASE, where the request carries data input range information, and the data input range information includes multiple data input ranges;
  • a method of optimizing data storage including:
  • the main controller receives a request from the user to store data in a data table in HBASE, the request carrying one or more data output ranges; determining output block information according to the partition information of the data table and the data output ranges; determining a number of Reduce tasks according to the output block information;
  • writing data to the data table through slave processors according to the number of Reduce tasks.
  • An apparatus for optimizing data access comprising:
  • a receiving unit configured to receive a request for a user to access a data table in the HBASE, where the request carries data input range information, where the data input range information includes multiple data input ranges;
  • an input blocking unit configured to determine input block information according to the partition information of the data table and the data input range information;
  • a task determining unit configured to determine a number of Map tasks according to the input block information;
  • an allocating unit configured to allocate slave processors to read data in the data table according to the number of the Map tasks;
  • a sending unit configured to return the data read by the slave processor to the user.
  • An apparatus for optimizing data storage comprising:
  • a receiving unit configured to receive a request for a user to store data in a data table in HBASE, where the request carries one or more data output ranges;
  • an output blocking unit configured to determine output block information according to the partition information of the data table and the data output range;
  • a task determining unit configured to determine, according to the output block information, a number of Reduce tasks;
  • an allocating unit configured to allocate slave processors to write data to the data table according to the number of the Reduce tasks.
  • With the method and apparatus for optimizing data access in the embodiments of the present invention, when HBASE is used as the data source of MapReduce, specifying a plurality of data input ranges reduces the reading of invalid data and improves data processing efficiency.
  • Accordingly, with the method and apparatus for optimizing data storage in the embodiments of the present invention, when HBASE is used as the data destination of MapReduce, specifying multiple data output ranges reduces the execution of invalid MapReduce tasks and improves data processing efficiency.
  • FIG. 1 is a schematic diagram of an operation flow of an existing MapReduce
  • FIG. 2 is a schematic diagram of a data model of a table in an existing HBASE database
  • FIG. 3 is a schematic diagram of an operation flow of MapReduce in the prior art when HBASE is used as a data source;
  • FIG. 4 is a schematic diagram of the operation flow of MapReduce in the prior art when HBASE is used as a data destination;
  • FIG. 5 is a flowchart of a method for optimizing data access according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a data table in HBASE according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of querying the data table shown in FIG. 6 according to the prior art.
  • FIG. 8 is a schematic diagram of querying the data table shown in FIG. 6 according to a method according to an embodiment of the present invention
  • FIG. 9 is a flowchart of a method for optimizing data storage according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of storing the data table shown in FIG. 4 according to the prior art
  • FIG. 11 is a schematic diagram of storing the data table shown in FIG. 4 according to the method of the embodiment of the present invention
  • FIG. 12 is a schematic structural diagram of an apparatus for optimizing data access according to an embodiment of the present invention
  • FIG. 13 is a schematic diagram showing the structure of an input blocking unit in an apparatus for optimizing data access according to an embodiment of the present invention
  • FIG. 14 is a schematic structural diagram of an apparatus for optimizing data storage according to an embodiment of the present invention.
  • FIG. 15 is a block diagram showing the structure of an output blocking unit in an apparatus for optimizing data storage in an embodiment of the present invention.
  • Detailed description
  • MapReduce consists of three separate entities: user program, main controller, and slave processor.
  • the main controller coordinates the running of the job and assigns tasks to the slave processors; the slave processors run the Map tasks and Reduce tasks once the job is running.
  • when the user program calls the MapReduce function, the following operations are triggered:
  • the main controller obtains the input split information and creates a Map task for each block, and creates Reduce tasks based on the configured number of Reduces. There are a total of M Map tasks and R Reduce tasks to be dispatched. The main controller dispatches the tasks to the slave processors.
  • a slave processor assigned a Map task reads and processes the associated input block.
  • the intermediate results that the Map task buffers in memory are periodically written to the local hard disk, and the data is divided into R areas by the partition function.
  • the location of the intermediate results on the local hard disk is sent back to the main controller, which is then responsible for forwarding this location information to the slave processors running Reduce tasks.
  • the slave processor running a Reduce task processes the intermediate data and passes the set of intermediate result values to the user-defined Reduce function.
  • the output of the Reduce function is written to a final output file.
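The basic flow above can be sketched in a single process. This is only an illustration under stated assumptions, not the patent's implementation: `run_job`, `partition`, `count_words`, and `sum_counts` are hypothetical names, and the CRC32-modulo partition function is an assumption, since the text only says that a partition function divides the intermediate data into R areas.

```python
import zlib
from collections import defaultdict


def partition(key, num_reduces):
    """Assumed partition function: a stable hash of the key modulo R."""
    return zlib.crc32(key.encode("utf-8")) % num_reduces


def run_job(chunks, map_fn, reduce_fn, num_reduces):
    """Run M Map tasks (one per input chunk) and R Reduce tasks in-process."""
    # Map phase: each chunk is one Map task; its output lands in one of R areas.
    areas = [defaultdict(list) for _ in range(num_reduces)]
    for chunk in chunks:
        for key, value in map_fn(chunk):
            areas[partition(key, num_reduces)][key].append(value)
    # Reduce phase: each area is handled by one Reduce task.
    output = {}
    for area in areas:
        for key in sorted(area):
            output[key] = reduce_fn(key, area[key])
    return output


def count_words(chunk):
    """Example Map function: emit (word, 1) for every word in the chunk."""
    return [(word, 1) for word in chunk.split()]


def sum_counts(key, values):
    """Example Reduce function: add up the counts for one word."""
    return sum(values)


result = run_job(["a b a", "b c"], count_words, sum_counts, num_reduces=2)
```

Because all values for one key go to the same area, each Reduce task sees the complete value set for the keys it owns.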
  • HBASE is a distributed column storage database, in which the data model of the table is shown in Figure 2:
  • Figure 2 shows the data in a HBASE table, including the following information:
  • RowKey: the identifier of each row of data, similar to the primary key of a table in a relational database.
  • Column Family: an HBASE table is a collection of different column clusters. Like the columns of a table in a relational database, column clusters need to be predefined. Unlike columns in a relational table, a column cluster can contain multiple columns.
  • the data in the table is organized into partitions according to a certain size.
  • when a partition exceeds that size, it is split: a new partition is created, and the data under the original column cluster is moved to the new partition in row key order.
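A partition split in row key order can be sketched as follows. The size limit is counted in rows and the split point is the median row key; both are assumptions made for illustration, and `split_if_needed` is a hypothetical helper rather than an HBASE API.

```python
def split_if_needed(region, max_rows):
    """Return [region] unchanged, or two partitions split at the median row key.

    `region` maps row key -> row data; `max_rows` is the assumed size limit.
    """
    if len(region) <= max_rows:
        return [region]
    keys = sorted(region)  # row keys in ascending order
    mid = len(keys) // 2
    lower = {k: region[k] for k in keys[:mid]}
    upper = {k: region[k] for k in keys[mid:]}  # rows moved to the new partition
    return [lower, upper]


parts = split_if_needed({"r3": 3, "r1": 1, "r4": 4, "r2": 2}, max_rows=3)
```

After the split, every key in the first partition sorts before every key in the second, so each partition can still be described by a start key and an end key.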
  • HBASE can be used as a data source for MapReduce, that is, MapReduce can process data stored in HBASE.
  • As shown in Figure 3, when the user program calls the MapReduce function, the following operations are triggered: 1) MapReduce obtains all Region information of the specified table from HBASE according to the table name and, by comparing the data range of the range query object with the range information of all Regions, obtains the block information and the number of blocks.
  • the main controller creates Map tasks according to the above block information and number of blocks, and each Map task processes the data in one block.
  • the main controller dispatches the Map tasks to the slave processors.
  • HBASE can also be used as a data destination for MapReduce. That is, MapReduce can store its output data in HBASE, and the table in which the MapReduce data is to be stored is defined by the table name.
  • the MapReduce function in the user program first divides the input file into multiple blocks. According to the table to be saved to, the HBASE data table is accessed to obtain the Region information corresponding to the table, and the number of Reduce tasks is specified by the Region information.
  • the main controller obtains the input file block information and creates a Map task for each block, and creates Reduce tasks based on the configured number of Reduces.
  • In view of the above problems in the prior art, with the method and apparatus for optimizing data storage in the embodiments of the present invention, when HBASE is used as a data destination of MapReduce, the execution of invalid MapReduce tasks is reduced by specifying multiple data output ranges, thereby improving data processing efficiency.
  • Step 501 The main controller receives a request for a user to access a data table in the HBASE, where the request carries data input range information, where the data input range information includes multiple data input ranges.
  • the plurality of data input ranges described above may be divided by a separator, and accordingly, the main controller may divide the input string by a separator to obtain a plurality of data input ranges.
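Splitting a separator-delimited string into data input ranges might look like the sketch below. The concrete format (ranges separated by `;`, start and end key separated by `,`) is an assumption; the text does not specify which separator is used, and `parse_ranges` is a hypothetical helper.

```python
def parse_ranges(text, range_sep=";", key_sep=","):
    """Split 'start,end;start,end' into a list of (start key, end key) pairs."""
    ranges = []
    for chunk in text.split(range_sep):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing separator
        # Raises ValueError if a chunk does not contain exactly two keys.
        start, end = (part.strip() for part in chunk.split(key_sep))
        if start > end:
            raise ValueError(f"start key {start!r} sorts after end key {end!r}")
        ranges.append((start, end))
    return ranges


ranges = parse_ranges(
    "movie 1#20110801,movie 1#20110807;movie 2#20110801,movie 2#20110807"
)
```

The separators must not occur inside row keys; a real request format would need an escaping rule or a structured encoding.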
  • the data input range may take various forms, for example, any one of the following forms: a) a list form, in which the list includes multiple range query objects and each range query object is a data input range, for example:
  • b) a file form, that is, a plurality of data input ranges are saved in a file, and the main controller obtains the data input ranges by reading the file, for example:
  • Step 502 Determine input block information according to the partition information of the data table and the data input range information.
  • the start key value and the end key value of all the partitions in the data table may first be acquired, and then each data input range in the data input range information is compared with the start key value and the end key value of each partition to obtain the coverage of each data input range in each partition; the input block information can then be determined according to the coverage.
  • coverage ranges that belong to the same partition and are contiguous may first be merged, and the input block information is then determined according to the merged coverage.
  • before the data input ranges in the data input range information are compared with the start key value and the end key value of each partition, the data input ranges may first be sorted; the sorted data input ranges are then compared in sequence with the start key value and the end key value of each partition.
  • the input block information may include the number of input blocks and the start key value and the end key value of each input block.
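Step 502 can be sketched as an interval intersection followed by a merge. This is an illustration under assumptions, not the patented implementation: `compute_input_blocks` is a hypothetical name, keys are compared as plain strings, and both ends of every range are treated as inclusive (real HBASE Regions and scans use an exclusive end key).

```python
from typing import List, Tuple

Range = Tuple[str, str]  # (start key, end key), both inclusive in this sketch


def compute_input_blocks(partitions: List[Range], ranges: List[Range]) -> List[Range]:
    """Clip each input range to each partition, merging touching pieces."""
    blocks: List[Range] = []
    for p_start, p_end in sorted(partitions):
        covered: List[Range] = []
        # Sorting the ranges first means pieces arrive in ascending start order,
        # so a new piece can only ever merge with the most recent one.
        for r_start, r_end in sorted(ranges):
            lo, hi = max(p_start, r_start), min(p_end, r_end)
            if lo > hi:
                continue  # this range does not touch this partition
            if covered and lo <= covered[-1][1]:
                covered[-1] = (covered[-1][0], max(covered[-1][1], hi))
            else:
                covered.append((lo, hi))
        blocks.extend(covered)
    return blocks


blocks = compute_input_blocks(
    partitions=[("a", "f"), ("g", "m")],
    ranges=[("h", "k"), ("b", "c"), ("c", "d"), ("e", "h")],
)
```

The number of Map tasks is then `len(blocks)`, and each tuple gives the start and end key of one input block.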
  • Step 503 Determine the number of Map tasks according to the input block information.
  • the number of Map tasks may be determined according to the number of input blocks, and each Map task corresponds to one input block.
  • Step 504 Read data in the data table through slave processors according to the number of the Map tasks.
  • the main controller may allocate a slave processor for each Map task, and transmit the start key value and the end key value of the input block corresponding to the Map task to the slave processor.
  • the slave processor reads the data in the data table according to the start key value and the end key value of the input block.
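The one-task-per-block dispatch and the keyed read can be illustrated with an in-memory stand-in for the data table. `read_block` and `run_map_tasks` are hypothetical names, the table is a plain dictionary rather than HBASE, and the end key is treated as inclusive by assumption.

```python
from bisect import bisect_left, bisect_right


def read_block(sorted_keys, rows, start_key, end_key):
    """Return the rows whose key lies in [start_key, end_key]."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_right(sorted_keys, end_key)
    return [(key, rows[key]) for key in sorted_keys[lo:hi]]


def run_map_tasks(rows, blocks):
    """One Map task per input block; each task reads only its own key range."""
    sorted_keys = sorted(rows)
    return [read_block(sorted_keys, rows, start, end) for start, end in blocks]


table = {"a": 1, "b": 2, "c": 3, "d": 4}
task_inputs = run_map_tasks(table, [("b", "c"), ("d", "z")])
```

Rows outside every block, such as `"a"` here, are never read, which is the saving the method aims for.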
  • Step 505 Return the data read by the slave processors to the user.
  • the request also includes the table name information of the data table.
  • the main controller may first check the table name in the request and the data input ranges carried in the request to determine whether the information is correct. That is, according to the table name information in the request, it checks whether a corresponding data table exists in HBASE and whether the data input ranges are within the data range stored in that data table. If the information is correct, step 502 above is performed.
  • the HBASE system table may be queried. If the table name is not in the system table, the table name is incorrect; if it is present, the table name is correct. Then the Region information of the data table corresponding to the table name is obtained and compared with the data input range: if the data input range is smaller than the start key value of the table or greater than the end key value of the table, the data input range may be determined to be incorrect; otherwise, the data input range may be determined to be correct.
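The two checks described above can be sketched as follows. The `catalog` dictionary stands in for the HBASE system table and is an assumption, as is the helper name `validate_request`; a real implementation would query HBASE instead.

```python
def validate_request(catalog, table_name, input_ranges):
    """Return (ok, reason).

    `catalog` maps a table name to its list of (start key, end key) Regions.
    """
    if table_name not in catalog:
        return False, "table name not found"
    regions = catalog[table_name]
    table_start = min(start for start, _ in regions)
    table_end = max(end for _, end in regions)
    for start, end in input_ranges:
        # A range is rejected if it reaches below the table's first key
        # or above the table's last key.
        if start < table_start or end > table_end:
            return False, f"range ({start}, {end}) is outside the table"
    return True, "ok"


catalog = {"movies": [("a", "f"), ("g", "m")]}
ok, reason = validate_request(catalog, "movies", [("b", "c"), ("h", "k")])
```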
  • the method for optimizing data access in the embodiment of the present invention may also be compatible with the prior-art method. For example, after receiving the user's request to access the data table in HBASE, the main controller determines whether the request carries a plurality of data input ranges; if so, it accesses the data table in HBASE according to steps 502 to 504; otherwise, it accesses the data table in HBASE in the prior-art manner.
  • a data table in HBASE shown in Figure 6 is a movie access information table indicating the number of times different movies are accessed by different types of terminals each day.
  • the Region information in the data table shown in Figure 6 is shown in Table 1 below.
  • According to the prior art, MapReduce processing is performed on the data table shown in Figure 6 to total, for each movie, the number of accesses from different terminals in the first week of August. The processing is shown in FIG. 7 and is as follows:
  • the effective input data range is:
  • Region2 (movie 3#20110101, movie 3#20110807)
  • the valid input data range is divided into three blocks, which are:
  • the first block information: movie 1#20110801 to movie 1#20110808, where movie 1#20110808 is invalid data.
  • the second block information: movie 2#20110102 to movie 2#20110808, where movie 2#20110101 to movie 2#20110731 and movie 2#20110808 are invalid data.
  • the third block information: movie 3#20110101 to movie 3#20110807, where movie 3#20110101 to movie 3#20110731 is invalid data.
  • the main controller starts three Map tasks based on these three blocks and dispatches them to the slave processors for processing.
  • the three Map tasks process the data in the different blocks, which also contain the invalid data noted above.
  • when a slave processor processes the data, it reads each record in each block from the HBASE table and determines whether it is data from the first week of August; if so, the record is added to the total, and if not, it is discarded without processing. As shown in Figure 7, the data in the thick-line frames is valid data, and the rest is invalid data.
  • MapReduce processing is performed on the data table shown in FIG. 6, and the total number of accesses of different terminals of each movie in the first week of August is aggregated, and the processing procedure is as shown in FIG. as follows:
  • the main controller receives a request for data access from the user, and the request carries multiple data input ranges. In this example, there are three data input ranges, as follows:
  • SCAN1 (start key value: movie 1#20110801, end key value: movie 1#20110807)
  • SCAN2 (start key value: movie 2#20110801, end key value: movie 2#20110807)
  • SCAN3 (start key value: movie 3#20110801, end key value: movie 3#20110807).
  • the valid input data range is: Region2 (movie 3#20110801, movie 3#20110807).
  • the above valid input data range belongs to three different Regions, so according to the three valid input data ranges obtained above, three different input block information are obtained, which are:
  • the main controller starts three Map tasks and dispatches them to the slave processors for processing.
  • the three Map tasks process the data in different input block information.
  • the data in the above three input block information is valid data, and does not contain invalid data.
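The difference between the two examples can be reproduced with a small calculation. The Region0 and Region2 boundaries are the ones quoted elsewhere in this description; the Region1 boundaries are an assumed placeholder because Table 1 is not reproduced here. `clip` is a hypothetical helper, and both ends of each range are treated as inclusive.

```python
# Region boundaries as (start key, end key).
REGIONS = [
    ("movie 1#20110104", "movie 1#20110808"),  # Region0, quoted in the text
    ("movie 2#20110101", "movie 2#20110808"),  # Region1, ASSUMED for illustration
    ("movie 3#20110101", "movie 3#20110808"),  # Region2, quoted in the text
]


def clip(ranges, regions):
    """Intersect every range with every Region and keep the non-empty parts."""
    blocks = []
    for region_start, region_end in regions:
        for range_start, range_end in ranges:
            lo = max(region_start, range_start)
            hi = min(region_end, range_end)
            if lo <= hi:
                blocks.append((lo, hi))
    return blocks


# Prior art: one range query object stretched over all three movies.
single_range = [("movie 1#20110801", "movie 3#20110807")]
# Embodiment: one range per movie, first week of August only.
multi_range = [
    ("movie 1#20110801", "movie 1#20110807"),
    ("movie 2#20110801", "movie 2#20110807"),
    ("movie 3#20110801", "movie 3#20110807"),
]

prior_art_blocks = clip(single_range, REGIONS)
embodiment_blocks = clip(multi_range, REGIONS)
```

Both variants yield three blocks and therefore three Map tasks, but with a single range the second block spans the whole of Region1, whereas with three ranges every block is exactly one week of one movie.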
  • the effective input data range is limited by a number of simple comparison operations, so that far less invalid data is read during data processing, which greatly improves processing efficiency.
  • FIG. 9 is a flowchart of a method for optimizing data storage according to an embodiment of the present invention, which includes the following steps:
  • Step 901 The main controller receives a request for the user to store data in the HBASE data table, where the request carries one or more data output ranges.
  • the plurality of data output ranges described above may be divided by a separator, and accordingly, the main controller divides the output string by a separator to obtain a plurality of data output ranges.
  • the data output range may take various forms, for example, any of the following forms: a) In the form of a list, the list includes a start and end data range pair, for example:
  • Each start and end data range pair represents a data output range, for example:
  • Step 902 Determine output block information according to the partition information of the data table and the data output range.
  • the start key value and the end key value of all the partitions in the data table may first be obtained, and then the data output range is compared with the start key value and the end key value of each partition to obtain the partition information covered by the data output range; the output block information is then determined according to the partition information.
  • the output block information includes the number of output blocks.
  • Step 903 Determine, according to the output block information, a number of Reduce tasks.
  • the number of Reduce tasks may be determined according to the number of output blocks, and each Reduce task corresponds to one output block.
  • Step 904 Write data to the data table through slave processors according to the number of the Reduce tasks.
  • the main controller may allocate a slave processor for each Reduce task, and transfer the storage data corresponding to the Reduce task to the slave processor.
  • the slave processor writes the storage data corresponding to the Reduce task into the partition corresponding to the output block.
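Steps 902 and 903 can be sketched by locating the partitions that an output range falls into. This sketch assumes that a partition holds every key from its own start key up to the next partition's start key and that the last partition is unbounded, which is consistent with the example in this description where keys later than the last stored row are placed in Region2. `region_index` and `covered_regions` are hypothetical names, and the middle start key in the example is an assumed value.

```python
from bisect import bisect_right


def region_index(start_keys, key):
    """Index of the partition that would hold `key` (start_keys must be sorted)."""
    return max(bisect_right(start_keys, key) - 1, 0)


def covered_regions(start_keys, output_ranges):
    """Sorted indexes of every partition touched by at least one output range."""
    covered = set()
    for range_start, range_end in output_ranges:
        first = region_index(start_keys, range_start)
        last = region_index(start_keys, range_end)
        covered.update(range(first, last + 1))
    return sorted(covered)


start_keys = [
    "movie 1#20110104",
    "movie 2#20110101",  # ASSUMED start key for the middle partition
    "movie 3#20110101",
]
output_blocks = covered_regions(
    start_keys, [("movie 3#20110809", "movie 3#20110812")]
)
num_reduce_tasks = len(output_blocks)
```

With one narrow output range only the last partition is covered, so a single Reduce task is created instead of one per partition.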
  • the data output table is the movie access information table shown in FIG.
  • Region0 (movie 1#20110104, movie 1#20110808);
  • Region2 (movie 3#20110101, movie 3#20110808).
  • the Reduce task is dispatched to the slave processor for processing.
  • Each of the Reduce tasks processes one output data range, and the data satisfying the condition "start key value ≤ input data < end key value" is stored in the corresponding Region.
  • the actual data range in Table 2 that needs to be saved to HBASE is movie 3#20110809 to movie 3#20110812, which belongs to the Reduce data range corresponding to Region2, so the Reduce tasks corresponding to Region0 and Region1 will not process any data. That is, since the actual input data falls within the data range of Region2, the data in Table 2 will be stored in Region2, while the other two Reduce tasks have no output data and are invalid Reduce tasks, as shown in Figure 10.
  • the main controller receives a request for data storage by the user, and the request carries data output range information.
  • the data output range information carried is: <movie 3#20110809, movie 3#20110812>.
  • MapReduce compares the data output range with the Region information of the data table shown in Figure 6, and obtains a valid output data range, namely Region2 (movie 3#20110101, movie 3#20110808).
  • the valid output data range belongs to only one Region.
  • Region2 (movie 3#20110101, movie 3#20110808)
  • the main controller sets one Reduce task according to the above output block information and dispatches it to a slave processor for processing.
  • the data storage method of the embodiment of the present invention limits the effective output data range by subdividing the output data and applying simple comparison operations, which reduces the startup and destruction of a large number of invalid Reduce tasks during data processing and greatly improves processing efficiency.
  • the embodiment of the present invention further provides an apparatus for optimizing data access, as shown in FIG. 12, which is a schematic structural diagram of the apparatus.
  • the device for optimizing data access includes:
  • the receiving unit 121 is configured to receive a request for the user to access the data table in the HBASE, where the request carries data input range information, and the data input range information includes multiple data input ranges;
  • the input blocking unit 122 is configured to determine input block information according to the partition information of the data table and the data input range information;
  • the task determining unit 123 is configured to determine the number of Map tasks according to the input block information;
  • the allocating unit 124 is configured to allocate slave processors to read data in the data table according to the number of the Map tasks;
  • the sending unit 125 is configured to return the data read by the slave processor to the user.
  • the input blocking unit 122 can be implemented in various manners.
  • the input blocking unit 122 can include: a partition information acquiring subunit 1221, a comparing subunit 1222, a merging subunit 1223, and a blocking information determining subunit 1224, where:
  • the partition information obtaining sub-unit 1221 is configured to obtain a start key value and a stop key value of all partitions in the data table;
  • the comparing subunit 1222 is configured to compare the data input ranges in the data input range information with the start key value and the end key value of each partition, to obtain the coverage of the data input ranges in each partition;
  • the merging subunit 1223 is configured to merge the coverage ranges obtained by the comparing subunit 1222 that belong to the same partition and are contiguous;
  • the blocking information determining subunit 1224 is configured to determine the input block information according to the merged coverage.
  • the foregoing merging subunit 1223 is optional, that is, the blocking information determining subunit 1224 can directly determine the input block information according to the coverage obtained by the comparing subunit 1222.
  • the above input blocking unit 122 may further include:
  • the sorting subunit 1225 is configured to sort the data input ranges in the data input range information before the comparing subunit 1222 compares them with the start key value and the end key value of each partition.
  • the input block information may include: inputting the number of blocks and the start key value and the terminating key value of each input block.
  • the task determining unit 123 may determine the number of Map tasks according to the number of input blocks, and each Map task corresponds to one input block.
  • the foregoing allocating unit 124 may allocate a slave processor for each Map task, and transmit the start key value and the terminating key value of the input block corresponding to the Map task to the slave processor, so that The slave processor reads data in the data table according to a start key value and a stop key value of the input block.
  • the apparatus for optimizing data access in the embodiment of the present invention can be used as a main controller in MapReduce. With this apparatus, the effective input data range is limited by a number of simple comparison operations, so that far less invalid data is read during data processing, which greatly improves processing efficiency.
  • the embodiment of the present invention further provides an apparatus for optimizing data storage, as shown in FIG. 14, which is a schematic structural diagram of the apparatus.
  • the apparatus for optimizing data storage includes:
  • the receiving unit 131 is configured to receive a request for the user to store data in the HBASE data table, where the request carries one or more data output ranges;
  • the output blocking unit 132 is configured to determine output blocking information according to the partition information of the data table and the data output range;
  • the task determining unit 133 is configured to determine a number of Reduce tasks according to the output block information;
  • the allocating unit 134 is configured to allocate slave processors to write data to the data table according to the number of the Reduce tasks.
  • the output blocking unit 132 can be implemented in various manners.
  • the output blocking unit 132 can include: a partition information acquiring subunit 1321, a comparing subunit 1322, and a blocking information determining subunit 1323, where:
  • the partition information acquiring subunit 1321 is configured to acquire the start key value and the end key value of all partitions in the data table;
  • the comparing subunit 1322 is configured to compare the data output range with the start key value and the end key value of each partition to obtain the partition information covered by the data output range;
  • the blocking information determining subunit 1323 is configured to determine the output block information according to the partition information.
  • output blocking unit 132 may have other implementation manners, which are not limited in this embodiment of the present invention.
  • the output block information may include the number of output blocks.
  • the task determining unit 133 may determine the number of Reduce tasks according to the number of output blocks, and each Reduce task corresponds to one output block;
  • the allocating unit 134 may allocate a slave processor for each Reduce task, so that the slave processor writes the storage data corresponding to the Reduce task into the partition corresponding to the output block.
  • the device for optimizing data storage in the embodiment of the present invention can be used as a main controller in MapReduce.
  • with this apparatus, the effective output data range can be limited by simple comparison operations, which reduces the startup and destruction of a large number of invalid Reduce tasks during data processing and greatly improves processing efficiency.

Abstract

L'invention concerne un procédé et un appareil permettant d'optimiser l'accès à des données, ainsi qu'un procédé et un appareil permettant d'optimiser le stockage de données. Le procédé permettant d'optimiser l'accès aux données comprend un contrôleur principal pour recevoir une demande d'un utilisateur d'accéder à une table de données dans HBASE, la demande comprenant des informations de plages d'entrée de données, et les informations de plages d'entrée de données comprenant une pluralité de plages d'entrée de données; déterminer des informations de blocs d'entrée conformément à des informations de région de la table de données et aux informations de plages d'entrée de données; déterminer le nombre de tâches Mapper conformément aux informations des blocs d'entrée; distribuer, conformément au nombre de tâches Mapper, les données dans la table de données lues par un processeur; et renvoyer les données lues par le processeur à l'utilisateur.
PCT/CN2011/083021 2011-11-28 2011-11-28 Method and apparatus for optimizing data access, and method and apparatus for optimizing data storage WO2013078583A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201180002537.6A CN102725753B (zh) 2011-11-28 2011-11-28 Method and apparatus for optimizing data access, and method and apparatus for optimizing data storage
PCT/CN2011/083021 WO2013078583A1 (fr) 2011-11-28 2011-11-28 Method and apparatus for optimizing data access, and method and apparatus for optimizing data storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/083021 WO2013078583A1 (fr) 2011-11-28 2011-11-28 Method and apparatus for optimizing data access, and method and apparatus for optimizing data storage

Publications (1)

Publication Number Publication Date
WO2013078583A1 (fr) 2013-06-06

Family

ID=46950464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/083021 WO2013078583A1 (fr) 2011-11-28 2011-11-28 Method and apparatus for optimizing data access, and method and apparatus for optimizing data storage

Country Status (2)

Country Link
CN (1) CN102725753B (fr)
WO (1) WO2013078583A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109195175A (zh) * 2018-09-03 2019-01-11 郑州云海信息技术有限公司 Cloud-computing-based mobile wireless network optimization method
WO2020034194A1 (fr) * 2018-08-17 2020-02-20 西门子股份公司 Distributed data processing method, device and system, and machine-readable medium

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
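CN103838632B (zh) * 2012-11-21 2017-04-12 阿里巴巴集团控股有限公司 Data query method and apparatus
CN103150403A (zh) * 2013-03-28 2013-06-12 北京圆通慧达管理软件开发有限公司 Data processing system and method
CN103226532A (zh) * 2013-03-28 2013-07-31 北京圆通慧达管理软件开发有限公司 Data processing system and method
CN103198109A (zh) * 2013-03-28 2013-07-10 北京圆通慧达管理软件开发有限公司 Data processing system and method
CN104679590B (zh) * 2013-11-27 2018-12-07 阿里巴巴集团控股有限公司 Map optimization method and apparatus in a distributed computing system
CN103646073A (zh) * 2013-12-11 2014-03-19 浪潮电子信息产业股份有限公司 Conditional query optimization method based on HBase tables
CN104112011B (zh) * 2014-07-16 2017-09-15 深圳国泰安教育技术股份有限公司 Method and apparatus for extracting massive data
CN104252536B (zh) * 2014-09-16 2017-12-08 福建新大陆软件工程有限公司 HBase-based method and apparatus for querying internet access log data
CN106326309B (zh) * 2015-07-03 2020-02-21 阿里巴巴集团控股有限公司 Data query method and apparatus
CN106383826A (zh) * 2015-07-29 2017-02-08 阿里巴巴集团控股有限公司 Database query method and apparatus
CN106484689B (zh) * 2015-08-24 2019-09-03 杭州华为数字技术有限公司 Data processing method and apparatus
CN105183901A (zh) * 2015-09-30 2015-12-23 北京京东尚科信息技术有限公司 Method and apparatus for a data query engine to read database tables
CN105956043A (zh) * 2016-04-26 2016-09-21 海尔优家智能科技(北京)有限公司 Method and apparatus for allocating Map tasks for MapReduce running on an HBase database
CN106294886A (zh) * 2016-10-17 2017-01-04 北京集奥聚合科技有限公司 Method and system for full extraction of data from HBase
CN108427747B (zh) * 2018-03-09 2021-10-15 广西师范大学 Dynamic-programming data sharding optimization method based on range-query boundary sets
CN109657009B (zh) * 2018-12-21 2021-03-12 北京锐安科技有限公司 Method, apparatus, device and storage medium for creating a data pre-partitioned storage period table
CN110083658B (zh) * 2019-03-11 2021-05-25 北京达佳互联信息技术有限公司 Data synchronization method, apparatus, electronic device and storage medium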

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957863A (zh) * 2010-10-14 2011-01-26 广州从兴电子开发有限公司 Data parallel processing method, apparatus and system
KR20110069338A (ko) * 2009-12-17 2011-06-23 한국전자통신연구원 Incremental MapReduce-based distributed parallel processing system and method for stream data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIN QIANG: "Research and Design of RDF Storage System Based on HBase", CHINESE MASTER'S THESES FULL-TEXT DATABASE: INFORMATION SCIENCE AND TECHNOLOGY, 15 July 2011 (2011-07-15), pages 1137 - 28 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020034194A1 (fr) * 2018-08-17 2020-02-20 西门子股份公司 Distributed data processing method, device and system, and machine-readable medium
CN109195175A (zh) * 2018-09-03 2019-01-11 郑州云海信息技术有限公司 Cloud-computing-based mobile wireless network optimization method
CN109195175B (zh) * 2018-09-03 2021-12-21 郑州云海信息技术有限公司 Cloud-computing-based mobile wireless network optimization method

Also Published As

Publication number Publication date
CN102725753A (zh) 2012-10-10
CN102725753B (zh) 2014-01-01

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 201180002537.6

Country of ref document: CN

121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 11876546

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 11876546

Country of ref document: EP

Kind code of ref document: A1