WO2016023372A1 - Data storage processing method and device - Google Patents

Publication number
WO2016023372A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
copies
copy
hbase
stored
Prior art date
Application number
PCT/CN2015/075302
Other languages
English (en)
Chinese (zh)
Inventor
杨庆平
屠趁锋
黄震江
汪峰来
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016023372A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The present invention relates to the field of communications, and in particular to a data storage processing method and apparatus.
  • Hadoop, an open-source big data storage and analytics platform, has become the de facto industry standard for handling big data.
  • The Hadoop platform consists of two important subsystems: the Hadoop Distributed File System (HDFS) and MapReduce (a parallel computing framework).
  • Hadoop is a highly fault-tolerant, multi-copy distributed system designed for deployment on inexpensive machines, and it supports parallel data writing and reading across multiple hard drives on a machine.
  • HBASE is a distributed, column-oriented open-source database built on HDFS that provides a highly reliable, high-performance, column-oriented, scalable database system with real-time reads and writes.
  • HBASE is an important part of the Hadoop big data analytics ecosystem and has been widely used in the industry.
  • HBASE stores data on HDFS in a column-based mode, and each column corresponds to one or more storage files. The following describes how HBASE stores data.
  • The HBASE processing scheme in the related art is as follows: when an HBASE table is created, the system stores all column data with the same number of copies. The user cannot set the number of copies when defining the table and can only rely on the HBASE system default of 3 copies. That is, all the columns in the table are stored in 3 copies.
  • The HBASE table data storage scheme in the related art has the following disadvantages. High hardware cost: all table data stored in HBASE uses the same number of storage copies, so important data and non-critical data are stored with the same number of copies, which consumes a large amount of hardware. No differentiation of data: for hot data columns, more copies are desirable to increase read speed, but the number of storage copies cannot currently be set separately for an individual data column.
  • The present invention provides a data storage processing method and apparatus to at least solve the problem in the related art that, when table data is stored in HBASE, the data cannot be stored in a differentiated manner, which both wastes storage resources and results in low data reading efficiency.
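The cost argument above can be made concrete with a small calculation. The following is a minimal sketch only: the column family names, their sizes, and the per-family copy counts are hypothetical values chosen for illustration and do not come from the patent; only the default of 3 copies is taken from the text.

```python
# Hypothetical column families and their logical sizes in GB (illustrative only).
family_sizes_gb = {"hot": 100, "normal": 400, "archive": 500}

# Default copy count described for the related art.
DEFAULT_COPIES = 3

# Assumed per-family copy counts for a differentiated scheme (illustrative only).
copies_per_family = {"hot": 4, "normal": 3, "archive": 2}


def raw_storage_gb(sizes, copies):
    """Return total raw storage when each family is stored copies[name] times."""
    return sum(size * copies[name] for name, size in sizes.items())


# Related art: every family uses the same default copy count.
uniform = raw_storage_gb(
    family_sizes_gb, {name: DEFAULT_COPIES for name in family_sizes_gb}
)

# Differentiated storage: hot data gets more copies, archive data fewer.
differentiated = raw_storage_gb(family_sizes_gb, copies_per_family)

print(uniform, differentiated)
```

With these assumed numbers the hot family gains a copy for read throughput while total raw storage still drops, which is the trade-off the description is pointing at.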
  • A data storage processing method is provided, comprising: obtaining the number of copies of stored data for a column family in a distributed database HBASE table used for storing data, wherein the numbers of copies of stored data for the column families in the HBASE table are different; and generating stored copies of the data according to the obtained number of copies.
  • Before obtaining the number of copies of stored data for the column family in the HBASE table used for storing data, the method further includes: when the HBASE table is created, creating a copy number attribute corresponding to each column family in the HBASE table by using a Ruby hash attribute value, and obtaining the number of copies of stored data for the column family according to the copy number attribute corresponding to the Ruby hash attribute value.
  • Before obtaining the number of copies of stored data for the column family in the HBASE table used for storing data, the method further includes: receiving a dynamically input number of copies.
  • The number of copies of stored data for the column family in the HBASE table used for storing data is obtained by at least one of the following: receiving a command carrying the number of copies; receiving web page information carrying the number of copies.
  • Generating the stored copies of the data according to the obtained number of copies comprises: when data is written, passing the number of copies to the HBASE data write file class; and generating the corresponding stored copies according to the number of copies passed to the HBASE data write file class.
  • The method further comprises: reading the stored copies, which are loaded separately according to the number of copies.
  • A data storage processing apparatus is provided, comprising: an acquisition module configured to obtain the number of copies of stored data for a column family in a distributed database HBASE table used for storing data, wherein the numbers of copies of stored data for the column families in the HBASE table are different; and a generating module configured to generate stored copies of the data according to the obtained number of copies.
  • The apparatus further includes: a creating module configured to, when the HBASE table is created, create a copy number attribute corresponding to each column family in the HBASE table by using a Ruby hash attribute value, and to obtain the number of copies of stored data for the column family according to the copy number attribute corresponding to the Ruby hash attribute value.
  • The apparatus further comprises: a receiving module configured to receive a dynamically input number of copies.
  • The acquisition module comprises at least one of the following: a first receiving unit configured to receive a command carrying the number of copies; and a second receiving unit configured to receive web page information carrying the number of copies.
  • The generating module includes: a transfer unit configured to pass the number of copies to the HBASE data write file class when data is written; and a generating unit configured to generate the corresponding stored copies according to the number of copies passed to the HBASE data write file class.
  • The apparatus further comprises: a reading module configured to read the stored copies, which are loaded separately according to the number of copies.
  • With the present invention, the number of copies of stored data for a column family in a distributed database HBASE table used for storing data is obtained, wherein the numbers of copies of stored data for the column families in the HBASE table are different, and stored copies of the data are generated according to the obtained number of copies. This solves the problem in the related art that data cannot be stored in a differentiated manner when HBASE table data is stored, which both wastes storage resources and results in low data reading efficiency. Different numbers of copies can thus be set for HBASE column families, achieving differentiated storage of data and effectively reducing storage cost without degrading data writing and reading.
  • FIG. 1 is a flow chart of a data storage processing method according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing the structure of a data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 3 is a first block diagram of a preferred structure of a data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a second block diagram of a preferred structure of a data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 5 is a block diagram showing a preferred structure of the acquisition module 22 in the data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 6 is a block diagram showing a preferred structure of the generating module 24 in the data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 7 is a third block diagram of a preferred structure of a data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of an HBASE storage structure according to an embodiment of the present invention.
  • FIG. 9 is a logical view of HBASE data in accordance with an embodiment of the present invention.
  • FIG. 10 is a flow chart of dynamically creating HBASE multiple copies in accordance with a preferred embodiment of the present invention.
  • FIG. 1 is a flowchart of a data storage processing method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
  • Step S102: obtaining the number of copies of stored data for a column family in a distributed database HBASE table used for storing data, wherein the numbers of copies of stored data for the column families in the HBASE table are different;
  • Step S104: generating stored copies of the data according to the obtained number of copies.
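Steps S102 and S104 can be sketched with a small in-memory model. This is an illustration under stated assumptions, not HBASE code: the `ColumnFamily` and `Table` classes and both function names are hypothetical, and "generating a stored copy" is modelled simply as duplicating the bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ColumnFamily:
    """Hypothetical column family descriptor carrying its own copy count."""
    name: str
    copies: int


@dataclass
class Table:
    """Hypothetical table definition: column families keyed by name."""
    name: str
    families: Dict[str, ColumnFamily] = field(default_factory=dict)


def get_copy_count(table: Table, family_name: str) -> int:
    """Step S102: obtain the number of copies configured for one column family."""
    return table.families[family_name].copies


def generate_stored_copies(table: Table, family_name: str, data: bytes) -> List[bytes]:
    """Step S104: generate one stored copy of the data per configured copy."""
    count = get_copy_count(table, family_name)
    return [bytes(data) for _ in range(count)]


# Two families of the same table with different copy counts.
table = Table("events", {
    "hot": ColumnFamily("hot", copies=4),
    "cold": ColumnFamily("cold", copies=2),
})
hot_copies = generate_stored_copies(table, "hot", b"row1")
cold_copies = generate_stored_copies(table, "cold", b"row1")
```

The point of the sketch is only that the copy count is looked up per column family rather than taken from one table-wide or system-wide setting.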
  • The following processing may also be involved: when the HBASE table is created, a copy number attribute corresponding to each column family in the HBASE table is created by using a Ruby hash attribute value, and the number of copies of stored data for the column family in the HBASE table used for storing data is obtained according to the copy number attribute corresponding to the Ruby hash attribute value. When the Ruby hash attribute value is created, a dynamically input number of copies can be received, and the data is then stored dynamically according to the received number of copies.
  • The number of copies may be obtained in a variety of ways, for example by at least one of the following: receiving a command carrying the number of copies; or, in the form of a web page, receiving web page information carrying the number of copies.
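The two input channels mentioned above (a command and web page information) can be illustrated as two small parsers that yield the same result. This is a sketch under assumptions: the command syntax `set_copies --family ... --copies ...` and the form field names `family` and `copies` are hypothetical, since the patent does not specify either format.

```python
import shlex
from urllib.parse import parse_qs


def copies_from_command(command: str):
    """Parse a hypothetical command, e.g. 'set_copies --family cf1 --copies 2'."""
    tokens = shlex.split(command)
    # Pair each option flag with the value that follows it.
    args = dict(zip(tokens[1::2], tokens[2::2]))
    return args["--family"], int(args["--copies"])


def copies_from_web_form(query_string: str):
    """Parse hypothetical web form fields, e.g. 'family=cf1&copies=2'."""
    fields = parse_qs(query_string)
    return fields["family"][0], int(fields["copies"][0])


from_command = copies_from_command("set_copies --family cf1 --copies 2")
from_web = copies_from_web_form("family=cf2&copies=4")
```

Both functions return a `(family, copies)` pair, so whatever consumes the copy count does not need to know which channel it arrived through.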
  • Stored copies of the data may also be generated from the obtained number of copies in a variety of ways. For example, when data is written, the number of copies is passed to the HBASE data write file class, and the corresponding stored copies are generated according to the number of copies passed to the HBASE data write file class.
  • When data is read, the copies of each column family are loaded and read separately; that is, the stored copies, loaded separately according to the number of copies, are read, and the column families do not affect each other.
  • A data storage processing device is also provided in this embodiment. The device is used to implement the above embodiments and preferred embodiments; what has already been described is not repeated.
  • As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 2 is a block diagram showing the structure of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes an acquisition module 22 and a generating module 24. The apparatus is described below.
  • The acquisition module 22 is configured to obtain the number of copies of stored data for a column family in a distributed database HBASE table used for storing data, wherein the numbers of copies of stored data for the column families in the HBASE table are different. The generating module 24 is connected to the acquisition module 22 and is configured to generate stored copies of the data according to the obtained number of copies.
  • FIG. 3 is a first block diagram of a preferred structure of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes a creating module 32 in addition to all the modules shown in FIG. 2. The creating module 32 is described below.
  • The creating module 32 is connected to the acquisition module 22 and is configured to, when the HBASE table is created, create a copy number attribute corresponding to each column family in the HBASE table by using a Ruby hash attribute value, and to obtain the number of copies of stored data for the column family according to the copy number attribute corresponding to the Ruby hash attribute value.
  • FIG. 4 is a second block diagram of a preferred structure of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes a receiving module 42 in addition to all the modules shown in FIG. 2. The receiving module 42 is described below.
  • The receiving module 42 is connected to the acquisition module 22 and is configured to receive a dynamically input number of copies.
  • FIG. 5 is a block diagram of a preferred structure of the acquisition module 22 in the data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, the acquisition module 22 includes at least one of the following: a first receiving unit 52 and a second receiving unit 54. The acquisition module 22 is described below.
  • The first receiving unit 52 is configured to receive a command carrying the number of copies; the second receiving unit 54 is configured to receive web page information carrying the number of copies.
  • FIG. 6 is a block diagram showing a preferred structure of the generating module 24 in the data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, the generating module 24 includes a transfer unit 62 and a generating unit 64. The generating module 24 is described below.
  • The transfer unit 62 is configured to pass the number of copies to the HBASE data write file class when data is written. The generating unit 64 is connected to the transfer unit 62 and is configured to generate the corresponding stored copies according to the number of copies passed to the HBASE data write file class.
  • FIG. 7 is a third block diagram of a preferred structure of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 7, the apparatus includes a reading module 72 in addition to the modules described above. The reading module 72 is described below.
  • The reading module 72 is connected to the generating module 24 described above and is configured to read the stored copies, which are loaded separately according to the number of copies.
  • In the related art, the HBASE database cannot dynamically set the number of storage copies of each column for data storage.
  • In this embodiment, a dynamic processing method for multiple copies in the HBASE database is provided. It mainly includes the following processing: when a table is created in HBASE, the number of copies of each column can be set, and the number of copies of each column does not depend on the unified configuration. Table creation supports setting a different number of copies for each column, and the setting is stored in the HBASE table definition. It takes effect dynamically when table data is inserted and read, with no need to restart the HBASE database.
  • In this way, multiple copies of column storage can be processed dynamically, independent of the default number of copies set by the underlying storage, and the number of copies of each column can be handled dynamically.
  • Step 1: When the table is created, the number of copies corresponding to each column family is defined by a Ruby hash attribute key.
  • Step 2: When the system detects that the number of copies needs to be set separately for each column family, it dynamically adjusts the HBASE column family definition and sets the copy value on the HBASE column family.
  • Step 3: When data is written, the system dynamically passes the number of copies corresponding to the column family to the HBASE data write file class. When HBASE writes to the HDFS system, the dynamic number of copies is passed to HDFS, and HDFS generates storage copies according to that number.
  • Step 4: When data is read, the numbers of copies within the table are not uniform, so the copies of each column family are loaded and processed separately and do not affect each other.
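The hand-off described in Steps 1 to 3 (table definition, then the write file class, then the file system) can be sketched as follows. All class and method names here are hypothetical stand-ins rather than HBASE or HDFS APIs; the sketch only shows that a copy count changed at runtime is picked up by the next write without any restart.

```python
class FakeFileSystem:
    """Stand-in for HDFS: records the replication each file was created with."""

    def __init__(self):
        self.files = {}

    def create(self, path, data, replication):
        self.files[path] = {"data": data, "replication": replication}


class StoreFileWriter:
    """Stand-in for the HBASE data write file class; it receives the copy count."""

    def __init__(self, fs, copies):
        self.fs = fs
        self.copies = copies

    def write(self, path, data):
        # The copy count is handed on to the file system on every write.
        self.fs.create(path, data, replication=self.copies)


class TableDefinition:
    """Hypothetical table definition holding per-column-family copy counts."""

    def __init__(self, copies_by_family):
        # Step 1: copy counts are fixed per column family at table creation.
        self.copies_by_family = dict(copies_by_family)

    def set_copies(self, family, copies):
        # Step 2: the column family definition is adjusted dynamically.
        self.copies_by_family[family] = copies

    def put(self, fs, family, path, data):
        # Step 3: the current copy count is passed to the writer at write time.
        StoreFileWriter(fs, self.copies_by_family[family]).write(path, data)


fs = FakeFileSystem()
table = TableDefinition({"cf1": 3, "cf2": 2})
table.put(fs, "cf1", "/t/cf1/a", b"x")
table.set_copies("cf1", 1)  # changed at runtime, no restart in this model
table.put(fs, "cf1", "/t/cf1/b", b"y")
table.put(fs, "cf2", "/t/cf2/a", b"z")
```

Because the count is read at write time rather than cached at startup, files written before and after the change simply carry different replication values.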
  • The HBASE storage structure includes an HRegionServer (distributed storage server) and HDFS.
  • The HRegionServer includes one or more HRegions. Each HRegion includes an HLog and one or more Stores, and each Store includes a MemStore and one or more StoreFiles.
  • HDFS includes one or more DataNodes (storage nodes).
  • Each column in the HBASE table corresponds to storage files in a storage area; as shown in the figure, the copy number of each column corresponds to different storage files (StoreFile).
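The containment hierarchy described above can be written down as plain data classes. This is a simplified sketch of the structure as described in the text, not of the real HBASE classes: the field names are assumptions, and attaching a `copies` field to each `StoreFile` is an illustrative way of showing where a per-column-family copy count would sit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoreFile:
    """One storage file of a column family, with its assumed copy count."""
    path: str
    copies: int


@dataclass
class Store:
    """One Store per column family: an in-memory MemStore plus StoreFiles."""
    column_family: str
    mem_store: Dict[str, bytes] = field(default_factory=dict)
    store_files: List[StoreFile] = field(default_factory=list)


@dataclass
class HRegion:
    """An HRegion holds an HLog (modelled as a list of entries) and Stores."""
    hlog: List[str] = field(default_factory=list)
    stores: List[Store] = field(default_factory=list)


@dataclass
class HRegionServer:
    """An HRegionServer hosts one or more HRegions."""
    regions: List[HRegion] = field(default_factory=list)


server = HRegionServer(regions=[
    HRegion(stores=[
        Store("hot", store_files=[StoreFile("/t/hot/f1", copies=4)]),
        Store("cold", store_files=[StoreFile("/t/cold/f1", copies=2)]),
    ])
])

# Collect the copy count of every store file, keyed by column family.
copies_by_family = {
    store.column_family: [f.copies for f in store.store_files]
    for region in server.regions
    for store in region.stores
}
```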
  • FIG. 10 is a flow chart of dynamically creating HBASE multiple copies according to a preferred embodiment of the present invention. As shown in FIG. 10, the flow includes the following steps:
  • Step S1002: creating a table number;
  • Step S1004: determining whether the number of copies of the column family in the created HBASE table is defined;
  • Step S1006: determining whether the number of copies of the column family in the HBASE table is defined;
  • Step S1008: parsing the package and obtaining the number of copies of each column family in the HBASE table;
  • Step S1010: creating an HBASE table according to the obtained number of copies, where the numbers of copies of the column families in the HBASE table are different;
  • Step S1012: creating the corresponding copy files according to the number of copies of each column family in the HBASE table.
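The decision in this flow can be sketched as a function that resolves a copy count per column family and then derives the copy files to create. The request format, the file naming scheme, and the fallback to the default of 3 when no count is defined are all assumptions made for illustration; the flow as described does not state what happens on the undefined branch.

```python
DEFAULT_COPIES = 3  # assumed fallback, matching the default described for the related art


def resolve_copies(table_request):
    """S1004 to S1008: take each family's defined copy count, else the assumed default."""
    return {
        family["name"]: family.get("copies", DEFAULT_COPIES)
        for family in table_request["families"]
    }


def create_table(table_request):
    """S1010 to S1012: build the table and list one copy file per configured copy."""
    copies = resolve_copies(table_request)
    table_name = table_request["name"]
    return {
        family: [f"{table_name}/{family}/copy{i}" for i in range(count)]
        for family, count in copies.items()
    }


# Hypothetical request: 'hot' and 'cold' define a count, 'misc' does not.
request = {
    "name": "events",
    "families": [
        {"name": "hot", "copies": 4},
        {"name": "cold", "copies": 2},
        {"name": "misc"},
    ],
}
copy_files = create_table(request)
```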
  • HBASE can create a table in the following ways:
  • Table parameters can be passed in real time through a WEB page to create the table;
  • The HBASE table can also be created dynamically according to dynamically changed numbers of copies of each column in the HBASE table.
  • The following describes one way in which HBASE dynamically creates the table. Of course, other implementations are possible.
  • The HBASE column family description class supports the newly defined copy parameter;
  • The HBASE create-table interface supports the copy parameter;
  • The HBASE class that generates store files gains a copy parameter;
  • HBASE supports the copy parameter value when calling StoreFile to write to the file system.
  • Step 1: The user defines the HBASE table structure and defines the number of copies for each column;
  • Step 2: The system parses the HBASE table definition parameters and extracts the number of copies;
  • Step 3: HBASE creates a storeFile file according to the table definition parameters;
  • Step 4: HBASE submits the storeFile file to the distributed file system, which creates the corresponding files according to the number of copies of the storeFile file.
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device, and they can be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein, or they may be separately fabricated into individual integrated circuit modules, or a plurality of the modules or steps may be fabricated as a single integrated circuit module.
  • The invention is not limited to any specific combination of hardware and software.
  • The above embodiments and preferred embodiments solve the problem in the related art that data cannot be stored in a differentiated manner when HBASE table data is stored, which both wastes storage resources and results in low data reading efficiency. By setting different numbers of copies for HBASE column families, data is stored in a differentiated manner, and storage cost can be effectively reduced without degrading data writing and reading.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a data storage processing method and device. The method comprises: acquiring the number of stored data copies of a column family in a distributed database (HBASE) table used for storing data, wherein the numbers of stored data copies of the column families in the HBASE table are different; and generating stored copies of the data according to the acquired number of copies. The present invention solves the problem in the prior art that, when HBASE table data is stored, differentiated storage processing cannot be performed on the data, which not only wastes storage resources but also results in low data reading efficiency. It thereby allows different numbers of copies to be set for the column families of an HBASE table so as to achieve differentiated storage of data, and can effectively reduce storage costs without degrading data writing and reading.
PCT/CN2015/075302 2014-08-14 2015-03-27 Data storage processing method and device WO2016023372A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410401504.4 2014-08-14
CN201410401504.4A CN105335450B (zh) 2014-08-14 2014-08-14 数据存储处理方法及装置

Publications (1)

Publication Number Publication Date
WO2016023372A1 (fr) 2016-02-18

Family

ID=55285978

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/075302 WO2016023372A1 (fr) 2014-08-14 2015-03-27 Data storage processing method and device

Country Status (2)

Country Link
CN (1) CN105335450B (fr)
WO (1) WO2016023372A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159273A (zh) * 2019-12-31 2020-05-15 中国联合网络通信集团有限公司 数据流处理方法、装置、服务器及存储介质
CN113704346A (zh) * 2020-05-20 2021-11-26 杭州海康威视数字技术股份有限公司 一种Hbase表中冷热数据转换方法、装置及电子设备

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122364B (zh) * 2016-02-25 2021-05-18 华为技术有限公司 数据操作方法和数据管理服务器
CN111046074B (zh) * 2019-12-13 2023-09-01 北京百度网讯科技有限公司 流式数据处理方法、装置、设备和介质
CN112306421B (zh) * 2020-11-20 2021-04-30 昆易电子科技(上海)有限公司 一种用于存储分析测量数据格式mdf文件的方法和系统

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101187931A (zh) * 2007-12-12 2008-05-28 浙江大学 分布式文件系统多文件副本的管理方法
CN103838860A (zh) * 2014-03-19 2014-06-04 华存数据信息技术有限公司 一种基于动态副本策略的文件存储系统及其存储方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567495B (zh) * 2011-12-22 2013-08-21 国家电网公司 一种海量信息存储系统及实现方法
CN103905517A (zh) * 2012-12-28 2014-07-02 中国移动通信集团公司 一种数据存储方法及设备

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159273A (zh) * 2019-12-31 2020-05-15 中国联合网络通信集团有限公司 数据流处理方法、装置、服务器及存储介质
CN113704346A (zh) * 2020-05-20 2021-11-26 杭州海康威视数字技术股份有限公司 一种Hbase表中冷热数据转换方法、装置及电子设备
CN113704346B (zh) * 2020-05-20 2024-06-04 杭州海康威视数字技术股份有限公司 一种Hbase表中冷热数据转换方法、装置及电子设备

Also Published As

Publication number Publication date
CN105335450A (zh) 2016-02-17
CN105335450B (zh) 2020-06-05

Similar Documents

Publication Publication Date Title
JP7360395B2 (ja) 入力および出力スキーママッピング
US10614041B2 (en) Sync as a service for cloud-based applications
Macedo et al. Redis cookbook: Practical techniques for fast data manipulation
WO2016023372A1 (fr) Data storage processing method and device
US20080243847A1 (en) Separating central locking services from distributed data fulfillment services in a storage system
JP2016529599A (ja) コンテンツクリップボードの同期
CN108287894B (zh) 数据处理方法、装置、计算设备及存储介质
US20160088077A1 (en) Seamless binary object and metadata sync
CN107315972A (zh) 一种大数据非结构化文件动态脱敏方法及系统
Yang et al. On construction of a distributed data storage system in cloud
WO2017092384A1 (fr) Procédé et dispositif de stockage distribué de base de données groupée
CN106855861A (zh) 一种文件合并方法、装置及电子设备
WO2018094962A1 (fr) Method, apparatus and system for migrating file permissions
CN104239508A (zh) 数据查询方法和装置
JP2015180991A (ja) 画像形成装置、画像形成装置の制御方法およびプログラム
CN103414762A (zh) 云备份方法和装置
US11288003B2 (en) Cross-platform replication of logical units
US20140297953A1 (en) Removable Storage Device Identity and Configuration Information
CN105653566B (zh) 一种实现数据库写访问的方法及装置
US11429400B2 (en) User interface metadata from an application program interface
US9537941B2 (en) Method and system for verifying quality of server
WO2022121387A1 (fr) Data storage method and apparatus, server, and medium
US8566280B2 (en) Grid based replication
US20180316756A1 (en) Cross-platform replication of logical units
US20160164941A1 (en) Method for transcoding mutimedia, and cloud mulimedia transcoding system operating the same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15831499

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15831499

Country of ref document: EP

Kind code of ref document: A1