WO2018205981A1 - Metadata management - Google Patents

Info

Publication number
WO2018205981A1
Authority
WO
WIPO (PCT)
Prior art keywords
column
name
family
operation instruction
statement
Application number
PCT/CN2018/086398
Other languages
French (fr)
Chinese (zh)
Inventor
吴宏志
Original Assignee
新华三大数据技术有限公司
Application filed by 新华三大数据技术有限公司
Publication of WO2018205981A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata

Definitions

  • The Hadoop Database (HBase) is a distributed, scalable, column-oriented non-relational (NoSQL, Not Only SQL) database built on the Hadoop infrastructure.
  • Like a traditional relational database, an HBase table is composed of rows and columns, but the same column in HBase can store values from different times, and multiple columns can form a Column Family. The data model defined by HBase therefore differs considerably from that of a traditional relational database (a two-dimensional model composed of rows and columns); it actually defines a four-dimensional data model.
  • Metadata is data about data.
  • Metadata describes the structure of the data in a database and how that data is built. It helps database administrators and database developers easily find the data they care about.
  • FIG. 1 is a schematic diagram of an architecture of an HBase example of the present disclosure.
  • FIG. 2 is a flow chart of an example of a method of the present disclosure.
  • FIG. 3 is a hardware architecture diagram of an example of an apparatus of the present disclosure.
  • FIG. 4 is a functional block diagram of an example of an apparatus of the present disclosure.
  • FIG. 5 is a schematic diagram of an architecture of an example of a web application of the present disclosure.
  • The HBase table (for simplicity, every "table" in the present disclosure refers to an HBase table) has at least the following characteristics: 1) each row of the table has a sortable row key and any number of columns; columns can be added dynamically as needed, and different rows in the same table can have different columns; 2) the data in each cell of the table can have multiple versions; by default the version number is assigned automatically, for example as the timestamp at which the cell was inserted; 3) there is a single data type, that is, all data in the table is stored as strings, with no type; 4) the table can be very large: one table can have billions of rows and millions of columns.
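As a rough illustration of the characteristics listed above, an HBase-style table can be pictured as a nested map keyed by row key, column family, column, and version timestamp. This is a minimal in-memory sketch, not HBase code; the class and method names (`SketchTable`, `put`, `get`, `columns`) and the sample values are hypothetical.

```python
class SketchTable:
    """Toy model: row key -> column family -> column -> {timestamp: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, family, column, value, timestamp):
        # Columns are created on first write; values are kept as strings,
        # mirroring the untyped cells described in the text.
        cell = (
            self.rows.setdefault(row_key, {})
            .setdefault(family, {})
            .setdefault(column, {})
        )
        cell[timestamp] = str(value)

    def get(self, row_key, family, column, timestamp=None):
        versions = self.rows.get(row_key, {}).get(family, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # default to the newest version
        return versions.get(timestamp)

    def columns(self, row_key):
        # Different rows may expose different columns.
        return sorted(
            (family, column)
            for family, cols in self.rows.get(row_key, {}).items()
            for column in cols
        )


table = SketchTable()
table.put("row1", "F1", "A", "Alice", timestamp=1)
table.put("row1", "F1", "A", "Alicia", timestamp=2)  # second version of the same cell
table.put("row2", "F1", "B", 42, timestamp=1)         # a row with a different column
```

The four keys (row key, column family, column, timestamp) correspond to the four dimensions of the data model mentioned earlier.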
  • HBase can be used better with metadata.
  • HBase can support the management of some metadata, such as the management of metadata indicating the storage location of the table.
  • However, HBase does not provide a unified means of managing metadata; for example, it does not support managing a table's business description or table structure metadata.
  • In the present disclosure, HBase's own coprocessor function is used to automatically collect and store the metadata corresponding to the table structure of the table, and an external interface is provided to collect service descriptions of tables and columns, so that table structure and service description metadata change from unmanageable to manageable.
  • The architecture of HBase is shown in FIG. 1 and includes a client 201, a master node (Master) 202, a region server (Region Server) 203, and a coordination server (Zookeeper) 204.
  • The client 201 is an interface for the user to access HBase.
  • The master node 202 is connected to each Region Server 203 and is responsible for managing each Region Server 203, for example assigning Regions to the Region Servers 203 and load balancing across the Region Servers 203.
  • the master node 202 can correspond to a physical device in an actual application, and the physical device can be an X86 architecture device, for example, an X86 server.
  • Each Region Server 203 maintains its assigned Regions, handling input/output (I/O) requests to these Regions.
  • The Zookeeper 204 ensures that there is only one master node 202 in the cluster at any time, stores the location of each Region Server 203, monitors the status of each Region Server 203 in real time and notifies the master node 202 in real time, and also stores other information required for HBase operation.
  • a metadata collection function may be deployed at the master node 202, and the master node may execute the flow shown in FIG. 2 upon receiving an operation instruction for the table.
  • The metadata collection function may be deployed as follows: the package for collecting metadata is deployed on the master node 202, as shown in FIG. 1. HBase can then use the observer function of the coprocessor to invoke and execute the package at specific times, realizing automatic collection of the metadata corresponding to the table structure.
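The observer idea described above can be pictured as a small hook registry: a collector registers a callback, and the host invokes it at a fixed point around each operation. The Python sketch below only mimics that pattern. HBase's actual coprocessor API is Java and is not shown here; every name in the sketch (`SketchMaster`, `register_observer`, `execute`) is hypothetical.

```python
class SketchMaster:
    """Toy stand-in for a master node that notifies observers of each operation."""

    def __init__(self):
        self._observers = []

    def register_observer(self, callback):
        # The callback plays the role of the deployed metadata-collection package.
        self._observers.append(callback)

    def execute(self, instruction):
        result = f"executed: {instruction}"  # placeholder for the real operation
        for callback in self._observers:
            callback(instruction)  # the hook fires after the operation is handled
        return result


collected = []
master = SketchMaster()
master.register_observer(collected.append)
master.execute("create 'T1', 'F1'")
master.execute("put 'T1', 'r1', 'F1:A', 'v'")
```

The point of the pattern is that the collector never has to be called explicitly by the user: once registered, it sees every operation instruction that reaches the host.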
  • the metadata management method provided by the present disclosure may include steps 300, 301, 302, and 303.
  • Step 300: The master node detects an operation instruction for the table.
  • Step 301: The master node parses the operation instruction and determines whether the operation instruction changes the table structure of the table, where the change includes at least one of adding a table, deleting a table, adding a column family, deleting a column family, adding a column, and deleting a column; if yes, step 302 is performed; otherwise, step 303 is performed.
  • the operation instructions herein may include a Data Manipulation Language (DML) operation instruction, a Data Definition Language (DDL) operation instruction, and the like.
  • the DDL operation instruction for the HBase may include: creating a new table, enumerating the table information, obtaining the table description, deleting the column family, deleting the table, and the like.
  • The DML operation instruction for HBase may include: adding a record, viewing a record, viewing the total number of records in the table, deleting a record, and the like.
  • operations that may affect the structure of the table include: creating a new table, deleting a column family, deleting a table, adding a record, and the like.
  • HBase does not support deleting columns, but considering the speed of HBase development, the operation of deleting columns may be implemented in the future.
  • the changes to the table structure of the table may include the following:
  • The master node can use keyword identification to determine whether a received operation instruction changes the table structure of the table. For example, if the master node recognizes a Drop statement (used to delete a table) or a Create statement (used to create a table) in the received operation instruction, the operation instruction can be considered to change the table structure. If the master node recognizes an Alter statement (which can be used to delete or add a column family), the operation instruction may change the table structure, and it is necessary to further determine whether the Alter statement is used to delete or add a column family; if so, the operation instruction changes the table structure.
  • If the master node recognizes a Put statement (which can be used to add a column) in the received operation instruction, the operation instruction may change the table structure, and it is necessary to further determine whether the Put statement adds a column; if so, the operation instruction changes the table structure.
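The keyword checks described above can be sketched as a single predicate. This is a minimal sketch under stated assumptions: the instruction is taken to be already parsed into a dictionary, and its keys (`statement`, `action`, `table`, `family`, `columns`) and the action labels are hypothetical, not part of HBase.

```python
def changes_table_structure(instruction, known_columns):
    """Return True if the instruction changes the table structure.

    `instruction` is a hypothetical pre-parsed dict, for example
    {"statement": "put", "table": "T1", "family": "F1", "columns": ["A"]}.
    `known_columns` is the set of (table, family, column) tuples already recorded.
    """
    statement = instruction["statement"].lower()
    if statement in ("create", "drop"):
        return True  # creating or deleting a table always changes the structure
    if statement == "alter":
        # Alter only counts when it adds or deletes a column family.
        return instruction.get("action") in ("add_family", "delete_family")
    if statement == "put":
        # Put only counts when it writes to a column that is not yet recorded.
        return any(
            (instruction["table"], instruction["family"], column) not in known_columns
            for column in instruction["columns"]
        )
    return False  # anything else, such as reads, leaves the structure unchanged


known = {("T1", "F1", "A"), ("T1", "F1", "B")}
put_existing = {"statement": "put", "table": "T1", "family": "F1", "columns": ["A", "B"]}
put_new = {"statement": "put", "table": "T1", "family": "F1", "columns": ["A", "C"]}
```

Create and Drop are decided by the keyword alone, while Alter and Put need the second check that the text calls "further determination".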
  • Step 302: When determining that the operation instruction changes the table structure of the table, the master node records the table name of the added table, the column family name of the added column family, or the column name of the added column, or deletes the record containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column.
  • Specifically, if the change includes adding a table, adding a column family, or adding a column, the master node records the table name of the added table, the column family name of the added column family, or the column name of the added column. If the change includes deleting a table, deleting a column family, or deleting a column, the master node deletes the record containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column.
  • The master node may record the table name of the added table, the column family name of the added column family, or the column name of the added column in a pre-established two-dimensional table, and/or delete from the two-dimensional table the record containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column.
  • the two-dimensional table described above can be used to reflect the table structure of the table in real time, and the two-dimensional table can include at least the following fields: table name, column family name, and column name.
  • Table 1 shows an example of the structure of the two-dimensional table.
  • The two-dimensional table can be used to quickly obtain an accurate table structure without traversing all the records of the table. For example, by searching for "T1" in Table 1, one can quickly find that the table "T1" contains the columns A, B, and C.
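Assuming the two-dimensional table lives in a relational database, the lookup described above is a single query. The sketch below uses SQLite from Python's standard library purely for illustration. The table and field names (`hbase_metadata`, `table_name`, `column_family`, `column_name`) are hypothetical, and the sample rows are inferred from the row references in this text, since Table 1 itself is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hbase_metadata ("
    " table_name TEXT NOT NULL,"
    " column_family TEXT,"
    " column_name TEXT)"
)
# Assumed content of Table 1 (rows 1 to 5), inferred from the surrounding examples.
rows = [
    ("T1", None, None),
    ("T2", "F1", None),
    ("T1", "F1", "A"),
    ("T1", "F1", "B"),
    ("T1", "F2", "C"),
]
conn.executemany("INSERT INTO hbase_metadata VALUES (?, ?, ?)", rows)


def columns_of(table_name):
    """Return (column family, column) pairs recorded for one HBase table."""
    cursor = conn.execute(
        "SELECT column_family, column_name FROM hbase_metadata"
        " WHERE table_name = ? AND column_name IS NOT NULL"
        " ORDER BY column_family, column_name",
        (table_name,),
    )
    return cursor.fetchall()
```

Calling `columns_of("T1")` reads only the metadata rows, so the cost does not depend on how many records the HBase table itself holds.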
  • The two-dimensional table can be built in a relational database, such as Access or Oracle; alternatively, it can be built in a text file, such as an Extensible Markup Language (XML) or EXCEL file.
  • the storage form of the record including the name of the table, the column family, or the column is not limited to the two-dimensional table, and may be stored in other manners such as text.
  • Based on the received operation instruction, the master node correspondingly modifies the metadata related to the table structure in the stored record, for example by adding or deleting a table name, a column family name, or a column name, so that the stored record reflects the table structure of the table in real time.
  • the master node may record the metadata corresponding to the changed table structure into the two-dimensional table by:
  • If the operation instruction is used to add a table, a record containing the table name of the added table may be added to the two-dimensional table.
  • The Create statement is usually used to add a table. For example, if an operation instruction indicates that the table "T1" is created, a record as shown in the first row of Table 1 can be added.
  • If the operation instruction is used to delete a table, the one or more records containing the table name of that table may be deleted from the two-dimensional table.
  • The Drop statement is usually used to delete a table. For example, if an operation instruction indicates deletion of the table "T1", the records shown in the first, third, fourth, and fifth rows of Table 1 can be deleted.
  • If the operation instruction is used to add a column family, a record can be added to the two-dimensional table that includes the column family name of the added column family and the table name of the table in which the column family is located.
  • The Alter statement is usually used to add a column family. For example, if an operation instruction indicates that the column family "F1" is created in the table "T2", a record as shown in the second row of Table 1 can be added.
  • If the operation instruction is used to delete a specified column family, the records containing the column family name of the specified column family and the table name of the table to which the specified column family belongs may be deleted from the two-dimensional table.
  • The Alter statement is usually used to delete a column family from a table. For example, if an operation instruction indicates deletion of the column family "F1" in the table "T1", the records shown in the third and fourth rows of Table 1 can be deleted.
  • If the operation instruction is used to add a column under a specified column family of a specified table, a record can be added to the two-dimensional table that includes the column name of the added column, the column family name of the column family in which the column is located, and the table name of the table in which the column is located.
  • The Put statement is usually used to add a column. For example, if an operation instruction indicates that the column "A" is added under the column family "F1" of the table "T1", a record as shown in the third row of Table 1 can be added.
  • If the operation instruction is used to delete a specified column, the record containing the column name of the specified column, the column family name of the column family to which the specified column belongs, and the table name of the table to which the specified column belongs may be deleted from the two-dimensional table.
  • For example, if an operation instruction indicates deletion of the column "A" in the column family "F1" of the table "T1", the record shown in the third row of Table 1 can be deleted.
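The add and delete rules above can be sketched as small functions over an in-memory list of `(table, column family, column)` records. This is a sketch under stated assumptions: the function names are hypothetical, and the five sample rows are inferred from the row references in the examples, because Table 1 itself is not reproduced here.

```python
def add_table(records, table):
    records.append((table, None, None))


def drop_table(records, table):
    # Remove every record that carries the table name.
    records[:] = [r for r in records if r[0] != table]


def add_family(records, table, family):
    records.append((table, family, None))


def delete_family(records, table, family):
    # Remove every record of that column family within that table only.
    records[:] = [r for r in records if not (r[0] == table and r[1] == family)]


def add_column(records, table, family, column):
    if (table, family, column) not in records:
        records.append((table, family, column))


def delete_column(records, table, family, column):
    records[:] = [r for r in records if r != (table, family, column)]


# Assumed content of Table 1, rows 1 to 5.
table_1 = [
    ("T1", None, None),
    ("T2", "F1", None),
    ("T1", "F1", "A"),
    ("T1", "F1", "B"),
    ("T1", "F2", "C"),
]
```

With these rows, deleting column family "F1" of "T1" removes rows three and four, and dropping "T1" removes rows one, three, four, and five, which matches the examples in the text.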
  • In addition to the fields that describe the table structure of the table, the two-dimensional table provided by the present disclosure may include a service description field of a table, a service description field of a column family, a service description field of a column, a creation time field of a table, a creation time field of a column family, a creation time field of a column, and so on.
  • the business description fields of the table, column family, and column are used to annotate the business meaning of the table, column family, and column, respectively. For example, by the example in Table 2 below, it can be known that the column name "A" represents the student name.
  • The creation time fields of the table, column family, and column are used to annotate the creation times of tables, column families, and columns, respectively, and can be accurate to milliseconds.
  • HBase can provide an external interface for collecting business descriptions of tables, column families, and columns and store them in the two-dimensional table; it can also determine the creation times of tables, column families, and columns from the local system time and store them in the two-dimensional table.
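  • Table 2:
      Table name   Column family name   Column name   Service description of column         Creation time of column
      T1           F1                   A             A indicates the student's name        yyyy-MM-dd-hh-mm-ss-SSS
      T1           F1                   B             B indicates student gender            yyyy-MM-dd-hh-mm-ss-SSS
      T1           F2                   C             C indicates the age of the student    yyyy-MM-dd-hh-mm-ss-SSS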
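A creation time with millisecond precision in the pattern shown above can be produced from the local system time. The following is a minimal sketch; the function names `creation_time` and `annotated_record` and the dictionary keys are hypothetical, and a 24-hour clock is assumed for the hour field.

```python
from datetime import datetime


def creation_time(moment=None):
    """Format a timestamp as year-month-day-hour-minute-second-millisecond."""
    moment = moment or datetime.now()  # local system time by default
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%d-%H-%M-%S") + f"-{millis:03d}"


def annotated_record(table, family, column, description, moment=None):
    """Build one row of the extended two-dimensional table."""
    return {
        "table": table,
        "family": family,
        "column": column,
        "description": description,
        "created": creation_time(moment),
    }


record = annotated_record(
    "T1", "F1", "A", "A indicates the student's name",
    moment=datetime(2018, 5, 11, 9, 30, 5, 123456),
)
```

Passing an explicit `moment` is only for reproducibility; in normal use the default picks up the current local time.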
  • Step 303: The master node performs no processing when it determines that the operation instruction does not change the table structure of the table.
  • For example, the Update statement in the SQL language is used to update values and does not change the table structure of the table.
  • If the operation instruction received by the master node includes an Update statement, the metadata collection procedure, and thus the call to the package, can end directly without updating the two-dimensional table.
  • The master node may implement the method shown in FIG. 2 asynchronously; that is, the master node performs steps 300 to 303 on the one hand and, on the other hand, responds to the operation instruction normally, processing the table according to the action indicated by the operation instruction. In this way, the normal operation of HBase is not affected.
  • the stored metadata can also be presented and maintained.
  • the way to present and maintain HBase metadata through a web application is:
  • The architecture of the web application is shown in FIG. 5. It can include a data access layer, a business interface and implementation layer, a REST (Representational State Transfer) API (Application Programming Interface), and a front-end view layer.
  • the web application can interact with the metabase via JDBC (Java Data Base Connectivity).
  • The data access layer of the web application can read the stored HBase metadata from the metabase; the business interface and implementation layer can be responsible for scheduling the data access layer and the REST API, assembling the metadata, and passing the assembled metadata between the data access layer and the REST API; the front-end view layer obtains the assembled metadata by calling the REST API and presents it in an orderly way on the web page. Through an interface provided by the web application, such as a page, a user can add a service description to a specified table or column in the two-dimensional table, or run a fuzzy query for related metadata by business description, column name, or table name.
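The fuzzy query mentioned above can be approximated by a case-insensitive substring match over the table name, column name, and business description. This is a minimal sketch; the function name `fuzzy_query` and the record layout are hypothetical, and the sample records reuse the descriptions from Table 2.

```python
def fuzzy_query(records, keyword):
    """Return records whose table name, column name, or description contains keyword."""
    needle = keyword.lower()
    return [
        record
        for record in records
        if any(
            needle in (record.get(field) or "").lower()
            for field in ("table", "column", "description")
        )
    ]


records = [
    {"table": "T1", "family": "F1", "column": "A",
     "description": "A indicates the student's name"},
    {"table": "T1", "family": "F1", "column": "B",
     "description": "B indicates student gender"},
    {"table": "T1", "family": "F2", "column": "C",
     "description": "C indicates the age of the student"},
]
```

In a relational metabase the same match would typically be pushed into the query itself rather than done in application code.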
  • the web application can manage the metadata in the form of an existing management file, which will not be described in detail herein.
  • The present disclosure uses the coprocessor function provided by HBase to automatically collect and store the metadata corresponding to the table structure of the table. Based on the stored metadata, the table structure of a given table can be learned quickly, without traversing all the records of the table, which turns such metadata from unmanageable into manageable.
  • the present disclosure also provides an interface that can add a service description to a specified table or column, increases the availability of HBase, and reduces the development difficulty of an HBase-based application.
  • The following describes, through a specific embodiment, how the present disclosure uses a two-dimensional table to manage the metadata corresponding to the table structure of the table.
  • First, a relational database is created as a metabase, and a two-dimensional table is created in the metabase for storing the metadata corresponding to the table structure of the table.
  • the two-dimensional table includes fields such as table name, column family name, column name, and column creation time.
  • Initially, the two-dimensional table is an empty table.
  • The master node receives an operation instruction indicating that the table "T1" is created in HBase, the column family "F1" is created in the table "T1", and the columns "A" and "B" are created under the column family "F1".
  • the record related to the operation instruction can be added to the two-dimensional table created in advance, as shown in Table 3 below.
  • the master node receives an operation instruction indicating that data is added to the "F1" column family members "A” and “B” of the "F1" row key of the table "T1", respectively.
  • Since the operation instruction does not change the table structure of the table, the two-dimensional table is not updated, and the two-dimensional table at this time is still as shown in Table 3.
  • the master node receives an operation instruction indicating that data is added to the "F1" column family members "A", “B", and “C” of the "F1” row key of the table "T1”, respectively.
  • Comparing the table structure contained in the operation instruction with the table structure contained in Table 3, it can be seen that the column "C" contained in the operation instruction does not appear in Table 3; that is, the operation instruction substantially changes the table structure. The column "C" added under the column family "F1" of the table "T1" can therefore be synchronized to the two-dimensional table, and the updated two-dimensional table is as shown in Table 4 below.
  • the master node receives an operation instruction indicating the deletion of the column family "F1" under the table "T1". Since the operation instruction changes the table structure, the three records of the column family name "F1" can be deleted in the two-dimensional table shown in Table 4, that is, the three records in Table 4 are deleted.
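The embodiment above can be replayed end to end with a small in-memory stand-in for the two-dimensional table. This is a sketch under stated assumptions: `MetadataStore` and its methods are hypothetical, and the contents of Tables 3 and 4 are inferred from the narrative because the tables themselves are not reproduced here.

```python
class MetadataStore:
    """Hypothetical in-memory stand-in for the two-dimensional table."""

    def __init__(self):
        self.records = set()  # (table, column family, column) tuples

    def create(self, table, family, columns):
        self.records.update((table, family, column) for column in columns)

    def put(self, table, family, columns):
        # Compare the columns named by the Put with those already recorded.
        new = {(table, family, column) for column in columns} - self.records
        self.records |= new
        return bool(new)  # True only when the Put introduced a new column

    def delete_family(self, table, family):
        self.records = {r for r in self.records if r[:2] != (table, family)}


store = MetadataStore()                              # the table starts empty
store.create("T1", "F1", ["A", "B"])                 # assumed state of Table 3
changed_1 = store.put("T1", "F1", ["A", "B"])        # data only, structure unchanged
changed_2 = store.put("T1", "F1", ["A", "B", "C"])   # column C is new
table_4 = sorted(store.records)                      # assumed state of Table 4
store.delete_family("T1", "F1")                      # removes all three records
```

The second Put is the interesting case: the statement keyword is the same as in the first Put, and only the comparison against the stored records reveals that the structure changed.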
  • the method of the present disclosure has been described above.
  • the apparatus of the present disclosure will now be described, which has the function of implementing the above method.
  • The functions may be implemented by hardware or by hardware executing corresponding software.
  • the hardware or software includes one or more modules or units corresponding to the functions described above.
  • FIG. 3 is a schematic diagram of a hardware structure of a master node according to the disclosure.
  • The master node can include a processor 601 and a machine-readable storage medium 602 that stores machine-executable instructions.
  • The processor 601 communicates with the machine-readable storage medium 602 and can perform the metadata management method described above by reading and executing the machine-executable instructions in the machine-readable storage medium 602.
  • the present disclosure also provides a metadata management apparatus, which may be included in a Master node as shown in FIG.
  • the device can include the following units:
  • an instruction detecting unit 500, configured to detect an operation instruction for the table;
  • an instruction parsing unit 501, configured to parse the operation instruction for the table and determine whether the operation instruction changes the table structure of the table, where the change includes at least one of adding a table, deleting a table, adding a column family, deleting a column family, adding a column, and deleting a column; and
  • a metadata collection unit 502, configured to: when the instruction parsing unit 501 determines that the operation instruction changes the table structure of the table, record the table name of the added table, the column family name of the added column family, or the column name of the added column, or delete the record containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column. Specifically, if the change includes adding a table, adding a column family, or adding a column, the metadata collection unit 502 records the table name of the added table, the column family name of the added column family, or the column name of the added column. If the change includes deleting a table, deleting a column family, or deleting a column, the metadata collection unit 502 deletes the record containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column.
  • The instruction parsing unit 501 determines that the operation instruction changes the table structure of the table when the operation instruction satisfies at least one of the following conditions: 1) the operation instruction includes a Drop statement; 2) the operation instruction includes a Create statement; 3) the operation instruction includes an Alter statement, and the Alter statement is used to add a column family and/or delete a column family; 4) the operation instruction includes a Put statement, and the Put statement is used to add columns.
  • The metadata collection unit 502 records the table name of the added table, the column family name of the added column family, or the column name of the added column in a pre-established two-dimensional table, and/or deletes from the two-dimensional table the record that contains the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column.
  • the two-dimensional table is located in a relational database or text file, and the two-dimensional table includes the following fields: a table name, a column family name, and a column name.
  • the metadata collection unit 502 can synchronize the changed table structure metadata into a pre-established two-dimensional table by the following steps:
  • if the operation instruction includes a Create statement, adding one or more records to the two-dimensional table, the one or more records including the table name of the table indicated by the Create statement;
  • if the operation instruction includes an Alter statement and the Alter statement is used to delete a column family, deleting one or more records from the two-dimensional table, the one or more records including the column family name of the column family that the Alter statement indicates to delete and the table name of the table to which the column family belongs;
  • if the operation instruction includes an Alter statement and the Alter statement is used to add a column family, adding one or more records to the two-dimensional table, the one or more records including the column family name of the column family that the Alter statement indicates to add and the table name of the table to which the column family belongs;
  • if the operation instruction includes a Put statement and the Put statement is used to add a column, adding one or more records to the two-dimensional table, the one or more records including the column name of the column that the Put statement indicates to add, the column family name of the column family to which the column belongs, and the table name of the table to which the column belongs.
  • The two-dimensional table may further include at least one of the following fields: a service description of the table, a service description of the column family, a service description of the column, a creation time of the table, a creation time of the column family, and a creation time of the column.
  • the metadata management apparatus may further include: an instruction execution unit, configured to process the table according to the action indicated by the operation instruction while the instruction parsing unit 501 parses the operation instruction.
  • the present disclosure can be a method, apparatus, program, and machine readable storage medium.
  • the machine readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the machine-readable storage medium can be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Machine-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanical encoding devices (for example, a punch card or a raised structure in a groove with instructions stored thereon), and any suitable combination of the above.
  • A machine-readable storage medium as used herein is not to be interpreted as a transient signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., light pulses through a fiber optic cable), or electrical signals transmitted through wires.
  • the machine-executable instructions described herein can be downloaded from a machine-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers, and/or edge servers.
  • A network adapter card or network interface in each computing/processing device receives the machine-executable instructions from the network and forwards them for storage in a machine-readable storage medium in each computing/processing device.
  • Machine-executable instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or code written in one or more programming languages.
  • The machine-executable instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or wide area network (WAN), or can be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • Electronic circuitry, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be customized by utilizing state information of the machine-executable instructions, and the electronic circuitry can execute the instructions to implement various aspects of the present disclosure.
  • The machine-executable instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • The machine-executable instructions can also be stored in a machine-readable storage medium and cause a computer, programmable data processing apparatus, and/or other device to operate in a particular manner, such that the machine-readable storage medium storing the instructions constitutes an article of manufacture that includes instructions for implementing various aspects of the functions/acts recited in one or more blocks of the flowcharts and/or block diagrams.
  • The machine-executable instructions can also be loaded onto a computer, other programmable data processing apparatus, or other device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts recited in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowchart or block diagram can represent a module, a program segment, or a portion of an instruction that includes one or more components for implementing the specified logical functions.
  • the functions noted in the blocks can also occur in a different order than illustrated in the drawings. For example, two consecutive blocks may be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.

Abstract

A master node in an HBase architecture detects an operating command for a table; the master node parses the operating command and determines whether the operating command changes the table structure of the table; the change comprises at least one of adding a table, deleting a table, adding a column family, deleting a column family, adding a column, and deleting a column; when determining that the operating command changes the table structure of the table and that the change comprises adding a table, adding a column family, or adding a column, the master node records the table name of the added table, the column family name of the added column family, or the column name of the added column; and when determining that the operating command changes the table structure of the table and that the change comprises deleting a table, deleting a column family, or deleting a column, the master node deletes the records comprising the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column.

Description

Metadata management
Cross-reference to related applications
The present disclosure is based on and claims priority to Chinese Patent Application No. 201710331521.9, filed on May 11, 2017, the entire contents of which are incorporated herein by reference.
Background
The Hadoop Database (HBase) is a distributed, scalable, column-oriented non-relational (NoSQL, Not Only SQL) database built on the Hadoop infrastructure. As in a traditional relational database, an HBase table consists of rows and columns, but a single HBase column can store values from different points in time, and multiple columns can be grouped into a column family. The data model defined by HBase therefore differs considerably from that of a traditional relational database (a two-dimensional model of rows and columns): it is in effect a four-dimensional data model.
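To make the four-dimensional model concrete, the following minimal Python sketch treats a table as a nested mapping from row key to column family to column to timestamp to value. It is an illustration only, not HBase code; the names `new_table`, `put`, and `get_latest` are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory model of one table:
# row key -> column family -> column -> timestamp -> value.
def new_table():
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

def put(table, row, family, column, timestamp, value):
    """Store one version of a cell; values are plain strings."""
    table[row][family][column][timestamp] = value

def get_latest(table, row, family, column):
    """Return the value with the largest timestamp, or None if the cell is empty."""
    versions = table[row][family][column]
    return versions[max(versions)] if versions else None

table = new_table()
put(table, "row1", "F1", "A", 1, "Alice")
put(table, "row1", "F1", "A", 2, "Alicia")  # same cell, later version
```

Both versions of the cell remain addressable by timestamp, which is what distinguishes this model from a two-dimensional row-and-column table.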
Metadata is data about data. In a database system, metadata describes the structure of the data in the database and how that data is built, and it helps database administrators and database developers find the data they care about.
Brief description of the drawings
FIG. 1 is a schematic diagram of the architecture of an HBase example of the present disclosure;
FIG. 2 is a flowchart of an example of a method of the present disclosure;
FIG. 3 is a hardware architecture diagram of an example of an apparatus of the present disclosure;
FIG. 4 is a functional block diagram of an example of an apparatus of the present disclosure; and
FIG. 5 is a schematic diagram of the architecture of an example of a Web application of the present disclosure.
Detailed description
The present disclosure is described in detail below with reference to the drawings and specific embodiments.
An HBase table (referred to simply as a table in the present disclosure; every table mentioned below is an HBase table) has a table structure with at least the following characteristics: 1) each row of the table has a sortable row key and an arbitrary number of columns; columns can be added dynamically as needed, and different rows of the same table can have different columns; 2) the data in each cell of the table can have multiple versions; by default the version number is assigned automatically and can be, for example, the timestamp at which the cell was inserted; 3) there is a single data type: all data in the table are untyped strings; 4) tables are large: a single table can have billions of rows and millions of columns.
Metadata makes HBase easier to use. HBase currently supports the management of some metadata, such as the metadata indicating where a table is stored. For other metadata, however, HBase does not yet provide a unified management mechanism; for example, it does not currently support managing two kinds of metadata: the business description of a table and the table structure.
The present disclosure uses the coprocessor function built into HBase to automatically collect and store the metadata corresponding to the table structure of a table, and provides an external interface for collecting business descriptions of tables and columns, so that these two kinds of metadata, table structure and business description, go from unmanageable to managed.
HBase is introduced first. As shown in FIG. 1, the HBase architecture includes a client (Client) 201, a master (Master) node 202, region servers (Region Server) 203, and a coordination server (Zookeeper) 204.
The Client 201 is the interface through which a user accesses HBase.
The Master node 202 is connected to each Region Server 203 and is responsible for managing the Region Servers 203, for example assigning regions (Region) to the Region Servers 203 and balancing the load across them. In practice, the Master node 202 can correspond to one physical device, which can be an X86-architecture device such as an X86 server.
Each Region Server 203 maintains the Regions assigned to it and handles input/output (I/O) requests to those Regions.
Through election, the Zookeeper 204 ensures that there is only one Master node 202 in the cluster at any time. It stores the location of each Region Server 203, monitors the status of each Region Server 203 in real time and notifies the Master node 202 in real time, and also stores other information that HBase needs in order to run.
In the present disclosure, a metadata collection function can be deployed on the Master node 202, and the Master node can execute the flow shown in FIG. 2 when it receives an operation instruction for a table. The metadata collection function can be deployed as follows: a program package that collects metadata is deployed on the Master node 202, as shown in FIG. 1, and HBase then uses the observer function of the coprocessor to invoke and execute the package at specific moments, so that the metadata corresponding to the table structure is collected automatically.
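The observer idea described above, where a registered package is called back whenever the Master handles an instruction, can be sketched in a few lines of Python. This is a toy stand-in and not the HBase coprocessor API; `MasterStub`, `register_observer`, and `run` are hypothetical names, and the instruction is represented as a plain dict for illustration.

```python
class MasterStub:
    """Toy stand-in for the Master node; not the HBase API."""

    def __init__(self):
        self._observers = []

    def register_observer(self, callback):
        """Deploy a collection routine that is called for every instruction."""
        self._observers.append(callback)

    def run(self, instruction):
        # A real Master would carry out the instruction here.
        for callback in self._observers:
            callback(instruction)

seen = []  # keywords noticed by the hypothetical collection routine
master = MasterStub()
master.register_observer(lambda instruction: seen.append(instruction["keyword"]))
master.run({"keyword": "create", "table": "T1"})
```

The point of the pattern is that the collection routine is attached to the Master without changing how instructions themselves are carried out.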
Referring to FIG. 2, the metadata management method provided by the present disclosure may include steps 300, 301, 302, and 303.
Step 300: the Master node detects an operation instruction for a table.
Step 301: the Master node parses the operation instruction and determines whether it changes the table structure of a table, where the change includes at least one of adding a table, deleting a table, adding a column family, deleting a column family, adding a column, and deleting a column. If so, step 302 is performed; otherwise, step 303 is performed.
The operation instructions here may include Data Manipulation Language (DML) operation instructions, Data Definition Language (DDL) operation instructions, and the like.
DDL operation instructions for HBase may include creating a new table, listing table information, getting a table description, deleting a column family, deleting a table, and so on.
DML operation instructions for HBase may include adding a record, viewing a record, viewing the total number of records in a table, deleting a record, and so on.
Among the DDL and DML operation instructions described above, the operations that may affect the structure of a table include creating a new table, deleting a column family, deleting a table, adding a record, and so on. HBase does not currently support deleting a column, but given how quickly HBase is developing, deleting a column may become possible in the future.
Accordingly, the table structure of a table can change in the following ways:
1) a new table is added;
2) a new column family is added to a table;
3) a new column is added under a column family of a table;
4) a table is deleted;
5) an existing column family is deleted from a table;
6) an existing column is deleted from a column family of a table.
The Master node can use keyword recognition to determine whether a received operation instruction changes the table structure of a table. For example, if the Master node recognizes a Drop statement (used to delete a table) or a Create statement (used to create a table) in the received operation instruction, the instruction can be considered to change the table structure. If the Master node recognizes an Alter statement (which can be used to delete or add a column family), the instruction may change the table structure, and the Master node further determines whether the Alter statement is used to delete or add a column family; if so, the instruction is determined to change the table structure. If the Master node recognizes a Put statement (which can be used to add a column), the instruction may change the table structure, and the Master node further determines whether the Put statement is used to add a column; if so, the instruction is determined to change the table structure.
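The keyword-based decision described above can be sketched as follows. This is a minimal illustration under stated assumptions: the instruction is modeled as a hypothetical dict with a `keyword` field, an Alter instruction carries a hypothetical `action` field, and the already recorded columns are passed in as a set of tuples. None of these representations comes from the original text.

```python
def changes_table_structure(instruction, known_columns):
    """Return True if the instruction changes the table structure.

    `instruction` is a hypothetical dict such as
    {"keyword": "put", "table": "T1", "family": "F1", "columns": ["A"]}.
    `known_columns` is a set of (table, family, column) tuples already recorded.
    """
    keyword = instruction["keyword"].lower()
    if keyword in ("create", "drop"):
        # Creating or dropping a table always changes the structure.
        return True
    if keyword == "alter":
        # Alter matters only when it adds or deletes a column family.
        return instruction.get("action") in ("add_family", "delete_family")
    if keyword == "put":
        # Put matters only when it writes to a column not seen before.
        table, family = instruction["table"], instruction["family"]
        return any((table, family, column) not in known_columns
                   for column in instruction["columns"])
    # Any other keyword (for example an update of values) leaves the structure alone.
    return False
```

Drop and Create are decided by keyword alone, while Alter and Put need the second check, which mirrors the two-stage decision in the text.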
Step 302: when the Master node determines that the operation instruction changes the table structure of a table, it either records the table name of the added table, the column family name of the added column family, or the column name of the added column, or deletes the records containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column. Specifically, if the change includes adding a table, adding a column family, or adding a column, the Master node records the table name of the added table, the column family name of the added column family, or the column name of the added column. If the change includes deleting a table, deleting a column family, or deleting a column, the Master node deletes the records containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column.
The Master node can record the table name of the added table, the column family name of the added column family, or the column name of the added column in a pre-established two-dimensional table, and/or delete from that two-dimensional table the records containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column. The two-dimensional table reflects the table structure of the tables in real time and includes at least the following fields: table name, column family name, and column name. Table 1 shows an example of its structure. With the two-dimensional table, the exact table structure of a table can be obtained quickly without traversing all records of the table. For example, searching Table 1 for “T1” quickly shows that table “T1” contains the columns A, B, and C.
Table 1: Example of the two-dimensional table
| Table name | Column family name | Column name |
| T1 | | |
| T2 | F1 | |
| T1 | F1 | A |
| T1 | F1 | B |
| T1 | F2 | C |
The two-dimensional table can be created in a relational database such as Access or Oracle; alternatively, it can be created in a text file, for example in Extensible Markup Language (XML) or EXCEL format.
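As a rough illustration of keeping the two-dimensional table in a relational database, the sketch below loads the rows of Table 1 into an in-memory SQLite database and looks up the columns of one table. SQLite stands in for the databases named in the text only because it ships with Python; the table name `hbase_meta`, its column names, and the helper `columns_of` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hbase_meta ("
    "table_name TEXT NOT NULL, family_name TEXT, column_name TEXT)"
)
# The five records of Table 1; None marks an empty field.
conn.executemany(
    "INSERT INTO hbase_meta VALUES (?, ?, ?)",
    [
        ("T1", None, None),
        ("T2", "F1", None),
        ("T1", "F1", "A"),
        ("T1", "F1", "B"),
        ("T1", "F2", "C"),
    ],
)

def columns_of(conn, table_name):
    """Return (column family, column) pairs recorded for one table."""
    cursor = conn.execute(
        "SELECT family_name, column_name FROM hbase_meta "
        "WHERE table_name = ? AND column_name IS NOT NULL "
        "ORDER BY family_name, column_name",
        (table_name,),
    )
    return cursor.fetchall()
```

A single indexed lookup of this kind replaces a scan over every record of the HBase table itself.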
In addition, records containing the names of tables, column families, or columns are not limited to being stored as a two-dimensional table; they can also be stored in other forms, such as text. Based on the received operation instruction, the Master node modifies the metadata related to the table structure in the stored records accordingly, for example by adding or deleting table names, column family names, and column names, so that the stored records reflect the table structure of the tables in real time.
After determining that the received operation instruction changes the table structure of a table, the Master node can, in step 302, record the metadata corresponding to the changed table structure in the two-dimensional table as follows:
1) If the operation instruction adds a table, a record containing the table name of the added table can be added to the two-dimensional table. In practice, a Create statement is usually used to add a table. For example, if an operation instruction creates table “T1”, a record like the first row of Table 1 can be added.
2) If the operation instruction deletes a table, the one or more records containing the table name of that table can be deleted from the two-dimensional table. In practice, a Drop statement is usually used to delete a table. For example, if an operation instruction deletes table “T1”, the records in the first, third, fourth, and fifth rows of Table 1 above can be deleted.
3) If the operation instruction adds a column family to a specified table, a record can be added to the two-dimensional table that includes the column family name of the added column family and the table name of the table it belongs to. In practice, an Alter statement is usually used to add a column family. For example, if an operation instruction creates column family “F1” in table “T2”, a record like the second row of Table 1 can be added.
4) If the operation instruction deletes a specified column family from a specified table, the records containing the column family name of that column family and the table name of the table it belongs to can be deleted from the two-dimensional table. In practice, an Alter statement is usually used to delete a column family from a table. For example, if an operation instruction deletes column family “F1” from table “T1”, the records in the third and fourth rows of Table 1 above can be deleted.
5) If the operation instruction adds a column under a specified column family of a specified table, a record can be added to the two-dimensional table that includes the column name of the added column, the table name of the table it belongs to, and the column family name of the column family it belongs to. In practice, a Put statement is usually used to add a column. For example, if an operation instruction adds column “A” under column family “F1” of table “T1”, a record like the third row of Table 1 can be added.
6) If the operation instruction deletes a specified column under a specified column family of a specified table, the record containing the column name of that column, the column family name of the column family it belongs to, and the table name of the table it belongs to can be deleted from the two-dimensional table. For example, if an operation instruction deletes column “A” from column family “F1” of table “T1”, the record in the third row of Table 1 above can be deleted.
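The six rules above can be sketched as one function over a list of records. This is a minimal illustration, assuming each record is a `(table, family, column)` tuple with `None` for empty fields and each change is identified by a hypothetical string label; the original text does not prescribe either representation.

```python
def apply_change(records, change, table, family=None, column=None):
    """Return a new record list after applying one structural change."""
    if change == "add_table":
        return records + [(table, None, None)]
    if change == "add_family":
        return records + [(table, family, None)]
    if change == "add_column":
        return records + [(table, family, column)]
    if change == "delete_table":
        # Drop every record that mentions the table.
        return [r for r in records if r[0] != table]
    if change == "delete_family":
        # Drop every record for this table and column family.
        return [r for r in records if r[:2] != (table, family)]
    if change == "delete_column":
        # Drop only the record for this exact column.
        return [r for r in records if r != (table, family, column)]
    raise ValueError(f"unknown change: {change}")

# The five records of Table 1, in order.
table1 = [
    ("T1", None, None),
    ("T2", "F1", None),
    ("T1", "F1", "A"),
    ("T1", "F1", "B"),
    ("T1", "F2", "C"),
]
```

Applied to Table 1, deleting table “T1” leaves only the second row, and deleting column family “F1” of table “T1” removes the third and fourth rows, matching the examples in rules 2) and 4).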
Besides the fields that describe the table structure of a table, the two-dimensional table provided by the present disclosure can also include a business description field for the table, for the column family, and for the column, as well as a creation time field for the table, for the column family, and for the column. The business description fields annotate the business meaning of the table, column family, and column respectively; in the example of Table 2 below, the column name “A” denotes the student's name. The creation time fields record when the table, column family, and column were created, and the creation time can be accurate to the millisecond. HBase can provide an external interface to collect the business descriptions of tables, column families, and columns and store them in the two-dimensional table; the creation times of tables, column families, and columns can be determined from the local system time and stored in the two-dimensional table.
Table 2: Example of the two-dimensional table
| Table name | Column family name | Column name | Business description of the column | Creation time of the column |
| T1 | F1 | A | A indicates the student's name | yyyy-MM-dd-hh-mm-ss-SSS |
| T1 | F1 | B | B indicates the student's gender | yyyy-MM-dd-hh-mm-ss-SSS |
| T1 | F2 | C | C indicates the student's age | yyyy-MM-dd-hh-mm-ss-SSS |
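A record carrying the extra fields of Table 2 might look like the sketch below, which also formats a creation time with millisecond precision in the pattern shown in the table. The class `ColumnRecord`, the helper `format_creation_time`, the use of a 24-hour clock, and the sample date are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

def format_creation_time(moment):
    """Format a datetime as year-month-day-hour-minute-second-millisecond."""
    milliseconds = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%d-%H-%M-%S-") + f"{milliseconds:03d}"

@dataclass
class ColumnRecord:
    table_name: str
    family_name: str
    column_name: str
    description: str = ""   # business description, supplied through an external interface
    created_at: str = ""    # creation time, taken from the local system clock

# An arbitrary sample moment; a real collector would use datetime.now().
sample_moment = datetime(2018, 5, 10, 9, 30, 15, 123000)
record = ColumnRecord("T1", "F1", "A", created_at=format_creation_time(sample_moment))
record.description = "A indicates the student's name"
```

The description is filled in separately from the creation time because, per the text, the former comes from a user-facing interface and the latter from the system clock.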
Step 303: when the Master node determines that the operation instruction does not change the table structure of a table, it performs no processing.
For example, the Update statement in SQL updates values and generally does not change the table structure of a table. If the operation instruction received by the Master node contains an Update statement, the current invocation of the metadata collection package can end immediately, and the two-dimensional table does not need to be updated.
Optionally, the Master node can implement the method shown in FIG. 2 asynchronously: the Master node performs steps 300 to 303 and, in parallel, responds to the operation instruction as usual and processes the table as the instruction directs. In this way, the normal operation of HBase is not affected.
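One way to picture the asynchronous arrangement is a worker thread that consumes instructions from a queue while the caller returns immediately. The sketch below is only an analogy in plain Python; `handle`, `collector`, `pending`, and `recorded` are hypothetical names, and the instruction is again a simple dict.

```python
import queue
import threading

recorded = []            # stands in for the two-dimensional table
pending = queue.Queue()  # instructions waiting for metadata collection

def collector():
    """Background worker: record table names from create instructions."""
    while True:
        instruction = pending.get()
        if instruction is None:  # sentinel value: stop the worker
            break
        if instruction["keyword"] == "create":
            recorded.append(instruction["table"])

def handle(instruction):
    """Respond to the instruction without waiting for metadata collection."""
    pending.put(instruction)  # hand off to the collector and return at once
    return f"executed {instruction['keyword']} on {instruction['table']}"

worker = threading.Thread(target=collector, daemon=True)
worker.start()
reply = handle({"keyword": "create", "table": "T1"})
pending.put(None)  # ask the worker to finish
worker.join()      # wait so that `recorded` is complete before it is read
```

Because `handle` only enqueues the instruction, a slow or failing collector would not delay the normal response path.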
In the present application, after the metadata has been collected and stored, the stored metadata can also be presented and maintained. Taking the case where the two-dimensional table is stored in a database (hereinafter, the metabase) as an example, HBase metadata can be presented and maintained through a Web application as follows:
First, a Web container is set up on the HBase cluster or on a server that can connect to the metabase, and a Web application developed in Java is then built on the Web container. As shown in FIG. 5, the architecture of the Web application can include a data access layer, a business interface and implementation layer, a REST (Representational State Transfer) API (Application Programming Interface), and a front-end view layer. The Web application can interact with the metabase through JDBC (Java Database Connectivity).
For example, the data access layer of the Web application can read the stored HBase metadata from the metabase; the business interface and implementation layer can coordinate the data access layer and the REST API and pass the assembled metadata between them; the front-end view layer obtains the assembled metadata by calling the REST API and presents it in an orderly manner on a Web page. Through an interface provided by the Web application, such as a page, a business description can be added to a specified table or column in the two-dimensional table, and related metadata can be looked up by fuzzy query on a business description, column name, or table name.
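The two maintenance operations mentioned above, adding a business description and running a fuzzy query, can be sketched against a small SQLite metabase. The text describes a Java Web application using JDBC; Python's `sqlite3` is used here only to keep the example self-contained, and the schema and the helpers `set_description` and `fuzzy_search` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hbase_meta ("
    "table_name TEXT, family_name TEXT, column_name TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO hbase_meta VALUES (?, ?, ?, ?)",
    [("T1", "F1", "A", None), ("T1", "F1", "B", None), ("T1", "F2", "C", None)],
)

def set_description(conn, table, family, column, text):
    """Attach a business description to one recorded column."""
    conn.execute(
        "UPDATE hbase_meta SET description = ? "
        "WHERE table_name = ? AND family_name = ? AND column_name = ?",
        (text, table, family, column),
    )

def fuzzy_search(conn, keyword):
    """Find columns whose table name, column name, or description contains the keyword."""
    pattern = f"%{keyword}%"  # note: % and _ inside the keyword are not escaped here
    cursor = conn.execute(
        "SELECT table_name, family_name, column_name FROM hbase_meta "
        "WHERE table_name LIKE ? OR column_name LIKE ? OR description LIKE ? "
        "ORDER BY table_name, family_name, column_name",
        (pattern, pattern, pattern),
    )
    return cursor.fetchall()

set_description(conn, "T1", "F1", "A", "A indicates the student's name")
set_description(conn, "T1", "F1", "B", "B indicates the student's gender")
```

A substring match with `LIKE` is the simplest reading of "fuzzy query"; a production system might use full-text search instead.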
If the two-dimensional table is stored in a text file, the Web application can manage the metadata using existing file-management techniques, which are not described in detail here.
As the flow shown in FIG. 2 demonstrates, the present disclosure uses the coprocessor function built into HBase to automatically collect and store the metadata corresponding to the table structure of a table. With this stored metadata, the table structure of any given table can be obtained quickly without traversing all records of the table, so that this kind of metadata goes from unmanageable to manageable. In addition, the present disclosure provides an interface for adding business descriptions to specified tables or columns, which improves the usability of HBase and makes HBase-based applications easier to develop.
A specific embodiment is used below to show how the present disclosure manages the metadata corresponding to the table structure of a table through a two-dimensional table.
First, a relational database is created as the metabase, and a two-dimensional table is created in the metabase to store the metadata corresponding to the table structure of the tables. The two-dimensional table includes fields such as table name, column family name, column name, and creation time of the column.
In the initial state, when HBase contains no table data, the two-dimensional table is empty.
At time t1, the Master node receives an operation instruction to create table “T1” in HBase, create column family “F1” in table “T1”, and create columns “A” and “B” under column family “F1”. According to the method described above, because this operation instruction changes the table structure, records related to the operation instruction can be added to the pre-established two-dimensional table, as shown in Table 3 below.
Table 3
| Table name | Column family name | Column name | Creation time of the column |
| T1 | F1 | A | t1 |
| T1 | F1 | B | t1 |
At time t2, the Master node receives an operation instruction to add data to members “A” and “B” of column family “F1” in the row of table “T1” whose row key is “1”. According to the method described above, because this operation instruction does not change the table structure of the table, the two-dimensional table is not updated and remains as shown in Table 3.
At time t3, the Master node receives an operation instruction to add data to members “A”, “B”, and “C” of column family “F1” in the row of table “T1” whose row key is “2”. Comparing the table structure contained in the operation instruction with that in Table 3 shows that column “C” does not appear in Table 3, so the operation instruction in effect changes the table structure. The column “C” added under column family “F1” of table “T1” can therefore be synchronized to the two-dimensional table; the updated two-dimensional table is shown in Table 4 below.
Table 4
| Table name | Column family name | Column name | Creation time of the column |
| T1 | F1 | A | t1 |
| T1 | F1 | B | t1 |
| T1 | F1 | C | t3 |
At time t4, the Master node receives an operation instruction to delete column family “F1” under table “T1”. Because this operation instruction changes the table structure, the three records whose column family name is “F1” can be deleted from the two-dimensional table shown in Table 4; that is, all three records in Table 4 are deleted.
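The t1 to t4 walkthrough above can be replayed with a small in-memory stand-in for the two-dimensional table. The class `MetadataTable` and its methods are hypothetical, and the symbolic labels "t1" to "t4" are used in place of real timestamps.

```python
class MetadataTable:
    """Toy two-dimensional table: (table, family, column) -> creation time."""

    def __init__(self):
        self.records = {}

    def add_columns(self, table, family, columns, when):
        for column in columns:
            # setdefault keeps the original creation time of known columns,
            # so writing to an existing column changes nothing.
            self.records.setdefault((table, family, column), when)

    def delete_family(self, table, family):
        self.records = {
            key: value for key, value in self.records.items()
            if key[:2] != (table, family)
        }

meta = MetadataTable()
meta.add_columns("T1", "F1", ["A", "B"], "t1")       # t1: create T1, F1, A and B
after_t1 = dict(meta.records)
meta.add_columns("T1", "F1", ["A", "B"], "t2")       # t2: data written to known columns
after_t2 = dict(meta.records)
meta.add_columns("T1", "F1", ["A", "B", "C"], "t3")  # t3: column C appears for the first time
after_t3 = dict(meta.records)
meta.delete_family("T1", "F1")                       # t4: column family F1 is deleted
```

The snapshots `after_t1`, `after_t2`, and `after_t3` correspond to Table 3, Table 3 again, and Table 4, and the table is empty after t4.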
The method of the present disclosure has been described above. The apparatus of the present disclosure, which has the function of implementing the above method, is described below. The functions can be implemented in hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the functions described above.
FIG. 3 is a schematic diagram of the hardware structure of a Master node provided by the present disclosure. The Master node may include a processor 601 and a machine readable storage medium 602 storing machine executable instructions. The processor 601 communicates with the machine readable storage medium 602 and, by reading and executing the machine executable instructions in the machine readable storage medium 602, can perform the metadata management method described above.
Referring to FIG. 4, the present disclosure also provides a metadata management apparatus, which can be included in the Master node shown in FIG. 3. As shown in FIG. 4, the apparatus can include the following units:
指令检测单元500,用于检测针对表的操作指令;An instruction detecting unit 500, configured to detect an operation instruction for the table;
指令解析单元501,用于解析针对表的操作指令,并判断所述操作指令 是否对表的表结构发生更改;所述更改包括增加表、删除表、增加列族、删除列族、增加列和删除列中的至少一种;以及The instruction parsing unit 501 is configured to parse an operation instruction for the table, and determine whether the operation instruction changes a table structure of the table; the change includes adding a table, deleting a table, adding a column family, deleting a column family, adding a column, and Delete at least one of the columns; and
元数据采集单元502,用于在所述指令解析单元501确定所述操作指令对表的表结构发生更改时,记录增加的表的表名、增加的列族的列族名或增加的列的列名或者删除包含删除的表的表名、删除的列族的列族名或删除的列的列名的记录。具体地,如果更改包括增加表、增加列族或增加列,则元数据采集单元502记录增加的表的表名、增加的列族的列族名或增加的列的列名。如果更改包括删除表、删除列族或删除列,则元数据采集单元502删除包含删除的表的表名、删除的列族的列族名或删除的列的列名的记录。The metadata collection unit 502 is configured to: when the instruction parsing unit 501 determines that the operation instruction changes the table structure of the table, record the table name of the added table, the column family name of the added column family, or the added column. Column name or delete the record containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column. Specifically, if the change includes adding a table, adding a column family, or adding a column, the metadata collection unit 502 records the table name of the added table, the column family name of the added column family, or the column name of the added column. If the change includes deleting the table, deleting the column family, or deleting the column, the metadata collection unit 502 deletes the record including the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column.
Optionally, the instruction parsing unit 501 determines that the operation instruction changes the table structure of the table when the operation instruction satisfies at least one of the following conditions: 1) the operation instruction includes a Drop statement; 2) the operation instruction includes a Create statement; 3) the operation instruction includes an Alter statement, and the Alter statement is used to add a column family and/or delete a column family; 4) the operation instruction includes a Put statement, and the Put statement is used to add a column.
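The four conditions can be illustrated with a minimal Python sketch. It assumes the operation instruction arrives as an HBase-shell-style text command; the function name `classify`, the returned labels, and the two sets of already known column families and columns are hypothetical and do not come from the disclosure. The regular expressions cover only the simplest command forms.

```python
import re

# Simplified patterns for HBase-shell-style commands (an assumption of this sketch).
ALTER_NAME = re.compile(r"alter\s+'([^']+)'\s*,.*?NAME\s*=>\s*'([^']+)'", re.I)
DELETE_FAMILY = re.compile(r"METHOD\s*=>\s*'delete'|'delete'\s*=>", re.I)
PUT = re.compile(r"put\s+'([^']+)'\s*,\s*'[^']*'\s*,\s*'([^':]+):([^']*)'", re.I)


def classify(instruction, known_families, known_columns):
    """Return a label for the table-structure change, or None if there is none.

    known_families: set of (table, family) pairs already recorded.
    known_columns:  set of (table, family, column) triples already recorded.
    """
    text = instruction.strip()
    verb = text.split(None, 1)[0].lower() if text else ""
    if verb == "drop":                      # condition 1: Drop statement
        return "delete_table"
    if verb == "create":                    # condition 2: Create statement
        return "add_table"
    if verb == "alter":                     # condition 3: Alter adds or deletes a family
        if DELETE_FAMILY.search(text):
            return "delete_column_family"
        match = ALTER_NAME.match(text)
        if match and (match.group(1), match.group(2)) not in known_families:
            return "add_column_family"
        return None                         # e.g. only attributes of an existing family change
    if verb == "put":                       # condition 4: Put introduces a new column
        match = PUT.match(text)
        if match and match.groups() not in known_columns:
            return "add_column"
        return None
    return None                             # reads and other commands leave the structure alone
```

A Put statement counts as a structure change only when the column it writes has not been seen before, which is why the sketch needs the set of known columns as input.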
Optionally, the metadata collection unit 502 records the table name of the added table, the column family name of the added column family, or the column name of the added column in a pre-established two-dimensional table, and/or deletes from the two-dimensional table the record containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column. The two-dimensional table is located in a relational database or a text file, and includes the following fields: table name, column family name, and column name.
The metadata collection unit 502 may synchronize the changed table structure metadata to the pre-established two-dimensional table through the following steps:
1) if the operation instruction includes a Drop statement, deleting one or more records from the two-dimensional table, the one or more records containing the table name of the table that the Drop statement indicates is to be deleted;
2) if the operation instruction includes a Create statement, adding one or more records to the two-dimensional table, the one or more records containing the table name of the table that the Create statement indicates is to be added;
3) if the operation instruction includes an Alter statement and the Alter statement is used to delete a column family, deleting one or more records from the two-dimensional table, the one or more records containing the column family name of the column family that the Alter statement indicates is to be deleted and the table name of the table to which the column family belongs;
4) if the operation instruction includes an Alter statement and the Alter statement is used to add a column family, adding one or more records to the two-dimensional table, the one or more records containing the column family name of the column family that the Alter statement indicates is to be added and the table name of the table to which the column family belongs;
5) if the operation instruction includes a Put statement and the Put statement is used to add a column, adding one or more records to the two-dimensional table, the one or more records containing the column name of the column that the Put statement indicates is to be added, the column family name of the column family to which the column belongs, and the table name of the table to which the column belongs.
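The five synchronization steps above can be sketched as follows, using SQLite from the Python standard library as a stand-in for the relational database. The table name `hbase_meta`, the function names, and the change labels are assumptions made for illustration; the disclosure only specifies the three fields (table name, column family name, column name) and which records are added or deleted.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the two-dimensional table if it does not exist (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hbase_meta ("
        "table_name TEXT NOT NULL, family_name TEXT, column_name TEXT)"
    )
    return conn


def apply_change(conn, change, table, family=None, column=None):
    """Apply one structure change to the two-dimensional table."""
    if change == "delete_table":
        # Step 1: Drop removes every record that carries the table name.
        conn.execute("DELETE FROM hbase_meta WHERE table_name = ?", (table,))
    elif change == "delete_column_family":
        # Step 3: Alter (delete) removes records matching table name and family name.
        conn.execute(
            "DELETE FROM hbase_meta WHERE table_name = ? AND family_name = ?",
            (table, family),
        )
    elif change in ("add_table", "add_column_family", "add_column"):
        # Steps 2, 4 and 5: Create, Alter (add) and Put each add a record.
        conn.execute(
            "INSERT INTO hbase_meta (table_name, family_name, column_name) "
            "VALUES (?, ?, ?)",
            (table, family, column),
        )
    else:
        raise ValueError(f"unknown change: {change}")
    conn.commit()
```

A Create statement that declares several column families would call `apply_change` once per family, which corresponds to the "one or more records" wording of step 2.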
Optionally, the two-dimensional table may further include at least one of the following fields: a business description of the table, a business description of the column family, a business description of the column, a creation time of the table, a creation time of the column family, and a creation time of the column.
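One way to picture a record that carries both the required and the optional fields is a small Python dataclass. The class name and attribute names below are hypothetical; only the set of fields follows the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MetadataRecord:
    # Required and structural fields of the two-dimensional table.
    table_name: str
    family_name: Optional[str] = None
    column_name: Optional[str] = None
    # Optional business descriptions.
    table_description: Optional[str] = None
    family_description: Optional[str] = None
    column_description: Optional[str] = None
    # Optional creation times.
    table_created_at: Optional[datetime] = None
    family_created_at: Optional[datetime] = None
    column_created_at: Optional[datetime] = None
```

Every optional field defaults to `None`, so a record can be created from a table name alone and enriched later.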
Optionally, the metadata management apparatus may further include an instruction execution unit, configured to process the table according to the action indicated by the operation instruction while the instruction parsing unit 501 parses the operation instruction.
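Running execution and parsing side by side could look like the following Python sketch. The function `handle` and its two callables are hypothetical placeholders; the disclosure does not say how the concurrency is implemented, and a thread pool is only one possible choice.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(instruction, execute, parse):
    """Run the execution and parsing of one instruction concurrently.

    execute: callable that carries out the instruction against the table.
    parse:   callable that inspects the instruction for structure changes.
    Returns the pair (execution result, parsing result).
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        execution = pool.submit(execute, instruction)
        parsing = pool.submit(parse, instruction)
        # result() blocks until each task finishes and re-raises its exception, if any.
        return execution.result(), parsing.result()
```

Because parsing does not wait for execution to finish, collecting metadata adds little latency to the instruction itself.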
As described above, the present disclosure may be a method, an apparatus, a program, and a machine-readable storage medium. The machine-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The machine-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the machine-readable storage medium include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing. A machine-readable storage medium as used herein is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.
The machine-executable instructions described herein may be downloaded from a machine-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the machine-executable instructions from the network and forwards them for storage in a machine-readable storage medium in the respective computing/processing device.
The machine-executable instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The machine-executable instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by using state information of the machine-executable instructions, and the electronic circuit can execute the machine-executable instructions to implement aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses, and program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by machine-executable instructions.
These machine-executable instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These machine-executable instructions may also be stored in a machine-readable storage medium, where they cause a computer, a programmable data processing apparatus, and/or other devices to operate in a particular manner, so that the machine-readable storage medium storing the instructions constitutes an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The machine-executable instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The above are merely preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (15)

  1. A metadata management method, comprising:
    detecting, by a master (Master) node in an HBase architecture, an operation instruction for a table;
    parsing, by the Master node, the operation instruction, and determining whether the operation instruction changes a table structure of the table, wherein the change comprises at least one of adding a table, deleting a table, adding a column family, deleting a column family, adding a column, and deleting a column;
    when the Master node determines that the operation instruction changes the table structure of the table and the change comprises adding a table, adding a column family, or adding a column, recording, by the Master node, a table name of the added table, a column family name of the added column family, or a column name of the added column; and
    when the Master node determines that the operation instruction changes the table structure of the table and the change comprises deleting a table, deleting a column family, or deleting a column, deleting, by the Master node, a record containing a table name of the deleted table, a column family name of the deleted column family, or a column name of the deleted column.
  2. The metadata management method according to claim 1, wherein the operation instruction is determined to change the table structure of the table when the operation instruction satisfies at least one of the following conditions:
    the operation instruction comprises a Drop statement;
    the operation instruction comprises a Create statement;
    the operation instruction comprises an Alter statement, and the Alter statement is used to add a column family and/or delete a column family; and
    the operation instruction comprises a Put statement, and the Put statement is used to add a column.
  3. The metadata management method according to claim 2, wherein the Master node records the table name of the added table, the column family name of the added column family, or the column name of the added column in a pre-established two-dimensional table, and/or deletes from the two-dimensional table the record containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column,
    wherein the two-dimensional table is located in a relational database or a text file, and the two-dimensional table comprises the following fields: table name, column family name, and column name.
  4. The metadata management method according to claim 3, wherein
    if the operation instruction comprises a Drop statement, the Master node deletes one or more records from the two-dimensional table, the one or more records containing the table name of the table that the Drop statement indicates is to be deleted, and
    if the operation instruction comprises a Create statement, the Master node adds one or more records to the two-dimensional table, the one or more records containing the table name of the table that the Create statement indicates is to be added.
  5. The metadata management method according to claim 3, wherein
    if the operation instruction comprises an Alter statement and the Alter statement is used to delete a column family, the Master node deletes one or more records from the two-dimensional table, the one or more records containing the column family name of the column family that the Alter statement indicates is to be deleted and the table name of the table to which the column family belongs, and
    if the operation instruction comprises an Alter statement and the Alter statement is used to add a column family, the Master node adds one or more records to the two-dimensional table, the one or more records containing the column family name of the column family that the Alter statement indicates is to be added and the table name of the table to which the column family belongs.
  6. The metadata management method according to claim 3, wherein
    if the operation instruction comprises a Put statement and the Put statement is used to add a column, the Master node adds one or more records to the two-dimensional table, the one or more records containing the column name of the column that the Put statement indicates is to be added, the column family name of the column family to which the column belongs, and the table name of the table to which the column belongs.
  7. The metadata management method according to claim 3, wherein the two-dimensional table further comprises at least one of the following fields: a business description of the table, a business description of the column family, a business description of the column, a creation time of the table, a creation time of the column family, and a creation time of the column.
  8. A master (Master) node in an HBase architecture, comprising:
    a processor; and
    a machine-readable storage medium storing machine-executable instructions,
    wherein, by reading and executing the machine-executable instructions, the processor is caused to:
    detect an operation instruction for a table;
    parse the operation instruction, and determine whether the operation instruction changes a table structure of the table, wherein the change comprises at least one of adding a table, deleting a table, adding a column family, deleting a column family, adding a column, and deleting a column;
    when it is determined that the operation instruction changes the table structure of the table and the change comprises adding a table, adding a column family, or adding a column, record a table name of the added table, a column family name of the added column family, or a column name of the added column; and
    when it is determined that the operation instruction changes the table structure of the table and the change comprises deleting a table, deleting a column family, or deleting a column, delete a record containing a table name of the deleted table, a column family name of the deleted column family, or a column name of the deleted column.
  9. The Master node according to claim 8, wherein the operation instruction is determined to change the table structure of the table when the operation instruction satisfies at least one of the following conditions:
    the operation instruction comprises a Drop statement;
    the operation instruction comprises a Create statement;
    the operation instruction comprises an Alter statement, and the Alter statement is used to add a column family and/or delete a column family; and
    the operation instruction comprises a Put statement, and the Put statement is used to add a column.
  10. The Master node according to claim 9, wherein the processor records the table name of the added table, the column family name of the added column family, or the column name of the added column in a pre-established two-dimensional table, and/or deletes from the two-dimensional table the record containing the table name of the deleted table, the column family name of the deleted column family, or the column name of the deleted column,
    wherein the two-dimensional table is located in a relational database or a text file, and the two-dimensional table comprises the following fields: table name, column family name, and column name.
  11. The Master node according to claim 10, wherein
    if the operation instruction comprises a Drop statement, the processor deletes one or more records from the two-dimensional table, the one or more records containing the table name of the table that the Drop statement indicates is to be deleted, and
    if the operation instruction comprises a Create statement, the processor adds one or more records to the two-dimensional table, the one or more records containing the table name of the table that the Create statement indicates is to be added.
  12. The Master node according to claim 10, wherein
    if the operation instruction comprises an Alter statement and the Alter statement is used to delete a column family, the processor deletes one or more records from the two-dimensional table, the one or more records containing the column family name of the column family that the Alter statement indicates is to be deleted and the table name of the table to which the column family belongs, and
    if the operation instruction comprises an Alter statement and the Alter statement is used to add a column family, the processor adds one or more records to the two-dimensional table, the one or more records containing the column family name of the column family that the Alter statement indicates is to be added and the table name of the table to which the column family belongs.
  13. The Master node according to claim 10, wherein
    if the operation instruction comprises a Put statement and the Put statement is used to add a column, the processor adds one or more records to the two-dimensional table, the one or more records containing the column name of the column that the Put statement indicates is to be added, the column family name of the column family to which the column belongs, and the table name of the table to which the column belongs.
  14. The Master node according to claim 10, wherein the two-dimensional table further comprises at least one of the following fields: a business description of the table, a business description of the column family, a business description of the column, a creation time of the table, a creation time of the column family, and a creation time of the column.
  15. A machine-readable storage medium, comprising instructions that, when executed by a master (Master) node in an HBase architecture, cause the Master node to perform the following operations:
    detecting an operation instruction for a table;
    parsing the operation instruction, and determining whether the operation instruction changes a table structure of the table, wherein the change comprises at least one of adding a table, deleting a table, adding a column family, deleting a column family, adding a column, and deleting a column;
    when it is determined that the operation instruction changes the table structure of the table and the change comprises adding a table, adding a column family, or adding a column, recording a table name of the added table, a column family name of the added column family, or a column name of the added column; and
    when it is determined that the operation instruction changes the table structure of the table and the change comprises deleting a table, deleting a column family, or deleting a column, deleting a record containing a table name of the deleted table, a column family name of the deleted column family, or a column name of the deleted column.
PCT/CN2018/086398 2017-05-11 2018-05-10 Metadata management WO2018205981A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710331521.9A CN108241724A (en) 2017-05-11 2017-05-11 A kind of metadata management method and device
CN201710331521.9 2017-05-11

Publications (1)

Publication Number Publication Date
WO2018205981A1 true WO2018205981A1 (en) 2018-11-15

Family

ID=62702928

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/086398 WO2018205981A1 (en) 2017-05-11 2018-05-10 Metadata management

Country Status (2)

Country Link
CN (1) CN108241724A (en)
WO (1) WO2018205981A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188118A (en) * 2019-04-25 2019-08-30 广州至真信息科技有限公司 A kind of method of data synchronization, device

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189783B (en) * 2018-08-03 2023-10-03 北京涛思数据科技有限公司 Time sequence database table structure change processing method
CN110471927A (en) * 2019-08-20 2019-11-19 浙江大搜车软件技术有限公司 Metadata acquisition method, apparatus, computer equipment and storage medium
CN111159272A (en) * 2019-12-31 2020-05-15 青梧桐有限责任公司 Data quality monitoring and early warning method and system based on data warehouse and ETL
CN111159161A (en) * 2019-12-31 2020-05-15 青梧桐有限责任公司 ETL rule-based data quality monitoring and early warning system and method
CN113467774B (en) * 2021-07-30 2024-01-30 北京鼎普科技股份有限公司 WEB terminal business software development framework and method
CN115062084B (en) * 2022-08-19 2022-11-04 中关村科学城城市大脑股份有限公司 Method and device for constructing API (application programming interface) based on database metadata

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103312791A (en) * 2013-05-24 2013-09-18 上海和伍新材料科技有限公司 Internet of things heterogeneous data storage method and system
CN103853714A (en) * 2012-11-28 2014-06-11 中国移动通信集团河南有限公司 Data processing method and device
CN105677826A (en) * 2016-01-04 2016-06-15 博康智能网络科技股份有限公司 Resource management method for massive unstructured data
CN106503276A (en) * 2017-01-06 2017-03-15 山东浪潮云服务信息科技有限公司 A kind of method and apparatus of the time series databases for real-time monitoring system
CN106528674A (en) * 2016-10-31 2017-03-22 厦门服云信息科技有限公司 Method and device for high-performance query based on Hbase row keys

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346447A (en) * 2014-10-28 2015-02-11 浪潮电子信息产业股份有限公司 Partitioned connection method oriented to mixed type big data processing systems
CN104391957A (en) * 2014-12-01 2015-03-04 浪潮电子信息产业股份有限公司 Data interaction analysis method for hybrid big data processing system
CN106503243B (en) * 2016-11-08 2019-08-06 国网山东省电力公司电力科学研究院 Electric power big data querying method based on HBase secondary index

Also Published As

Publication number Publication date
CN108241724A (en) 2018-07-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18797609

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18797609

Country of ref document: EP

Kind code of ref document: A1