WO2019100638A1 - Data synchronization method, device and equipment, and storage medium - Google Patents
Data synchronization method, device and equipment, and storage medium
- Publication number
- WO2019100638A1 (application PCT/CN2018/082270)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- library
- source
- target
- metadata
- target library
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/273—Asynchronous replication or reconciliation
Definitions
- the present application relates to the field of data synchronization technologies, and in particular, to a data synchronization method, apparatus, device, and storage medium.
- the table creation scripts and synchronization scripts in the above process are all written manually by developers.
- the development process is complicated, inefficient, and error-prone, which greatly reduces the efficiency of data synchronization.
- the purpose of the present application is to provide a data synchronization method, apparatus, device, and storage medium, intended to solve the defect in the prior art that the table creation scripts and synchronization scripts used in data synchronization are all written manually by developers, so that the development process is complicated, inefficient, and error-prone, which greatly reduces the efficiency of data synchronization.
- a data synchronization method includes the following steps:
- the table type of the first target library table is one of an incremental table, a flow table, or a full table
- the table type of the second target library table is one of an incremental table, a flow table, or a full table.
- the step of acquiring the entered source library information, which includes at least the source library name, the source library table name, and the source library table type, and parsing the metadata included in the source library corresponding to the source library information to obtain the source table structure, includes:
- the metadata information table of the obtained metadata is parsed, and the source table structure is obtained according to the metadata information table.
- the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing metadata from the source library to the target library through the first target library table and the second target library table in sequence includes:
- when the table type of the first target library table is an incremental table, a first sqoop synchronization script and hive program are generated correspondingly; the first sqoop synchronization script and hive program are used to synchronize metadata from the source library to a specified partition of the first target library table.
- the metadata in the first target library table is then deduplicated according to the source table deduplication field and stored in the second target library table.
- the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing metadata from the source library to the target library through the first target library table and the second target library table in sequence also includes:
- when the table type of the first target library table is a flow table, a second sqoop synchronization script and hive program are generated correspondingly; the second sqoop synchronization script and hive program are used to synchronize metadata from the source library to a specified partition of the first target library table.
- the metadata in the first target library table is then stored in the second target library table.
- the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing metadata from the source library to the target library through the first target library table and the second target library table in sequence also includes:
- when the table type of the first target library table is a full table, a third sqoop synchronization script and hive program are generated correspondingly; the third sqoop synchronization script and hive program are used to synchronize metadata from the source library to the first target library table, and then to store the metadata in the first target library table into the second target library table.
- the execution periods of the first, second, and third sqoop synchronization scripts and hive programs are all 24 hours.
- the metadata information table corresponding to the metadata includes at least a table owner, a table name, a table comment, a column name, a column comment, and a column order.
- a data synchronization device comprising:
- the input parsing module is configured to obtain source library information including at least a source library name, a source library table name, and a source library table type, and parse the source library information corresponding to the metadata included in the source library to obtain a source table structure;
- the table creation script generation module is configured to generate, according to the source table structure, a table creation script for establishing in the target library a first target library table that temporarily stores data and a second target library table that stores the same data as the source library;
- the synchronization script generation module is configured to obtain a table type of the first target library table, and correspondingly generate a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and the second target library table.
- a data synchronization device includes: a processor, a memory, a communication bus; and the memory stores a computer readable program executable by the processor;
- the communication bus implements connection communication between the processor and the memory
- the processor implements the steps in the data synchronization method described above when the computer readable program is executed.
- a storage medium wherein the storage medium stores one or more programs, the one or more programs being executable by one or more processors to implement the steps in the data synchronization method described above.
- the present application automatically generates a table creation script and a synchronization script by configuring a limited number of entries, so that the data synchronization operation is simplified, the development efficiency is improved, and human error is reduced.
- FIG. 1 is a flowchart of a data synchronization method provided by the present application.
- FIG. 2 is a flowchart of step S100 in the data synchronization method provided by the present application.
- FIG. 3 is a flowchart of step S100 in the data synchronization method provided by the present application.
- FIG. 4 is a schematic diagram of an operating environment of a preferred embodiment of a data synchronization device provided by the present application.
- FIG. 5 is a functional block diagram of a preferred embodiment of the data synchronization procedure of the present application.
- FIG. 6 is a structural block diagram of a data synchronization system provided by the present application.
- the table creation scripts and synchronization scripts in the data synchronization process are all written manually by developers; the development process is complicated, inefficient, and error-prone, which greatly reduces the efficiency of data synchronization.
- the purpose is to provide a data synchronization method, apparatus, device, and storage medium that automatically generate a table creation script and a synchronization script from a limited number of configured entries, thereby simplifying the data synchronization operation, improving development efficiency, and reducing human error.
- the data synchronization method includes the following steps:
- Step S100: Obtain the entered source library information, which includes at least a source library name, a source library table name, and a source library table type, and parse the metadata included in the source library corresponding to the source library information to obtain a source table structure.
- when metadata needs to be synchronized from the source library to the target library, it is only necessary to enter the source library information in the interface of the synchronization tool (an application designed according to the data synchronization method): the source library name, the source library table name, the source library table type, the source table update field, the source table deduplication field, and the target library name.
- after the source library information is entered, the table creation script and the synchronization script are generated automatically, which makes the data synchronization operation simple.
- the source library in which the metadata is initially stored is an Oracle database (Oracle Database, also called Oracle RDBMS or simply Oracle, is a relational database management system from Oracle), a MySQL database (MySQL is a small open-source relational database management system), or a PostgreSQL database (PostgreSQL is a free object-relational database server); the target library is a hive database (hive is a data warehouse tool based on Hadoop).
- Step S200: Generate, according to the source table structure, a table creation script for establishing in the target library a first target library table that temporarily stores data and a second target library table that stores the same data as the source library.
- the source table structure is parsed automatically from the entered source library information, and the table creation script is generated automatically from the source table structure and the preset table creation rules.
- to make the automatic generation of the table creation script easier to understand, the preset table creation rules are described below.
- the table creation rules for the source library are as follows.
- the source tables in the source library are divided into three types: the source table incremental table (also called the source library table incremental table), the source table flow table (also called the source library table flow table), and the source table full table (also called the source library table full table).
- the data in a source incremental table is continuously updated and added to, and historical data from before the current day may be updated on the current day; the data in a source flow table is continuously added to, and historical data from before the current day is not updated on the current day; a source full table holds a small amount of data, such as a configuration table or a dimension table.
- the first target library table corresponding to the source table is the src table (that is, the source file table).
- the second target library table is the ods table (that is, the Operational Data Store table).
- the src table is used as a temporary table and is divided into the src incremental table (partitioned by day), the src flow table (partitioned by day), and the src full table (no partition required).
- the ods table holds the same data as the source table and is divided into the ods incremental table (not partitioned; obtained by deduplicating all the data in the src incremental table), the ods flow table (partitioned by day, not deduplicated), and the ods full table (not partitioned, not deduplicated).
- it is only necessary to enter the source library information including the source library name, the source library table name, the source library table type, the source table update field, the source table deduplication field, and the target library name, which simplifies operation.
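The partitioning rules above can be sketched as a small generator of hive table creation statements. This is a minimal sketch under stated assumptions, not the patent's actual generator: the function names, the partition column name `dt`, and the column types are all hypothetical; only the `_src` / `_ods` suffixes and the partitioned / not partitioned choices come from the text.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, hive type); the types are illustrative

# Rules from the text: table type -> (src partitioned by day?, ods partitioned by day?)
PARTITION_RULES = {
    "incremental": (True, False),
    "flow": (True, True),
    "full": (False, False),
}


def build_create_table(table: str, columns: List[Column], partitioned_by_day: bool) -> str:
    """Build one hive CREATE TABLE statement."""
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)"
    if partitioned_by_day:
        # "dt" is an assumed name for the daily partition column.
        ddl += "\nPARTITIONED BY (dt STRING)"
    return ddl + ";"


def build_table_scripts(base_name: str, table_type: str, columns: List[Column]) -> List[str]:
    """Return the creation statements for the src table and the ods table."""
    src_partitioned, ods_partitioned = PARTITION_RULES[table_type]
    return [
        build_create_table(f"{base_name}_src", columns, src_partitioned),
        build_create_table(f"{base_name}_ods", columns, ods_partitioned),
    ]


scripts = build_table_scripts("n_bas_pc_sop", "incremental", [("id", "STRING"), ("name", "STRING")])
for script in scripts:
    print(script)
```

Keeping the rules in one lookup table means a new table type would only need one more entry rather than another branch.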
- Step S300: Obtain the table type of the first target library table, and correspondingly generate a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and the second target library table in sequence.
- the table type of the first target library table is one of an incremental table, a flow table, or a full table
- the table type of the second target library table is one of an incremental table, a flow table, or a full table
- the table type of the first target library table is the same as the source library table type, and the table type of the second target library table is the same as the source library table type. In this way, the table types of the first and second target library tables are both obtained from the source library table type in the source library information.
- the step S100 specifically includes:
- Step S101: Obtain the entered source library information, which includes the source library name, the source library table name, the source library table type, the source table update field, the source table deduplication field, and the target library name.
- for example, the source library table name is n_bas_pc_sop (n_bas_pc_sop is simply the name of one table in the source library; the data it contains is metadata)
- the source table deduplication field is id
- the target library name is pa_nesop (pa_nesop is the name of a target library of the hive type).
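The six entries of the source library information can be pictured as one small record. The sketch below is illustrative only: the class and field names, the table type strings, the source library name, and the update field value are assumptions, while n_bas_pc_sop, id, pa_nesop, and the `_src` / `_ods` suffixes come from the text.

```python
from dataclasses import dataclass

# The three table types named in the text; the string spellings are assumptions.
TABLE_TYPES = ("incremental", "flow", "full")


@dataclass(frozen=True)
class SourceLibraryInfo:
    """The entries a user types into the synchronization tool."""
    source_library_name: str
    source_table_name: str
    source_table_type: str
    update_field: str
    dedup_field: str
    target_library_name: str

    def __post_init__(self) -> None:
        if self.source_table_type not in TABLE_TYPES:
            raise ValueError(f"unknown table type: {self.source_table_type!r}")

    def src_table(self) -> str:
        """Name of the first (temporary) target library table."""
        return f"{self.source_table_name}_src"

    def ods_table(self) -> str:
        """Name of the second target library table."""
        return f"{self.source_table_name}_ods"


info = SourceLibraryInfo(
    source_library_name="example_source_db",  # hypothetical value
    source_table_name="n_bas_pc_sop",
    source_table_type="incremental",
    update_field="updated_date",              # hypothetical value
    dedup_field="id",
    target_library_name="pa_nesop",
)
print(info.src_table(), info.ods_table())
```

Validating the table type at construction time catches a mistyped entry before any script is generated from it.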
- Step S102: Obtain metadata from the source library corresponding to the source library name in the source library information.
- the metadata information is mainly the various information of the front-end business system, such as the identity and deposit information of a banking system's customers, and it is recorded in a metadata information table made up of the table owner, table name, table comment, column name, column comment, column order, and other parts.
- Step S103: Parse the metadata information table of the acquired metadata, and obtain the source table structure according to the metadata information table.
- the table structure defines the fields, types, primary keys, foreign keys, and indexes of a table. These basic attributes form the table structure of the database.
- the source table structure is the table structure of the corresponding source table in the source library.
- changes to the table structure show up mainly as changes to column names, such as adding or modifying a customer's gender column or income status column, so this embodiment mainly compares the column names recorded for the table owner in the metadata information table.
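Step S103 can be illustrated by grouping the rows of a metadata information table into an ordered column list per table. This is a minimal sketch assuming the table has already been read into memory; `MetadataRow`, `parse_table_structures`, and the sample rows are hypothetical, and only the six field names follow the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class MetadataRow(NamedTuple):
    """One row of the metadata information table."""
    owner: str
    table_name: str
    table_comment: str
    column_name: str
    column_comment: str
    column_order: int


def parse_table_structures(rows: Iterable[MetadataRow]) -> Dict[str, List[str]]:
    """Map "owner.table" to its column names, sorted by column order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[f"{row.owner}.{row.table_name}"].append(row)
    return {
        table: [r.column_name for r in sorted(table_rows, key=lambda r: r.column_order)]
        for table, table_rows in grouped.items()
    }


rows = [
    MetadataRow("BANK", "n_bas_pc_sop", "customer table", "name", "customer name", 2),
    MetadataRow("BANK", "n_bas_pc_sop", "customer table", "id", "primary key", 1),
]
structures = parse_table_structures(rows)
print(structures)
```

Comparing two such column lists taken on different days would be one way to notice the column additions or renames that the text describes.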
- the step S300 includes:
- Step S301: Obtain the table type of the first target library table, and determine whether it is an incremental table, a flow table, or a full table;
- Step S302: When the table type of the first target library table is an incremental table, correspondingly generate a first sqoop synchronization script and hive program; the first sqoop synchronization script and hive program are used to synchronize metadata from the source library to a specified partition of the first target library table, after which the metadata in the first target library table is deduplicated according to the source table deduplication field and stored in the second target library table.
- the first target library table (n_bas_pc_sop_src) and the second target library table (n_bas_pc_sop_ods) are generated automatically.
- the program automatically generates the first sqoop synchronization script and hive program; the data is synchronized from the source library to a new partition of the first target library table (n_bas_pc_sop_src), and then all the data in the first target library table (n_bas_pc_sop_src) is deduplicated according to the source table deduplication field (id) and stored in the second target library table (n_bas_pc_sop_ods).
- the execution period of the first sqoop synchronization script and hive program is 24h (that is, once a day).
- the separator between fields in the file that hive stores in hdfs;
- --lines-terminated-by "\n": the separator between lines in the file that hive stores in hdfs;
- the synchronization script can be automatically generated according to the source library information entered and the preset table construction rules.
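As an illustration of what a generated sqoop synchronization script might contain, the sketch below assembles a `sqoop import` command line. It is not the patent's script: the JDBC URL, the target directory, and the `\001` field separator are assumptions; `--lines-terminated-by` is the option named in the text, and the other options are standard sqoop import arguments.

```python
import shlex
from typing import List


def build_sqoop_import(jdbc_url: str, source_table: str, target_dir: str) -> List[str]:
    """Return the argument list for one sqoop import run."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", source_table,
        "--target-dir", target_dir,
        # Field separator: \001 is an assumed value, not taken from the text.
        "--fields-terminated-by", "\\001",
        # Line separator, as named in the text.
        "--lines-terminated-by", "\\n",
    ]


command = build_sqoop_import(
    jdbc_url="jdbc:mysql://db.example.com/example_source_db",         # hypothetical
    source_table="n_bas_pc_sop",
    target_dir="/warehouse/pa_nesop/n_bas_pc_sop_src/dt=2018-04-09",  # hypothetical
)
print(" ".join(shlex.quote(arg) for arg in command))
```

Building the command as a list and quoting it only when printing keeps the separator escapes intact when the line is pasted into a shell script.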
- the step S300 further includes:
- Step S303: When the table type of the first target library table is a flow table, correspondingly generate a second sqoop synchronization script and hive program; the second sqoop synchronization script and hive program are used to synchronize metadata from the source library to a specified partition of the first target library table, after which the metadata in the first target library table is stored in the second target library table.
- the program automatically generates the second sqoop synchronization script and hive program; each day the data is synchronized from the source library to a new partition of the first target library table (n_bas_pc_sop_src), and then all the data in the first target library table (n_bas_pc_sop_src) is stored directly in the second target library table (n_bas_pc_sop_ods) without deduplication.
- the execution period of the second sqoop synchronization script and hive program is 24h (that is, once a day).
- the step S300 further includes:
- Step S304: When the table type of the first target library table is a full table, correspondingly generate a third sqoop synchronization script and hive program; the third sqoop synchronization script and hive program are used to synchronize metadata from the source library to the first target library table, and then to store the metadata in the first target library table into the second target library table.
- the program automatically generates the third sqoop synchronization script and hive program; each day the full data of the source library table is synchronized directly, without deduplication, to the first target library table (n_bas_pc_sop_src), and then all the data in the first target library table (n_bas_pc_sop_src) is stored directly in the second target library table (n_bas_pc_sop_ods).
- the execution period of the third sqoop synchronization script and hive program is 24h (that is, once a day).
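The branching in steps S301 to S304 can be sketched as one function that picks the hive statement for the src-to-ods step by table type. This is a hedged sketch, not the generated hive program itself: the function name, the `dt` partition column, keeping the newest row per deduplication key by the update field, and copying a flow table one day at a time are all assumptions layered on the rules in the text.

```python
from typing import List


def build_hive_step(table_type: str, base_name: str, columns: List[str],
                    dedup_field: str, update_field: str, day: str) -> str:
    """Return the HiveQL that moves data from the src table to the ods table."""
    src, ods = f"{base_name}_src", f"{base_name}_ods"
    cols = ", ".join(columns)
    if table_type == "incremental":
        # Deduplicate all src data; keeping the newest row per key is an assumption.
        return (
            f"INSERT OVERWRITE TABLE {ods}\n"
            f"SELECT {cols} FROM (\n"
            f"  SELECT {cols}, ROW_NUMBER() OVER (\n"
            f"    PARTITION BY {dedup_field} ORDER BY {update_field} DESC) AS rn\n"
            f"  FROM {src}\n"
            f") ranked WHERE rn = 1;"
        )
    if table_type == "flow":
        # No deduplication; the ods flow table is partitioned by day.
        return (
            f"INSERT OVERWRITE TABLE {ods} PARTITION (dt='{day}')\n"
            f"SELECT {cols} FROM {src} WHERE dt = '{day}';"
        )
    if table_type == "full":
        # No partition and no deduplication: copy everything.
        return f"INSERT OVERWRITE TABLE {ods}\nSELECT {cols} FROM {src};"
    raise ValueError(f"unknown table type: {table_type!r}")


for kind in ("incremental", "flow", "full"):
    print(build_hive_step(kind, "n_bas_pc_sop", ["id", "name"], "id", "updated_date", "2018-04-09"))
    print()
```

Only the incremental branch needs the deduplication field, which matches the rule that flow and full tables are stored without deduplication.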
- FIG. 4 is a schematic diagram showing the internal structure of a computer device in an embodiment.
- the computer device may be a terminal or a server, wherein the terminal may be a communication device, such as a smart phone, a tablet computer, a notebook computer, or a desktop computer.
- the server can be a standalone server or a server cluster consisting of multiple servers.
- the computer device includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected by a system bus.
- the non-volatile storage medium of the computer device can store an operating system and a computer readable program that, when executed, can cause the processor to perform a data synchronization method.
- the processor of the computer device is used to provide computing and control capabilities to support the operation of the entire computer device.
- the internal memory can store a computer readable program that, when executed by the processor, causes the processor to perform a data synchronization method.
- the network interface of the computer device is used for network communication, such as sending assigned tasks. It will be understood by those skilled in the art that the structure shown in FIG. 4 does not constitute a limitation on the computer device to which the present application is applied, and a specific computer device may include more or fewer components than those shown in the figure, combine some components, or have a different component arrangement.
- the application also provides a data synchronization device including a processor 10, a memory 20, and a display 30.
- FIG. 4 shows only some of the components of the data synchronization device; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
- the memory 20, in some embodiments, may be an internal storage unit of components of the data synchronization device, such as a hard disk or memory of a data synchronization device.
- the memory 20 may also be an external storage device of each component of the data synchronization device in other embodiments, such as a plug-in hard disk equipped on each component of the data synchronization device, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash card (Flash Card), and so on.
- the memory 20 may also include both an internal storage unit of the data synchronization device and an external storage device.
- the memory 20 is configured to store application software and various types of data installed on the data synchronization device, such as the program code installed on the data synchronization device.
- the memory 20 can also be used to temporarily store data that has been output or is about to be output.
- a data synchronization program 40 is stored on the memory 20, and the data synchronization program 40 can be executed by the processor 10 to implement the data synchronization method of the various embodiments of the present application.
- the processor 10 is configured to execute program code or process data stored in the memory 20, for example, to execute the data synchronization method.
- the display 30 is used to display information processed in the data synchronization device and to display a visual user interface.
- the components 10-30 of the data synchronization device communicate with one another via a system bus.
- the various steps of the data synchronization method described above are implemented when the processor 10 executes the data synchronization program 40 in the memory 20.
- FIG. 5 is a functional block diagram of a data synchronization apparatus for implementing the data synchronization method of the present application.
- the data synchronization device can be divided into an input parsing module 31, a table creation script generation module 32, and a synchronization script generation module 33:
- the input parsing module 31 is configured to obtain the entered source library information, which includes at least a source library name, a source library table name, and a source library table type, and to parse the metadata included in the source library corresponding to the source library information to obtain a source table structure;
- the table creation script generation module 32 is configured to generate, according to the source table structure, a table creation script for establishing in the target library a first target library table that temporarily stores data and a second target library table that stores the same data as the source library;
- the synchronization script generating module 33 is configured to obtain a table type of the first target library table, and correspondingly generate a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and the second target library table.
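The division into modules 31 to 33 can be pictured as three functions chained by one device object. The sketch below is purely illustrative: the class name, the dictionary keys, and the stub functions are hypothetical, and the real modules would do the parsing and script generation described in the method.

```python
from typing import Callable, Dict, List


class DataSyncDevice:
    """Chains the three modules; each module is passed in as a plain function."""

    def __init__(self,
                 input_parsing: Callable[[Dict[str, str]], List[str]],
                 table_script_generation: Callable[[str, List[str]], List[str]],
                 sync_script_generation: Callable[[str, str], List[str]]) -> None:
        self.input_parsing = input_parsing
        self.table_script_generation = table_script_generation
        self.sync_script_generation = sync_script_generation

    def run(self, info: Dict[str, str]) -> Dict[str, List[str]]:
        structure = self.input_parsing(info)           # module 31
        table = info["source_table_name"]
        return {
            "table_creation": self.table_script_generation(table, structure),  # module 32
            "synchronization": self.sync_script_generation(
                table, info["source_table_type"]),                             # module 33
        }


# Stub modules that only show the data flow between the three stages.
device = DataSyncDevice(
    input_parsing=lambda info: ["id", "name"],
    table_script_generation=lambda table, cols: [f"-- create {table}_src", f"-- create {table}_ods"],
    sync_script_generation=lambda table, table_type: [f"-- {table_type} sync for {table}"],
)
scripts = device.run({"source_table_name": "n_bas_pc_sop", "source_table_type": "flow"})
print(scripts)
```

Passing the modules in as functions keeps each stage replaceable, for example when a different source database needs its own metadata parser.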
- the table type of the first target library table is one of an incremental table, a flow table, or a full table
- the table type of the second target library table is one of an incremental table, a flow table, or a full table.
- the step of acquiring the entered source library information, which includes at least the source library name, the source library table name, and the source library table type, and parsing the metadata included in the source library corresponding to the source library information to obtain the source table structure, includes:
- the metadata information table of the obtained metadata is parsed, and the source table structure is obtained according to the metadata information table.
- the step of acquiring the table type of the first target library table and correspondingly generating the synchronization script includes:
- when the table type of the first target library table is an incremental table, a first sqoop synchronization script and hive program are generated correspondingly; the first sqoop synchronization script and hive program are used to synchronize metadata from the source library to a specified partition of the first target library table.
- the metadata in the first target library table is then deduplicated according to the source table deduplication field and stored in the second target library table.
- the step also includes:
- when the table type of the first target library table is a flow table, a second sqoop synchronization script and hive program are generated correspondingly; the second sqoop synchronization script and hive program are used to synchronize metadata from the source library to a specified partition of the first target library table.
- the metadata in the first target library table is then stored in the second target library table.
- the step also includes:
- when the table type of the first target library table is a full table, a third sqoop synchronization script and hive program are generated correspondingly; the third sqoop synchronization script and hive program are used to synchronize metadata from the source library to the first target library table, and then to store the metadata in the first target library table into the second target library table.
- the execution periods of the first, second, and third sqoop synchronization scripts and hive programs are all 24 hours.
- the metadata information table corresponding to the metadata includes at least a table owner, a table name, a table comment, a column name, a column comment, and a column order.
- the present application further provides a data synchronization system.
- the system includes a plurality of source databases 110, a target database 120, and a data synchronization device 130.
- the metadata of the plurality of source databases 110 are processed by the data synchronization device 130 and uploaded to the target database 120 by the automatically generated table creation script and synchronization script.
- the present application also provides a storage medium accordingly.
- the storage medium stores one or more programs that can be executed by one or more processors to implement the various steps of the data synchronization method described above.
- the computer program when executed, may include the flow of an embodiment of the methods as described above.
- the foregoing computer readable storage medium can be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (Read-Only Memory, ROM).
- the present application automatically generates a table creation script and a synchronization script by configuring a limited number of entries, thereby simplifying data synchronization operations, improving development efficiency, and reducing human error.
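- all or part of the flows of the above method embodiments can be implemented by a computer program instructing related hardware (such as a processor, a controller, etc.), and the program may be stored in a computer readable storage medium.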
- the program may include the flow of the method embodiments as described above when executed.
- the computer readable storage medium described therein may be a memory, a magnetic disk, an optical disk, or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Disclosed in the present application are a data synchronization method, device and equipment, and a readable storage medium, the method comprising: acquiring entered source library information that at least comprises a source library name, a source library table name, and a source library table type, and parsing the metadata comprised in the source library corresponding to the source library information so as to obtain a source table structure; generating, according to the source table structure, a table creation script for establishing in a target library a first target library table that temporarily stores data and a second target library table that stores the same data as the source library; and acquiring the table type of the first target library table, and correspondingly generating a synchronization script for synchronizing metadata from the source library to the target library by means of the first target library table and the second target library table in sequence.
Description
This application claims priority to Chinese Patent Application No. 201711175125.8, filed with the Chinese Patent Office on November 21, 2017 and entitled "A Data Synchronization Method, Device, and Computer Readable Storage Medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of data synchronization technologies, and in particular, to a data synchronization method, apparatus, device, and storage medium.
Background
Currently, synchronizing the data of an enterprise database (that is, synchronizing data from the source library to the target library with a synchronization tool) requires the following development work:
1. Create the corresponding table in the target hive database according to the source library table structure information;
2. Develop a synchronization script program for the synchronization tool in use;
The table creation scripts and synchronization scripts in the above process are all written manually by developers. The development process is complicated, inefficient, and error-prone, which greatly reduces the efficiency of data synchronization.
Therefore, the prior art has yet to be improved and developed.
Summary of the Application
In view of the above deficiencies of the prior art, the purpose of the present application is to provide a data synchronization method, apparatus, device, and storage medium. They are intended to solve the problem that, in the prior art, both the table creation script and the synchronization script used in data synchronization are written by hand by developers, so that the development process is complicated, inefficient, and error-prone, which greatly reduces the efficiency of data synchronization.
To achieve the above purpose, the present application adopts the following technical solutions:
A data synchronization method, comprising the following steps:
obtaining entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parsing the metadata contained in the source library corresponding to the source library information to obtain a source table structure;
generating, according to the source table structure, a table creation script for creating in the target library a first target library table that temporarily stores data and a second target library table that stores the same data as the source library;
obtaining the table type of the first target library table, and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library by way of the first target library table and then the second target library table.
Optionally, the table type of the first target library table is one of an incremental table, a flow table, or a full table; the table type of the second target library table is one of an incremental table, a flow table, or a full table.
Optionally, the step of obtaining entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parsing the metadata contained in the source library corresponding to the source library information to obtain a source table structure, includes:
obtaining entered source library information that includes the source library name, source library table name, source library table type, source table update field, source table deduplication field, and target library name;
obtaining metadata from the source library corresponding to the source library name in the source library information;
parsing the metadata information table of the obtained metadata, and obtaining the source table structure from the metadata information table.
Optionally, the step of obtaining the table type of the first target library table, and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library by way of the first target library table and then the second target library table, includes:
obtaining the table type of the first target library table, and determining whether that table type is an incremental table, a flow table, or a full table;
when the table type of the first target library table is an incremental table, correspondingly generating a first sqoop synchronization script and hive program; the first sqoop synchronization script and hive program are used to synchronize the metadata from the source library into a specified partition of the first target library table, and then to deduplicate the metadata in the first target library table according to the source table deduplication field and store it in the second target library table.
Optionally, the step of obtaining the table type of the first target library table, and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library by way of the first target library table and then the second target library table, further includes:
when the table type of the first target library table is a flow table, correspondingly generating a second sqoop synchronization script and hive program; the second sqoop synchronization script and hive program are used to synchronize the metadata from the source library into a specified partition of the first target library table, and then to store the metadata in the first target library table in the second target library table.
Optionally, the step of obtaining the table type of the first target library table, and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library by way of the first target library table and then the second target library table, further includes:
when the table type of the first target library table is a full table, correspondingly generating a third sqoop synchronization script and hive program; the third sqoop synchronization script and hive program are used to synchronize the metadata from the source library to the first target library table, and then to store the metadata in the first target library table in the second target library table.
Optionally, the execution period of each of the first, second, and third sqoop synchronization scripts and hive programs is 24 hours.
Optionally, the metadata information table corresponding to the metadata includes at least the table owner, table name, table comment, column name, column comment, and column order.
A data synchronization apparatus, comprising:
an entry parsing module, configured to obtain entered source library information that includes at least a source library name, a source library table name, and a source library table type, and to parse the metadata contained in the source library corresponding to the source library information to obtain a source table structure;
a table creation script generation module, configured to generate, according to the source table structure, a table creation script for creating in the target library a first target library table that temporarily stores data and a second target library table that stores the same data as the source library;
a synchronization script generation module, configured to obtain the table type of the first target library table, and to correspondingly generate a synchronization script for synchronizing the metadata from the source library to the target library by way of the first target library table and then the second target library table.
A data synchronization device, comprising a processor, a memory, and a communication bus; the memory stores a computer-readable program executable by the processor;
the communication bus implements connection and communication between the processor and the memory;
the processor implements the steps of the above data synchronization method when executing the computer-readable program.
A storage medium, wherein the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the steps of the above data synchronization method.
Compared with the prior art, the present application automatically generates the table creation script and the synchronization script from a limited number of configured entries, which simplifies the data synchronization operation, improves development efficiency, and reduces human error.
Brief Description of the Drawings
FIG. 1 is a flowchart of the data synchronization method provided by the present application.
FIG. 2 is a flowchart of step S100 in the data synchronization method provided by the present application.
FIG. 3 is a flowchart of step S300 in the data synchronization method provided by the present application.
FIG. 4 is a schematic diagram of the operating environment of a preferred embodiment of the data synchronization device provided by the present application.
FIG. 5 is a functional block diagram of a preferred embodiment of the data synchronization program of the present application.
FIG. 6 is a structural block diagram of the data synchronization system provided by the present application.
Detailed Description
In the prior art, both the table creation script and the synchronization script used in data synchronization are written by hand by developers; the development process is complicated, inefficient, and error-prone, which greatly reduces the efficiency of data synchronization. In view of this, the purpose of the present application is to provide a data synchronization method, apparatus, device, and storage medium that automatically generate the table creation script and the synchronization script from a limited number of configured entries, thereby simplifying the data synchronization operation, improving development efficiency, and reducing human error.
Referring to FIG. 1, the data synchronization method provided by the present application includes the following steps:
Step S100: obtain entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parse the metadata contained in the source library corresponding to the source library information to obtain a source table structure.
In this embodiment, when metadata needs to be synchronized from the source library to the target library, the user only has to enter, in the interface of the synchronization tool designed according to the data synchronization method (the synchronization tool is an application program), source library information that includes the source library name, source library table name, source library table type, source table update field, source table deduplication field, and target library name. Once the source library information has been entered, the table creation script and the synchronization script are generated automatically, which simplifies the data synchronization operation.
The source library in which the metadata is initially stored is an oracle database (also known as Oracle RDBMS, or simply Oracle, a relational database management system from Oracle Corporation), a MySQL database (MySQL is an open-source, lightweight relational database management system), or a PostgreSQL database (PostgreSQL is a free object-relational database server). The target library is a hive database (hive is a data warehouse tool based on Hadoop). These are all common, easy-to-operate database management systems and tools, which makes it convenient to analyze and process the data in this embodiment.
Step S200: generate, according to the source table structure, a table creation script for creating in the target library a first target library table that temporarily stores data and a second target library table that stores the same data as the source library.
In this embodiment, after the source library information has been entered, the source table structure is parsed automatically from that information, and the table creation script is generated automatically from the source table structure and the preset table creation rules. To make the automatic generation of the table creation script easier to understand, the preset table creation rules are described below.
The rules for the source library are as follows. Source tables in the source library fall into three types: the source incremental table (also called the source library incremental table), the source flow table (also called the source library flow table), and the source full table (also called the source library full table). In a source incremental table, data is continually updated and added, and historical data from before the current day may be updated on the current day. In a source flow table, data is continually added, and historical data from before the current day is not updated on the current day. A source full table holds a small amount of data, for example configuration tables or dimension tables.
The rules for the target library are as follows. A first target library table (the src table, that is, the source table) and a second target library table (the ods table, that is, the Operational Data Store table) corresponding to the source table are created in the target library. The src table serves as a temporary table and comes in three kinds: the src incremental table (partitioned by day), the src flow table (partitioned by day), and the src full table (not partitioned). The ods table holds the same data as the source table and also comes in three kinds: the ods incremental table (not partitioned; it is ready for use once all data in the src incremental table has been deduplicated), the ods flow table (partitioned by day, no deduplication), and the ods full table (not partitioned, no deduplication).
Once the above table creation rules have been preset, the user only has to enter, in the interface of the synchronization tool, the source library information that includes the source library name, source library table name, source library table type, source table update field, source table deduplication field, and target library name, which simplifies the operation.
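The table creation rules above can be sketched as a small lookup plus a DDL generator. This is only an illustrative sketch, not the applicant's implementation: the type keys ("incremental", "flow", "full"), the partition column name `dt`, and the choice to declare every column as STRING are assumptions made for the example; only the `_src` / `_ods` suffixes and the partitioning and deduplication rules come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetTableRule:
    partitioned_by_day: bool  # True when the table is partitioned by day
    deduplicate: bool         # True when rows are deduplicated on load


# Rules for the temporary src tables, as stated in the text.
SRC_RULES = {
    "incremental": TargetTableRule(partitioned_by_day=True, deduplicate=False),
    "flow": TargetTableRule(partitioned_by_day=True, deduplicate=False),
    "full": TargetTableRule(partitioned_by_day=False, deduplicate=False),
}

# Rules for the ods tables, as stated in the text.
ODS_RULES = {
    "incremental": TargetTableRule(partitioned_by_day=False, deduplicate=True),
    "flow": TargetTableRule(partitioned_by_day=True, deduplicate=False),
    "full": TargetTableRule(partitioned_by_day=False, deduplicate=False),
}


def create_table_ddl(database, table, columns, rule):
    """Return a Hive CREATE TABLE statement (all columns STRING: an assumption)."""
    cols = ",\n  ".join(f"`{name}` STRING" for name in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)"
    if rule.partitioned_by_day:
        # Hypothetical partition column name; the text only says "by day".
        ddl += "\nPARTITIONED BY (dt STRING)"
    return ddl + ";"


def build_table_script(database, source_table, table_type, columns):
    """Return the table creation script for the src table and the ods table."""
    src = create_table_ddl(database, f"{source_table}_src", columns, SRC_RULES[table_type])
    ods = create_table_ddl(database, f"{source_table}_ods", columns, ODS_RULES[table_type])
    return src + "\n\n" + ods


script = build_table_script("pa_nesop", "n_bas_pc_sop", "incremental", ["id", "update_date"])
print(script)
```

Keeping the rules in two dictionaries means a new table type would only need a new entry in each, without touching the generator.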
Step S300: obtain the table type of the first target library table, and correspondingly generate a synchronization script for synchronizing the metadata from the source library to the target library by way of the first target library table and then the second target library table.
In this embodiment, the table type of the first target library table is one of an incremental table, a flow table, or a full table, and so is the table type of the second target library table. In addition, the table type of the first target library table is the same as the source library table type, and the table type of the second target library table is also the same as the source library table type. Therefore, the table types of the first and second target library tables can both be obtained from the source library table type in the entered source library information.
In one embodiment, as shown in FIG. 2, step S100 specifically includes:
Step S101: obtain entered source library information that includes the source library name, source library table name, source library table type, source table update field, source table deduplication field, and target library name.
Specifically, for example, to synchronize the table n_bas_pc_sop (a source library table name used only as an example; the data it contains is the metadata) from the nesop database (nesop is the source library name) to the hive database pa_nesop (pa_nesop is the name of the hive target library), the following information is entered in the interface of the synchronization tool:
Source library name: nesop
Source library table name: n_bas_pc_sop
Source library table type: incremental table
Source table update field: update_date
Source table deduplication field: id
Target library name: pa_nesop.
Throughout the configuration process, entering the above source library information is all that is required, which is very convenient for the user.
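The six entries above map naturally onto a small configuration record. The following sketch is illustrative only: the class name `SourceLibraryInfo`, its attribute names, and the type keys are hypothetical, while the example values and the `_src` / `_ods` table names are taken from the text.

```python
from dataclasses import dataclass

# Hypothetical keys for the three table types described in the text.
TABLE_TYPES = ("incremental", "flow", "full")


@dataclass(frozen=True)
class SourceLibraryInfo:
    source_library: str   # source library name
    source_table: str     # source library table name
    table_type: str       # source library table type
    update_field: str     # source table update field
    dedup_field: str      # source table deduplication field
    target_library: str   # target library name

    def __post_init__(self):
        if self.table_type not in TABLE_TYPES:
            raise ValueError(f"unknown table type: {self.table_type!r}")

    @property
    def src_table(self):
        """Qualified name of the first target library table."""
        return f"{self.target_library}.{self.source_table}_src"

    @property
    def ods_table(self):
        """Qualified name of the second target library table."""
        return f"{self.target_library}.{self.source_table}_ods"


info = SourceLibraryInfo(
    source_library="nesop",
    source_table="n_bas_pc_sop",
    table_type="incremental",
    update_field="update_date",
    dedup_field="id",
    target_library="pa_nesop",
)
print(info.src_table, info.ods_table)
```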
Step S102: obtain metadata from the source library corresponding to the source library name in the source library information.
The metadata information consists mainly of the various information of the front-end business system, such as the identity and deposit information of the customers of a banking system. It is represented in a metadata information table made up of the table owner, table name, table comment, column name, column comment, and column order.
Step S103: parse the metadata information table of the obtained metadata, and obtain the source table structure from the metadata information table.
A table structure defines the fields, types, primary keys, foreign keys, and indexes of a table; these basic attributes make up the table structure of a database. The source table structure is the table structure of the source library table in the source library. In a business system, changes to the table structure show up mainly as changes to column names, for example when a customer gender column or an income status column is added or modified. For this reason, this embodiment mainly compares the column names under each table owner in the metadata information.
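As a rough illustration of step S103, the sketch below groups rows of a metadata information table by table owner and table name and orders the columns by column order. The row layout (a list of dictionaries with the keys shown) is an assumption for the example; the text only names the six items the metadata information table contains.

```python
from collections import defaultdict


def build_table_structures(metadata_rows):
    """Group metadata rows by (owner, table name) and sort columns by column order."""
    tables = defaultdict(lambda: {"comment": "", "columns": []})
    for row in metadata_rows:
        key = (row["owner"], row["table_name"])
        tables[key]["comment"] = row["table_comment"]
        tables[key]["columns"].append(
            (row["column_order"], row["column_name"], row["column_comment"])
        )

    structures = {}
    for key, entry in tables.items():
        ordered = sorted(entry["columns"])  # sorts by column order first
        structures[key] = {
            "comment": entry["comment"],
            "columns": [{"name": name, "comment": comment} for _, name, comment in ordered],
        }
    return structures


# Hypothetical rows, deliberately out of order.
rows = [
    {"owner": "nesop", "table_name": "n_bas_pc_sop", "table_comment": "example table",
     "column_name": "update_date", "column_comment": "last update time", "column_order": 2},
    {"owner": "nesop", "table_name": "n_bas_pc_sop", "table_comment": "example table",
     "column_name": "id", "column_comment": "primary key", "column_order": 1},
]
structures = build_table_structures(rows)
print(structures[("nesop", "n_bas_pc_sop")])
```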
In one embodiment, as shown in FIG. 3, step S300 includes:
Step S301: obtain the table type of the first target library table, and determine whether that table type is an incremental table, a flow table, or a full table;
Step S302: when the table type of the first target library table is an incremental table, correspondingly generate a first sqoop synchronization script and hive program; the first sqoop synchronization script and hive program are used to synchronize the metadata from the source library into a specified partition of the first target library table, and then to deduplicate the metadata in the first target library table according to the source table deduplication field and store it in the second target library table.
Running the table creation script automatically creates the first target library table (n_bas_pc_sop_src) and the second target library table (n_bas_pc_sop_ods). When the first target library table (n_bas_pc_sop_src) is determined to be an incremental table (that is, a partitioned table, partitioned by day), the program automatically generates the first sqoop synchronization script and hive program. Every day, these synchronize the data from the source library into a new partition of the first target library table (n_bas_pc_sop_src), then deduplicate all data in the first target library table (n_bas_pc_sop_src) according to the source table deduplication field (id) and store the result in the second target library table (n_bas_pc_sop_ods). The execution period of the first sqoop synchronization script and hive program is 24 hours (that is, they run once a day).
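One plausible shape for the hive part of the incremental case is sketched below. The text states only that the data is deduplicated on the deduplication field; keeping the row with the latest value of the update field, and doing so with `ROW_NUMBER()`, is an assumption made for this example, as is the function name.

```python
def build_incremental_merge_sql(database, source_table, columns, dedup_field, update_field):
    """Return HiveQL that rewrites the ods table with one row per dedup_field value.

    Assumption: among duplicates, the row with the largest update_field wins.
    """
    src = f"{database}.{source_table}_src"
    ods = f"{database}.{source_table}_ods"
    column_list = ", ".join(f"`{c}`" for c in columns)
    return (
        f"INSERT OVERWRITE TABLE {ods}\n"
        f"SELECT {column_list}\n"
        f"FROM (\n"
        f"  SELECT {column_list},\n"
        f"         ROW_NUMBER() OVER (PARTITION BY `{dedup_field}` "
        f"ORDER BY `{update_field}` DESC) AS rn\n"
        f"  FROM {src}\n"
        f") ranked\n"
        f"WHERE rn = 1;"
    )


merge_sql = build_incremental_merge_sql(
    "pa_nesop", "n_bas_pc_sop", ["id", "update_date"], "id", "update_date"
)
print(merge_sql)
```

Because the statement reads every partition of the src table and overwrites the ods table, rerunning it for the same day would produce the same result.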
To make the sqoop synchronization script easier to understand, a specific example follows. For instance, the following sqoop command imports data from mysql into hive:
sqoop import --connect jdbc:mysql://10.1.11.78:3306/video \
  --table base_event --username root --password 123456 -m 1 \
  --hive-import --hive-database video --hive-table base_event --hive-overwrite \
  --fields-terminated-by "\t" --lines-terminated-by "\n" \
  --as-textfile
where:
sqoop import: runs the sqoop import command;
--connect jdbc:mysql://hostname:port/database: the address, port number, and database name of the database to connect to;
--table base_event: the database table to operate on;
--username root: the user name used to connect to the database;
--password 123456: the password used to connect to the database;
-m 1: the number of map tasks to start;
--hive-import: import the data into hive;
[--create-hive-table]: if the imported table does not exist in hive, sqoop creates it in hive automatically; if the table already exists, adding this option causes the command to report an error;
--hive-database XXX: the hive database into which the database table is imported;
--hive-table base_XXX: the hive table into which the database table is imported;
--hive-overwrite: if the hive table already contains data, adding this option overwrites the existing data;
--fields-terminated-by "\t": the delimiter between fields in the files that hive stores in hdfs;
--lines-terminated-by "\n": the delimiter between lines in the files that hive stores in hdfs;
--as-textfile: the format of the files that hive stores in hdfs, here plain text.
In other words, the synchronization script can be generated automatically from the entered source library information and the preset table creation rules.
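A generator for such a command could look like the sketch below. It only assembles the options explained above into an argument list; the function name and its parameters are hypothetical, and the connection values are the ones from the example command.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password, table,
                       hive_database, hive_table, num_mappers=1):
    """Return the sqoop import command as an argument list."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--username", username,
        "--password", password,
        "-m", str(num_mappers),
        "--hive-import",
        "--hive-database", hive_database,
        "--hive-table", hive_table,
        "--hive-overwrite",
        # Two-character escape sequences, passed through for sqoop to interpret.
        "--fields-terminated-by", "\\t",
        "--lines-terminated-by", "\\n",
        "--as-textfile",
    ]


command = build_sqoop_import(
    jdbc_url="jdbc:mysql://10.1.11.78:3306/video",
    username="root",
    password="123456",
    table="base_event",
    hive_database="video",
    hive_table="base_event",
)
# shlex.join quotes each argument so the line can be pasted into a shell script.
print(shlex.join(command))
```

Building a list rather than a single string avoids quoting mistakes when the command is later run with `subprocess.run(command)`.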
In one embodiment, as shown in FIG. 3, step S300 further includes:
Step S303: when the table type of the first target library table is a flow table, correspondingly generate a second sqoop synchronization script and hive program; the second sqoop synchronization script and hive program are used to synchronize the metadata from the source library into a specified partition of the first target library table, and then to store the metadata in the first target library table in the second target library table.
When the first target library table (n_bas_pc_sop_src) is determined to be a flow table (that is, a partitioned table, partitioned by day), the program automatically generates the second sqoop synchronization script and hive program. Every day, these synchronize the data from the source library into a new partition of the first target library table (n_bas_pc_sop_src), then store all data in the first target library table (n_bas_pc_sop_src) directly in the second target library table (n_bas_pc_sop_ods) without deduplication. The execution period of the second sqoop synchronization script and hive program is 24 hours (that is, they run once a day).
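For the flow-table case, the hive part might be generated as sketched below. This sketch assumes that only the day partition just loaded into the src table needs to be copied into the matching partition of the ods table, and that the partition column is named `dt`; neither detail is specified in the text, which says only that the data is stored without deduplication.

```python
from datetime import date


def build_flow_load_sql(database, source_table, columns, partition_day):
    """Return HiveQL that copies one day partition from the src table to the ods table."""
    src = f"{database}.{source_table}_src"
    ods = f"{database}.{source_table}_ods"
    column_list = ", ".join(f"`{c}`" for c in columns)
    day = partition_day.isoformat()  # e.g. 2017-11-21
    return (
        f"INSERT OVERWRITE TABLE {ods} PARTITION (dt='{day}')\n"
        f"SELECT {column_list}\n"
        f"FROM {src}\n"
        f"WHERE dt = '{day}';"
    )


flow_sql = build_flow_load_sql("pa_nesop", "n_bas_pc_sop", ["id", "amount"], date(2017, 11, 21))
print(flow_sql)
```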
In one embodiment, as shown in FIG. 3, step S300 further includes:
Step S304: when the table type of the first target library table is a full table, correspondingly generate a third sqoop synchronization script and hive program; the third sqoop synchronization script and hive program are used to synchronize the metadata from the source library to the first target library table, and then to store the metadata in the first target library table in the second target library table.
When the first target library table (n_bas_pc_sop_src) is determined to be a full table (that is, a non-partitioned table), the program automatically generates the third sqoop synchronization script and hive program. Every day, these synchronize the entire source table directly to the first target library table (n_bas_pc_sop_src) without deduplication, then store all data in the first target library table (n_bas_pc_sop_src) directly in the second target library table (n_bas_pc_sop_ods). The execution period of the third sqoop synchronization script and hive program is 24 hours (that is, they run once a day).
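The full-table case is the simplest of the three. A minimal sketch of the hive statement, assuming the ods table is simply overwritten with the current contents of the src table on each daily run (the function name is hypothetical):

```python
def build_full_load_sql(database, source_table, columns):
    """Return HiveQL that replaces the ods table with the contents of the src table."""
    src = f"{database}.{source_table}_src"
    ods = f"{database}.{source_table}_ods"
    column_list = ", ".join(f"`{c}`" for c in columns)
    return (
        f"INSERT OVERWRITE TABLE {ods}\n"
        f"SELECT {column_list}\n"
        f"FROM {src};"
    )


full_sql = build_full_load_sql("pa_nesop", "n_bas_pc_sop", ["id", "name"])
print(full_sql)
```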
Please continue to refer to FIG. 4, which is a schematic diagram of the internal structure of a computer device in one embodiment. The computer device may be a terminal or a server. The terminal may be an electronic device with communication capability, such as a smartphone, tablet computer, notebook computer, or desktop computer. The server may be a standalone server or a server cluster composed of multiple servers. Referring to FIG. 4, the computer device includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device can store an operating system and a computer-readable program which, when executed, causes the processor to perform a data synchronization method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The internal memory can also store a computer-readable program which, when executed by the processor, causes the processor to perform a data synchronization method. The network interface of the computer device is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in FIG. 4 does not limit the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The present application also provides a data synchronization device, which includes a processor 10, a memory 20, and a display 30. FIG. 4 shows only some of the components of the data synchronization device; it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
In some embodiments, the memory 20 may be an internal storage unit of the data synchronization device, such as a hard disk or internal memory of the data synchronization device. In other embodiments, the memory 20 may be an external storage device of the data synchronization device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the data synchronization device. Further, the memory 20 may include both an internal storage unit and an external storage device of the data synchronization device. The memory 20 is used to store the application software installed on the data synchronization device and various types of data, such as the program code installed on the data synchronization device. The memory 20 can also be used to temporarily store data that has been output or is about to be output. In one embodiment, a data synchronization program 40 is stored in the memory 20, and the data synchronization program 40 can be executed by the processor 10 to implement the data synchronization method of the embodiments of the present application.
The processor 10 is configured to run the program code stored in the memory 20 or to process data, for example, to execute the rights authentication method. The display 30 is used to display information processed in the WeChat customer behavior feedback device and to display a visual user interface, such as an assignment information interface or an authentication report interface. The components 10-30 of the WeChat customer behavior feedback device communicate with one another via a system bus.
In one embodiment, when the processor 10 executes the data synchronization program 40 in the memory 20, the steps of the data synchronization method described above are implemented.
Please refer to FIG. 5, which is a functional block diagram of a data synchronization apparatus that implements the data synchronization method of the present application. In this embodiment, the data synchronization apparatus can be divided into an input parsing module 31, a table creation script generation module 32, and a synchronization script generation module 33:
The input parsing module 31 is configured to acquire entered source library information that includes at least a source library name, a source library table name, and a source library table type, and to parse the metadata included in the source library corresponding to the source library information to obtain a source table structure;
The table creation script generation module 32 is configured to generate, according to the source table structure, a table creation script for creating, in the target library, a first target library table that temporarily stores data and a second target library table that stores the same data as the source library;
The synchronization script generation module 33 is configured to acquire the table type of the first target library table and to correspondingly generate a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table.
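The table creation step performed by module 32 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes a Hive-style target, types every column as `STRING`, partitions the first (temporary) table by a hypothetical `dt` column, and names it with a hypothetical `_tmp` suffix. None of these details is specified in the application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Column:
    """One column of the source table structure."""
    name: str
    comment: str = ""


def build_create_table(db: str, table: str, columns: List[Column],
                       partitioned: bool) -> str:
    """Return a Hive-style CREATE TABLE statement.

    Assumption: every column is typed STRING, since the metadata
    information table described in the text lists no column types.
    """
    cols = ",\n".join(
        f"  `{c.name}` STRING COMMENT '{c.comment}'" for c in columns
    )
    ddl = f"CREATE TABLE IF NOT EXISTS {db}.{table} (\n{cols}\n)"
    if partitioned:
        # Hypothetical partition column for the temporary table.
        ddl += "\nPARTITIONED BY (dt STRING)"
    return ddl


def build_table_scripts(db: str, source_table: str,
                        columns: List[Column]) -> Tuple[str, str]:
    """Build DDL for the first (temporary) and second (final) target tables."""
    first = build_create_table(db, source_table + "_tmp", columns, partitioned=True)
    second = build_create_table(db, source_table, columns, partitioned=False)
    return first, second
```

Both statements are derived from the same column list, which is what allows the second target library table to hold the same data as the source table.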
Optionally, the table type of the first target library table is one of an incremental table, a flow table, or a full table; the table type of the second target library table is one of an incremental table, a flow table, or a full table.
Optionally, the step of acquiring entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parsing the metadata included in the source library corresponding to the source library information to obtain a source table structure, includes:
Acquiring entered source library information that includes a source library name, a source library table name, a source library table type, a source table update field, a source table deduplication field, and a target library name;
Acquiring metadata from the source library corresponding to the source library name in the source library information;
Parsing the metadata information table of the acquired metadata, and obtaining the source table structure according to the metadata information table.
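The parsing step above can be illustrated with a short sketch. It assumes that the metadata information table has already been fetched as a list of dictionaries, one per column, and that the keys are named `table_owner`, `table_name`, `table_comment`, `column_name`, `column_comment`, and `column_order`. These key names and the `SourceTable` type are hypothetical and chosen only to mirror the fields listed in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceTable:
    """Source table structure derived from the metadata information table."""
    owner: str
    name: str
    comment: str
    columns: List[Dict[str, str]] = field(default_factory=list)


def parse_metadata_rows(rows: List[dict]) -> SourceTable:
    """Turn per-column metadata rows into one source table structure."""
    if not rows:
        raise ValueError("metadata information table is empty")
    # Restore the column order recorded in the source library.
    ordered = sorted(rows, key=lambda r: int(r["column_order"]))
    head = ordered[0]
    table = SourceTable(head["table_owner"], head["table_name"],
                        head["table_comment"])
    for row in ordered:
        table.columns.append(
            {"name": row["column_name"], "comment": row["column_comment"]}
        )
    return table
```

Sorting by the column order field keeps the target tables aligned with the source table's column layout.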
Optionally, the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table includes:
Acquiring the table type of the first target library table, and determining whether the table type of the first target library table is an incremental table, a flow table, or a full table;
When the table type of the first target library table is an incremental table, correspondingly generating a first Sqoop synchronization script and Hive program; the first Sqoop synchronization script and Hive program are used to synchronize the metadata from the source library to a specified partition of the first target library table, and then to deduplicate the metadata in the first target library table according to the source table deduplication field and store it in the second target library table.
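The deduplication performed for an incremental table can be sketched in memory as follows. The text states only that rows are deduplicated according to the source table deduplication field; keeping the row with the largest value of the source table update field is an assumption made here for illustration, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List


def dedupe_rows(rows: Iterable[Dict], dedup_field: str,
                update_field: str) -> List[Dict]:
    """Keep one row per deduplication key.

    Assumption: when several rows share a key, the row with the
    largest update_field value is treated as the current one.
    """
    latest: Dict = {}
    for row in rows:
        key = row[dedup_field]
        kept = latest.get(key)
        if kept is None or row[update_field] > kept[update_field]:
            latest[key] = row
    return list(latest.values())
```

In a generated Hive program, the same effect would typically be expressed in SQL rather than in application code; the sketch only shows the intended result of moving rows from the first target library table to the second.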
Optionally, the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table further includes:
When the table type of the first target library table is a flow table, correspondingly generating a second Sqoop synchronization script and Hive program; the second Sqoop synchronization script and Hive program are used to synchronize the metadata from the source library to a specified partition of the first target library table, and then to store the metadata in the first target library table into the second target library table.
Optionally, the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table further includes:
When the table type of the first target library table is a full table, correspondingly generating a third Sqoop synchronization script and Hive program; the third Sqoop synchronization script and Hive program are used to synchronize the metadata from the source library to the first target library table, and then to store the metadata in the first target library table into the second target library table.
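The three table-type branches described in this section differ in only two respects: whether the first target library table is loaded into a specified partition, and whether rows are deduplicated before being stored in the second target library table. The sketch below captures that dispatch as a lookup. The names `TableType`, `SyncPlan`, and `plan_sync` are hypothetical, and the sketch deliberately stops short of emitting actual Sqoop or Hive commands.

```python
from enum import Enum
from typing import NamedTuple


class TableType(Enum):
    INCREMENTAL = "incremental"
    FLOW = "flow"
    FULL = "full"


class SyncPlan(NamedTuple):
    script: str          # which of the three generated scripts applies
    use_partition: bool  # load into a specified partition of the first table
    deduplicate: bool    # deduplicate before storing into the second table


_PLANS = {
    TableType.INCREMENTAL: SyncPlan("first", True, True),
    TableType.FLOW: SyncPlan("second", True, False),
    TableType.FULL: SyncPlan("third", False, False),
}


def plan_sync(table_type: str) -> SyncPlan:
    """Select the synchronization plan for the first target table's type.

    Raises ValueError for a table type outside the three described.
    """
    return _PLANS[TableType(table_type)]
```

Keeping the differences in a table rather than in nested conditionals makes it straightforward to check each branch against the corresponding paragraph of the description.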
Optionally, the first Sqoop synchronization script and Hive program, the second Sqoop synchronization script and Hive program, and the third Sqoop synchronization script and Hive program each have an execution period of 24 hours.
Optionally, the metadata information table corresponding to the metadata includes at least a table owner, a table name, a table comment, a column name, a column comment, and a column order.
Based on the above data synchronization method, device, and apparatus, the present application further provides a data synchronization system. Referring to FIG. 6, the system includes several source databases 110, a target database 120, and a data synchronization device 130.
The metadata of the several source databases 110 is processed by the data synchronization device 130 and then uploaded to the target database 120 by the automatically generated table creation script and synchronization script.
Based on the above data synchronization method, device, and apparatus, the present application also provides a storage medium. The storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of the data synchronization method described above.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing related hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The computer-readable storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM).
In summary, the present application automatically generates a table creation script and a synchronization script from a limited number of configured entries, thereby simplifying data synchronization operations, improving development efficiency, and reducing human error.
Of course, a person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing related hardware (such as a processor or a controller). The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The computer-readable storage medium may be a memory, a magnetic disk, an optical disc, or the like.
It should be understood that the application of the present application is not limited to the above examples. A person of ordinary skill in the art can make improvements or changes based on the above description, and all such improvements and changes fall within the scope of protection of the appended claims.
Claims (20)
- A data synchronization method, comprising the following steps: acquiring entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parsing the metadata included in the source library corresponding to the source library information to obtain a source table structure; generating, according to the source table structure, a table creation script for creating, in the target library, a first target library table that temporarily stores data and a second target library table that stores the same data as the source library; and acquiring the table type of the first target library table, and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table.
- The data synchronization method according to claim 1, wherein the table type of the first target library table is one of an incremental table, a flow table, or a full table; and the table type of the second target library table is one of an incremental table, a flow table, or a full table.
- The data synchronization method according to claim 2, wherein the step of acquiring entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parsing the metadata included in the source library corresponding to the source library information to obtain a source table structure, comprises: acquiring entered source library information that includes a source library name, a source library table name, a source library table type, a source table update field, a source table deduplication field, and a target library name; acquiring metadata from the source library corresponding to the source library name in the source library information; and parsing the metadata information table of the acquired metadata, and obtaining the source table structure according to the metadata information table.
- The data synchronization method according to claim 3, wherein the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table comprises: acquiring the table type of the first target library table, and determining whether the table type of the first target library table is an incremental table, a flow table, or a full table; and when the table type of the first target library table is an incremental table, correspondingly generating a first Sqoop synchronization script and Hive program, the first Sqoop synchronization script and Hive program being used to synchronize the metadata from the source library to a specified partition of the first target library table, and then to deduplicate the metadata in the first target library table according to the source table deduplication field and store it in the second target library table.
- The data synchronization method according to claim 4, wherein the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table further comprises: when the table type of the first target library table is a flow table, correspondingly generating a second Sqoop synchronization script and Hive program, the second Sqoop synchronization script and Hive program being used to synchronize the metadata from the source library to a specified partition of the first target library table, and then to store the metadata in the first target library table into the second target library table.
- The data synchronization method according to claim 5, wherein the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table further comprises: when the table type of the first target library table is a full table, correspondingly generating a third Sqoop synchronization script and Hive program, the third Sqoop synchronization script and Hive program being used to synchronize the metadata from the source library to the first target library table, and then to store the metadata in the first target library table into the second target library table.
- The data synchronization method according to claim 6, wherein the first Sqoop synchronization script and Hive program, the second Sqoop synchronization script and Hive program, and the third Sqoop synchronization script and Hive program each have an execution period of 24 hours.
- The data synchronization method according to claim 1, wherein the metadata information table corresponding to the metadata includes at least a table owner, a table name, a table comment, a column name, a column comment, and a column order.
- A data synchronization apparatus, wherein the data synchronization apparatus comprises: an input parsing module, configured to acquire entered source library information that includes at least a source library name, a source library table name, and a source library table type, and to parse the metadata included in the source library corresponding to the source library information to obtain a source table structure; a table creation script generation module, configured to generate, according to the source table structure, a table creation script for creating, in the target library, a first target library table that temporarily stores data and a second target library table that stores the same data as the source library; and a synchronization script generation module, configured to acquire the table type of the first target library table and to correspondingly generate a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table.
- The data synchronization apparatus according to claim 9, wherein the table type of the first target library table is one of an incremental table, a flow table, or a full table; and the table type of the second target library table is one of an incremental table, a flow table, or a full table.
- The data synchronization apparatus according to claim 10, wherein the step of acquiring entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parsing the metadata included in the source library corresponding to the source library information to obtain a source table structure, comprises: acquiring entered source library information that includes a source library name, a source library table name, a source library table type, a source table update field, a source table deduplication field, and a target library name; acquiring metadata from the source library corresponding to the source library name in the source library information; and parsing the metadata information table of the acquired metadata, and obtaining the source table structure according to the metadata information table.
- The data synchronization apparatus according to claim 11, wherein the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table comprises: acquiring the table type of the first target library table, and determining whether the table type of the first target library table is an incremental table, a flow table, or a full table; and when the table type of the first target library table is an incremental table, correspondingly generating a first Sqoop synchronization script and Hive program, the first Sqoop synchronization script and Hive program being used to synchronize the metadata from the source library to a specified partition of the first target library table, and then to deduplicate the metadata in the first target library table according to the source table deduplication field and store it in the second target library table.
- A data synchronization device, comprising: a processor, a memory, and a communication bus; wherein the memory stores a computer-readable program executable by the processor; the communication bus implements connection and communication between the processor and the memory; and the processor, when executing the computer-readable program, implements the following steps: acquiring entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parsing the metadata included in the source library corresponding to the source library information to obtain a source table structure; generating, according to the source table structure, a table creation script for creating, in the target library, a first target library table that temporarily stores data and a second target library table that stores the same data as the source library; and acquiring the table type of the first target library table, and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table.
- The data synchronization device according to claim 13, wherein the table type of the first target library table is one of an incremental table, a flow table, or a full table; and the table type of the second target library table is one of an incremental table, a flow table, or a full table.
- The data synchronization device according to claim 14, wherein the step of acquiring entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parsing the metadata included in the source library corresponding to the source library information to obtain a source table structure, comprises: acquiring entered source library information that includes a source library name, a source library table name, a source library table type, a source table update field, a source table deduplication field, and a target library name; acquiring metadata from the source library corresponding to the source library name in the source library information; and parsing the metadata information table of the acquired metadata, and obtaining the source table structure according to the metadata information table.
- The data synchronization device according to claim 15, wherein the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table comprises: acquiring the table type of the first target library table, and determining whether the table type of the first target library table is an incremental table, a flow table, or a full table; and when the table type of the first target library table is an incremental table, correspondingly generating a first Sqoop synchronization script and Hive program, the first Sqoop synchronization script and Hive program being used to synchronize the metadata from the source library to a specified partition of the first target library table, and then to deduplicate the metadata in the first target library table according to the source table deduplication field and store it in the second target library table.
- A storage medium, wherein the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the following steps: acquiring entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parsing the metadata included in the source library corresponding to the source library information to obtain a source table structure; generating, according to the source table structure, a table creation script for creating, in the target library, a first target library table that temporarily stores data and a second target library table that stores the same data as the source library; and acquiring the table type of the first target library table, and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table.
- The storage medium according to claim 17, wherein the table type of the first target library table is one of an incremental table, a flow table, or a full table; and the table type of the second target library table is one of an incremental table, a flow table, or a full table.
- The storage medium according to claim 18, wherein the step of acquiring entered source library information that includes at least a source library name, a source library table name, and a source library table type, and parsing the metadata included in the source library corresponding to the source library information to obtain a source table structure, comprises: acquiring entered source library information that includes a source library name, a source library table name, a source library table type, a source table update field, a source table deduplication field, and a target library name; acquiring metadata from the source library corresponding to the source library name in the source library information; and parsing the metadata information table of the acquired metadata, and obtaining the source table structure according to the metadata information table.
- The storage medium according to claim 18, wherein the step of acquiring the table type of the first target library table and correspondingly generating a synchronization script for synchronizing the metadata from the source library to the target library through the first target library table and then the second target library table comprises: acquiring the table type of the first target library table, and determining whether the table type of the first target library table is an incremental table, a flow table, or a full table; and when the table type of the first target library table is an incremental table, correspondingly generating a first Sqoop synchronization script and Hive program, the first Sqoop synchronization script and Hive program being used to synchronize the metadata from the source library to a specified partition of the first target library table, and then to deduplicate the metadata in the first target library table according to the source table deduplication field and store it in the second target library table.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711175125.8A CN107967316A (en) | 2017-11-22 | 2017-11-22 | A kind of method of data synchronization, equipment and computer-readable recording medium |
CN201711175125.8 | 2017-11-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019100638A1 (en) | 2019-05-31 |
Family
ID=62000385
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/082270 WO2019100638A1 (en) | 2017-11-22 | 2018-04-09 | Data synchronization method, device and equipment, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107967316A (en) |
WO (1) | WO2019100638A1 (en) |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241026B (en) * | 2018-07-18 | 2021-10-15 | 创新先进技术有限公司 | Data management method, device and system |
CN109241184B (en) * | 2018-08-20 | 2024-03-15 | 中国平安人寿保险股份有限公司 | Data synchronization method, device, computer equipment and storage medium |
CN109558448B (en) * | 2018-10-10 | 2021-04-06 | 北京海数宝科技有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN109657002B (en) * | 2018-11-09 | 2022-04-08 | 山东中创软件商用中间件股份有限公司 | Multi-table batch data synchronization method, device and equipment |
CN109684407A (en) * | 2018-11-23 | 2019-04-26 | 武汉达梦数据库有限公司 | A kind of method of data synchronization of DML operation |
CN109614446A (en) * | 2018-11-23 | 2019-04-12 | 金色熊猫有限公司 | Method of data synchronization, device, electronic equipment and storage medium |
CN110020840B (en) * | 2019-01-04 | 2023-09-22 | 创新先进技术有限公司 | Data transmission method and system thereof |
CN109977157A (en) * | 2019-02-27 | 2019-07-05 | 深圳点猫科技有限公司 | A kind of method and electronic equipment importing data to target directory based on data platform |
CN110059134A (en) * | 2019-03-18 | 2019-07-26 | 深圳市买买提信息科技有限公司 | A kind of data are synchronized to method, relevant apparatus and the equipment of cloud platform |
CN110209680A (en) * | 2019-04-25 | 2019-09-06 | 深圳壹账通智能科技有限公司 | Data-updating method, device and electronic device based on Hive external table |
CN110933144A (en) * | 2019-11-09 | 2020-03-27 | 许继集团有限公司 | Substation master station system and database synchronization method |
CN111125254A (en) * | 2019-12-23 | 2020-05-08 | 北京懿医云科技有限公司 | Database synchronization method and device, electronic equipment and computer readable medium |
CN111209282A (en) * | 2020-01-10 | 2020-05-29 | 深圳前海环融联易信息科技服务有限公司 | Data storage method and device, computer equipment and storage medium |
CN111324610A (en) * | 2020-02-19 | 2020-06-23 | 深圳市融壹买信息科技有限公司 | Data synchronization method and device |
CN111367883A (en) * | 2020-02-25 | 2020-07-03 | 平安科技(深圳)有限公司 | Data synchronization method, device, equipment and computer readable storage medium |
CN111400397B (en) * | 2020-02-29 | 2023-04-11 | 平安科技(深圳)有限公司 | Data synchronization method, device, equipment and computer storage medium |
CN111259068A (en) * | 2020-04-28 | 2020-06-09 | 成都四方伟业软件股份有限公司 | Data development method and system based on data warehouse |
CN111767267B (en) * | 2020-06-18 | 2024-05-10 | 杭州数梦工场科技有限公司 | Metadata processing method and device and electronic equipment |
CN111858760B (en) * | 2020-07-13 | 2024-03-22 | 中国工商银行股份有限公司 | Data processing method and device for heterogeneous database |
CN112364049B (en) * | 2020-11-10 | 2024-05-17 | 中国平安人寿保险股份有限公司 | Data synchronization script generation method, system, terminal and storage medium |
CN112269788A (en) * | 2020-11-13 | 2021-01-26 | 中盈优创资讯科技有限公司 | Method and device for improving ClickHouse data storage performance |
CN112597150A (en) * | 2020-12-04 | 2021-04-02 | 光大科技有限公司 | Data acquisition method and device, readable storage medium and electronic device |
CN112817934A (en) * | 2021-01-21 | 2021-05-18 | 厦门熵基科技有限公司 | Data migration method, device, equipment and computer readable storage medium |
CN113076314B (en) * | 2021-03-30 | 2024-04-19 | 深圳市酷开网络科技股份有限公司 | Data table storage method and device and computer readable storage medium |
CN113127448A (en) * | 2021-04-23 | 2021-07-16 | 深圳市酷开网络科技股份有限公司 | Method, device, server and storage medium for generating domain dimension table |
CN114780641B (en) * | 2022-05-07 | 2023-07-14 | 湖南长银五八消费金融股份有限公司 | Multi-library multi-table synchronization method, device, computer equipment and storage medium |
CN116089537B (en) * | 2023-04-07 | 2023-08-04 | 江西省智能产业技术创新研究院 | Incremental data synchronization method, system, computer and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102495910A (en) * | 2011-12-28 | 2012-06-13 | 畅捷通信息技术股份有限公司 | Device and method for data timing synchronization of heterogeneous system |
CN103440273A (en) * | 2013-08-06 | 2013-12-11 | 北京航空航天大学 | Data cross-platform migration method and device |
CN103744949A (en) * | 2013-12-31 | 2014-04-23 | 金蝶软件(中国)有限公司 | Data integrating method and system |
CN104317843A (en) * | 2014-10-11 | 2015-01-28 | 上海瀚之友信息技术服务有限公司 | Data synchronism ETL (Extract Transform Load) system |
US20170147672A1 (en) * | 2015-11-25 | 2017-05-25 | International Business Machines Corporation | Determining Data Replication Cost for Cloud Based Application |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8639657B2 (en) * | 2008-10-29 | 2014-01-28 | International Business Machines Corporation | Reorganizing table-based data objects |
CN102346775A (en) * | 2011-09-26 | 2012-02-08 | 苏州博远容天信息科技有限公司 | Method for synchronizing multiple heterogeneous source databases based on log |
US9245249B2 (en) * | 2013-03-12 | 2016-01-26 | Labtech Llc | General, flexible, resilent ticketing interface between a device management system and ticketing systems |
CN106919697B (en) * | 2017-03-07 | 2020-09-25 | 浪潮云信息技术股份公司 | Method for simultaneously importing data into multiple Hadoop assemblies |
CN107330003A (en) * | 2017-06-12 | 2017-11-07 | 上海藤榕网络科技有限公司 | Method of data synchronization, system, memory and data synchronization equipment |
CN107301250B (en) * | 2017-07-27 | 2020-06-26 | 南京南瑞集团公司 | Multi-source database collaborative backup method |
- 2017-11-22: CN application CN201711175125.8A (CN107967316A), status: active, Pending
- 2018-04-09: WO application PCT/CN2018/082270 (WO2019100638A1), status: active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN107967316A (en) | 2018-04-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019100638A1 (en) | | Data synchronization method, device and equipment, and storage medium |
WO2019037396A1 (en) | | Account clearing method, device and equipment and storage medium |
WO2018205376A1 (en) | | Association information querying method, terminal, server management system, and computer readable storage medium |
WO2019041832A1 (en) | | Method, server and system for modifying source database table structure, and storage medium |
WO2019104876A1 (en) | | Insurance product pushing method and system, terminal, client terminal, and storage medium |
WO2019024219A1 (en) | | Automatic document generation method and apparatus, automatic document generator and medium |
WO2019061613A1 (en) | | Loan qualification screening method, device and computer readable storage medium |
WO2019165691A1 (en) | | Method, apparatus and device for automatically generating test case, and readable storage medium |
WO2018188196A1 (en) | | Data version control method, data version controller, device and computer-readable storage medium |
WO2019000801A1 (en) | | Data synchronization method, apparatus, and device, and computer readable storage medium |
WO2019174375A1 (en) | | Interface test method, apparatus and device, and computer-readable storage medium |
WO2015158297A1 (en) | | Method, apparatus, and system for controlling delivery task in social networking platform |
WO2017143692A1 (en) | | Smart television and voice control method therefor |
WO2019080247A1 (en) | | Method, apparatus and device for generating insurance policy approval, and computer readable storage medium |
WO2018227880A1 (en) | | Data comparison method, apparatus and device, and readable storage medium |
WO2019169814A1 (en) | | Method, apparatus and device for automatically generating chinese annotation, and storage medium |
WO2015035777A1 (en) | | Software upgrade method and system for mobile terminal |
WO2018120430A1 (en) | | Page construction method, terminal, computer-readable storage medium and page construction device |
WO2014026526A1 (en) | | Natural person information setting method and electronic device |
WO2019114262A1 (en) | | User interface loading method, smart television and computer-readable storage medium |
WO2019000800A1 (en) | | Credential preparation method, apparatus, and device and computer readable storage medium |
WO2019109521A1 (en) | | Identity approval method, apparatus and device for video interview, and readable storage medium |
WO2018149190A1 (en) | | Component debugging method, device and apparatus, and computer readable storage medium |
WO2018214599A1 (en) | | Scalable data reporting method and system, and storage medium |
WO2018014567A1 (en) | | Method for improving performance of virtual machine, and terminal, device and computer-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/10/2020) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18881860; Country of ref document: EP; Kind code of ref document: A1 |